pkp / pkp-docs

A repository for generating PKP's documentation hub.

Add XML Production guide #492

Open NateWr opened 4 years ago

NateWr commented 4 years ago

This issue is a place to discuss the development of a new XML Production guide. This guide is planned to cover how to set up the tools, what to expect from them, how to get the best results, and what training editorial staff will need.

NateWr commented 4 years ago

Here is a work-in-progress pull-request. I've prepared an initial outline of the XML production guide, written an intro, and ported over the user-facing documentation that @Vitaliy-1 has written for the DocxConverter.

This is a very early draft and open to a completely different structure/tone/audience.

To the DIG: do the outline and introduction look OK? How do they correspond to your expectations? Did you have a different audience or intent in mind?

To XML team and interested parties: although we focus on one particular workflow, I've left a space to recommend other conversion tools. Which of these tools should we recommend?

@Vitaliy-1: I've only pulled over the documentation that I think is best directed at editorial staff. I think the technical side of the documentation, where you talk about cell merging, ooxml, etc, can be left in the plugin's readme since it's more technical. Does my draft look ok? Please add/edit as you see appropriate.

Lists: which numbered types are supported by docxconverter? Is it just 1/2/3 or do we support i/ii/iii and a/b/c? Any additional considerations to take into account when formatting lists for the converter?

Tables: I noticed that it says cells may appear with no paragraphs. But this was different from the specification we recommended to Libero. Should we change that with docxconverter?

Bibliography style and Google Docs: there's no way to do bibliography style in Google Docs, right? That's what I've said.

@withanage: I've not yet drafted up the stuff on using Texture. Is this the best place to pull information from?

Also, what does the Texture plugin do when saving the citations back from Texture? Does it write them to the citations in OJS's database?

NateWr commented 4 years ago

Also, what does the Texture plugin do when saving the citations back from Texture? Does it write them to the citations in OJS's database?

Dulip said that so far it's only saved to the XML, not rendered back to OJS's citations table.

withanage commented 4 years ago

@NateWr

@withanage: I've not yet drafted up the stuff on using Texture. Is this the best place to pull information from?

Yes. Alongside it, I would use the handbook I have written:

https://github.com/pkp/texture#handbook

Some screenshots may be outdated, but that is where I documented the other functionality I created, such as galley creation and DAR import and export. The very basic HTML ZIP import function is also documented there.

NateWr commented 4 years ago

The handbook looks amazing! Is this in a Google Doc somewhere? Maybe someone from the DIG could take it and convert it into a resource for the docs hub?

Vitaliy-1 commented 4 years ago

Lists: which numbered types are supported by docxconverter? Is it just 1/2/3 or do we support i/ii/iii and a/b/c? Any additional considerations to take into account when formatting lists for the converter?

Only ordered and unordered lists are supported. I can extract some additional styling information from DOCX, but it may not be consistent across text editors. I think it would be best to wait for the JATS Editor release and then decide on the mapping of list styles.
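To illustrate what such a list-style mapping could eventually look like, here is a minimal Python sketch. It is not docxConverter's code (the plugin is written in PHP), and the function name `jats_list_type` and its `detailed` flag are hypothetical. The keys are WordprocessingML `w:numFmt` values and the results are JATS `list-type` values. With `detailed=False` everything that is not a bullet becomes `order`, which is an assumption about how the current "ordered and unordered only" behaviour maps onto JATS.

```python
# Hypothetical mapping from DOCX numbering formats (w:numFmt)
# to JATS list-type attribute values. Illustration only.
DETAILED_LIST_TYPES = {
    "bullet": "bullet",
    "decimal": "order",
    "lowerLetter": "alpha-lower",
    "upperLetter": "alpha-upper",
    "lowerRoman": "roman-lower",
    "upperRoman": "roman-upper",
}


def jats_list_type(num_fmt: str, detailed: bool = False) -> str:
    """Return a JATS list-type for a DOCX numbering format."""
    if not detailed:
        # Coarse behaviour: only distinguish unordered from ordered lists.
        return "bullet" if num_fmt == "bullet" else "order"
    # Finer mapping; unknown formats fall back to a plain ordered list.
    return DETAILED_LIST_TYPES.get(num_fmt, "order")
```

The open question in this thread is whether the finer mapping would be reliable, since different text editors may write the numbering format differently.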

Tables: I noticed that it says cells may appear with no paragraphs. But this was different from the specification we recommended to Libero. Should we change that with docxconverter?

Paragraphs are removed at the post-conversion stage, at the plugin level, after the parsing library has output the JATS XML: https://github.com/Vitaliy-1/docxConverter/blob/master/classes/DOCXConverterDocument.inc.php#L130 That was done for compatibility with Texture. I always found this inconvenient, as paragraphs are allowed by the standard. Once Libero Editor is in place, I plan to remove this workaround. So it was my mistake when writing the docs: for now, paragraphs in table cells are always removed.
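For readers who want to picture this post-conversion step, here is a minimal Python sketch of unwrapping `<p>` elements inside table cells. It is an illustration only and not the plugin's implementation (which is PHP, at the link above); the function names are hypothetical, and the choice to concatenate the text of multiple paragraphs without a separator is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def _append_text(cell, text):
    """Append text at the current end of the cell's content."""
    if not text:
        return
    if len(cell):
        last = cell[-1]
        last.tail = (last.tail or "") + text
    else:
        cell.text = (cell.text or "") + text


def unwrap_cell_paragraphs(table_xml: str) -> str:
    """Replace each <p> inside <td>/<th> with its own content."""
    root = ET.fromstring(table_xml)
    # Snapshot the elements first, because the tree is modified in the loop.
    for cell in list(root.iter()):
        if cell.tag not in ("td", "th"):
            continue
        children = list(cell)
        if not any(child.tag == "p" for child in children):
            continue
        leading = cell.text
        for child in children:
            cell.remove(child)
        cell.text = None
        _append_text(cell, leading)
        for child in children:
            if child.tag == "p":
                _append_text(cell, child.text)
                for inline in list(child):
                    cell.append(inline)  # inline elements keep their tail text
                _append_text(cell, child.tail)
            else:
                cell.append(child)
    return ET.tostring(root, encoding="unicode")


sample = "<table><tr><td><p>Value <italic>x</italic> here</p></td></tr></table>"
print(unwrap_cell_paragraphs(sample))
```

Inline markup such as `<italic>` survives the unwrapping; only the paragraph wrapper is dropped.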

Bibliography style and Google Docs: there's no way to do bibliography style in Google Docs, right? That's what I've said.

To be more precise, Google Docs doesn't have a bibliography style for paragraphs. Google recently added the ability to include citations in a document, but these aren't exported to DOCX in any structured way.

NateWr commented 4 years ago

Note from XML discussion: when using Texture, don't try to use the references field in OJS. You'll need to use JATSParser, eLens or equivalent to publish from the XML directly.

marcbria commented 4 years ago

First, thanks to all for the effort. I feel this is something the community has been asking for and will enjoy a lot.

About this guide, I think we also need an introduction to contextualize and summarize what you will find in the rest of the document. So I would suggest adding an introduction that takes a zoom-in approach (is this right in English?), from a bird's-eye view to the detailed documentation.

Even if brief, I think this introduction needs to cover 3 main goals: 1) Explain the benefits and future potential of an XML workflow for the different user roles. 2) Explain the workflow itself (in the OJS context), the tools involved/recommended and how those pieces fit together. 3) Suggest a "safe-path" or, until we have one, make it clear which tools are solid and will be compatible with any future work, and which are experimental (and subject to change).

About 1, we have plenty of documents that list those benefits, so they just need to be adapted and summarized. About 2, we will need to describe the workflow stages and tools. I'm biased, but I think a variation of this diagram could help as a visual representation. About 3, even though nothing is ever certain, I think it is important to make a recommendation, because nobody knows better than the four of you (you two, Vitaliy and James) what the risks and opportunities are here.

What do you think?

NateWr commented 4 years ago

Thanks @marcbria. I think I've tried to cover 1-3 in the intro. Can you recommend some specific changes or identify places where the intro doesn't provide the right details?

withanage commented 4 years ago

@NateWr

Sure, I have attached the source Word document and all the images for the DIG team to have a look. Texture_Handbook.docx assets.zip

marcbria commented 4 years ago

Sorry Nate. I completely missed your intro. :-(

I'm reviewing it right now and, as you said, 1 and 3 are well covered, but I would encourage extending it with a better explanation of the workflow (as asked in 2).

About the workflow, you talk about 3 stages (which is good for simplicity), but since this guide will grow in the future, I would go with 4 (or even 5), as follows:

  1. SUBMISSION: authors send their work (usually in DOCX or ODT) to the platform.
  2. CONVERSION: the originals are transformed to JATS.
  3. EDITING: conversions are not perfect, or authors ask us to make modifications to the final galleys, so we need something to make changes directly to the JATS files.
  4. PRESENTATION: JATS is not for humans. It needs to be converted to something that humans can read (HTML, PDF, EPUB...) or shown with JS helpers.

First, it's a nuance, but I think naming each stage helps in thinking about the process more clearly.

Second, I split your first stage into SUBMISSION and CONVERSION, because I think of them as separate actions that will be performed by different kinds of tooling at separate moments in the workflow. About SUBMISSION: in the future we may have specific submission tools (like the fidus plugin, submission from a Google Docs URL, DAR desktop export or any third-party integration tool...) that will let us "ingest" documents that may or may not be converted (or may even be converted to other formats). This is different from CONVERSION, which is a category in itself. I mean, we are now very focused on JATS, but it's not hard to imagine that in the future we will want to convert to other formats (TEI, schema...), and even now, when we only think about JATS, we have different alternatives.

I'm not as confident about adding a 5th stage called "Distribution/Harvesting" to cover OAI and other existing or future dissemination technologies/protocols. Alternatively, we could rethink stage 4 (Presentation) from a wider perspective, although I'm inclined to think of them as different actions/moments and so, again, different stages.

About the need for a recommendation: in your intro you say "This guide will mention these alternatives but can not provide a recommendation."

I disagree and I think we really need to make a recommendation here, with the proper disclaimer and all the warnings explaining that "nobody really knows what will happen" and so on. Users really need it to know whether they are on tools that PKP will support in the future (what I call the "safe path") and how complete each tool is, so they can make their decisions.

Let me explain why I think this is really important: over the last few years I have received plenty of community questions asking "what tools should we use?". In the past I told them that they could go with OTS and ojs3-markup (what I'm calling the "safe-path"), but that would not be my recommendation right now. Then I found "Texture" was a really promising tool and I suggested going with it, but we now know it is a dead end.

I mean, reality changes, shit happens and so on, but the community will appreciate a recommendation from the PKP dev team a lot (and will feel more secure adopting technologies), rather than somebody else's opinion.

And let me be crystal clear here: I'm not asking for a contract or a promise written in blood... ;-)

Right now it could be something like (please take the idea and not my phrasing):

  1. SUBMISSION:
    • Covered by OJS3 itself: Able to ingest every document type. Solid as a rock. PKP recommended ("safe-path").
  2. CONVERSION:
    • Vitaliy's docxConverter: Converts (body and citations) from DOCX to JATS. Beta but very promising. PKP recommended ("safe-path").
  3. EDITING:
    • Dulip's Texture integration: Web-based or desktop editing of body and citations. Beta but will be abandoned. PKP recommended (until we have better choices).
  4. PRESENTATION:
    • Vitaliy's JATSParser: Translates JATS to HTML and PDF. Metadata taken from OJS (not from JATS). Alpha version, but good enough for HTML and basic PDF. PKP recommended ("safe-path").
    • Dulip's Lens plugin: Uses eLife's JavaScript to show JATS/BITS in a browser. Not responsive. Beta of an abandoned project. Not recommended by PKP.
  5. HARVESTING:
    • Alec's OAI-JATS: Exposes JATS XML via the OAI-PMH interface. Alpha version.

Change the term "safe-path" if you like: we could use "traffic-light colors", "defcon 1, 2, 3", animals (snake, dog, elephant...) or whatever you prefer to explain in a simple way what is solid and what is quicksand.

If you feel this "recommendation" will blur the introduction, then for clarity's sake it can be added as a final chapter, although I still like the idea of a visual representation/synthesis to show the workflow in the intro (like I did in the who-is-who-in-jats reports).

Finally, I don't know where, but I think it is also important to explain that PKP is focused on implementing JATS4R and not any other flavor of the "standard".

Sorry for the length. I didn't know how to explain it more simply.

NateWr commented 3 years ago

Thanks @marcbria that's really helpful detail. Some comments:

I split your first stage into SUBMISSION and CONVERSION

I'm hesitant to add a step that we can not offer any assistance with right now. We don't offer any OJS integration that performs automated conversions or reading of metadata from a particular file type. Also, in the future, the distinction between submission and conversion is likely to break down. Eventually, we want conversion to happen at the time of submission for what we're calling a "doc-centric workflow".

That said, I'd like to hear from the DIG on what they imagined for this document. If the idea of breaking out into more stages, even if the guide says "just use OJS", matches their expectations of what an XML Production workflow is, then it's worth adding. My concern is to not overcomplicate it for people who are new to the topic.

I disagree and I think we really need to make a recommendation here...

I think that this recommendation will come through in the document once it's ready. There aren't a bunch of tools that will be discussed. The document describes only those tools we recommend, and will carefully describe the limitations of each of these tools (including the limited shelf-life of Texture/Lens).

I still like the idea of a visual representation/synthesis to show the workflow in the intro

I like this too. I'm worried about how this can be translated and kept up-to-date, but it would be nice to have.

Alec's OAI-JATS

I actually did not know about this plugin! :facepalm: I'll talk with Alec about its condition. Personally, I think that this falls into the Publish and Distribute section, but I'm open to separating these if the chapter gets too long.

I think it is also important to explain that PKP is focused on implementing JATS4R and not any other flavor of the "standard".

Good idea!

asmecher commented 3 years ago

@NateWr, re: OAIJats and JATS Template, see this document on Coalition Publica's XML process and setup. https://docs.pkp.sfu.ca/coalition-publica/
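As a rough illustration of what exposing JATS over OAI-PMH means for a harvester, the Python sketch below builds a `ListRecords` request URL. The `verb` and `metadataPrefix` parameters are standard OAI-PMH. The base URL is a placeholder, the function name is hypothetical, and the `jats` metadata prefix is an assumption that should be checked against the endpoint's `ListMetadataFormats` response.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str = "jats") -> str:
    """Build an OAI-PMH ListRecords request URL for a given metadata format."""
    # "jats" is an assumed prefix; confirm it via verb=ListMetadataFormats.
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder endpoint, not a real journal.
print(oai_list_records_url("https://journal.example.org/index.php/demo/oai"))
```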

marcbria commented 3 years ago

In summary, everything in blue in this diagram comes from PKP developments (or from close partners). In bold is the recommended path back in 2019 (just before Grobid was discovered and the OTS approach was frozen).

[image]

If I didn't miss anything, right now, the recommendation would be: docxConverter (B6) > Texture (C2) > JatsParser (D4) > OAI JATS (D2) | [JatsTemplate (D6)]

Thinking about the "safe-path", I'm wondering if the OTS approach is definitively abandoned and superseded by Vitaliy's and Dulip's work.

Since anybody interested in JATS will find all these developments (in the forum or on Google), I think the guide needs to mention them and explain whether they are active or not.

marcbria commented 3 years ago

About the visual representation/synthesis to show the workflow: I like this too. I'm worried about how this can be translated and kept up-to-date, but it would be nice to have.

The main effort is creating it. Once it is done, it will be really easy to maintain (e.g. an SVG edited with Inkscape, a diagram written in Markdown and compiled with Mermaid, or just generated in HackMD...)

If you ask, my preference would be Inkscape because you have more control over the design, and Inkscape is free software, so everybody could install it and play with the SVG. If we go with it, I don't mind taking responsibility for updating the diagrams whenever required (in English and Spanish).

asmecher commented 3 years ago

I'm wondering if the OTS approach is definitively abandoned and superseded by Vitaliy's and Dulip's work.

Yes, that's correct.

NateWr commented 3 years ago

Recommendation from Amanda: split the Production Workflow section off from the intro and put it in its own section (right after the intro).

Vitaliy-1 commented 3 years ago

A great description of the XML Workflow within OJS, found on the web: https://www.ed.ac.uk/files/atoms/files/xml_publishing_in_ojs_-_project_summary_user_guide_0.pdf