plazi / ggxml2taxpub

Conversion of GoldenGATE XML to JATS/TaxPub at treatment level
remove line breaks in taxpub output #29

Open myrmoteras opened 2 years ago

myrmoteras commented 2 years ago

Put each material-citation on a single line. Example file: https://github.com/plazi/ggxml2taxpub-treatments/blob/main/level1/0384433845052F6CFE68FD53FDA7FD8D_tp_l1.xml
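To illustrate what the request amounts to, here is a minimal sketch in Python (standard library only) that collapses all whitespace runs inside a chosen element so it serializes on one line. It is not the project's XSLT or export code. The function names and the un-namespaced tag `material-citation` in the sample are assumptions for illustration; real TaxPub elements are namespaced, and ElementTree would then need the tag in `{namespace-uri}local-name` form.

```python
import re
import xml.etree.ElementTree as ET

_WS = re.compile(r"\s+")


def collapse_whitespace(elem):
    """Collapse every whitespace run inside elem (text and child tails) to one space."""
    for node in elem.iter():
        if node.text:
            node.text = _WS.sub(" ", node.text)
        # The tail of elem itself lies outside the element, so leave it alone.
        if node is not elem and node.tail:
            node.tail = _WS.sub(" ", node.tail)


def single_line_elements(xml_text, tag):
    """Return xml_text with every element named tag serialized on a single line."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        collapse_whitespace(elem)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, un-namespaced sample input (not taken from the linked file).
sample = """<sec>
<material-citation>1 worker,
   <named-content>Brazil</named-content>,
   2019</material-citation>
</sec>"""

result = single_line_elements(sample, "material-citation")
print(result)
```

Whitespace outside the targeted elements is left untouched, so the surrounding layout of the file does not change.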

tcatapano commented 2 years ago

As discussed with @gsautter, whitespace cleanup will happen after the XSLT conversion, in the export service that is to be set up on the SRS server.

myrmoteras commented 2 years ago

@tcatapano @gsautter can we please make this change and provide a new set of TaxPub files for @jgobeill? He is waiting for them (see last tech meeting https://docs.google.com/document/d/1mEACrbcjfGBaaHEB5qeZ9tESFBdsol98RHkUKxIiT-Y/edit#heading=h.61ni7e1dljbu)

tcatapano commented 2 years ago

@myrmoteras @jgobeill: I've applied XML "pretty print" to the level1 files. As mentioned above, we will eventually implement a similar pretty-printing post-process for the files provided by the GG to TaxPub service.

Note that whitespace in XML is generally not significant and should not be a factor for XML-aware downstream processing, but tools do exist to perform such "pretty printing". In oXygen, one can use "Format and Indent" to pretty-print individual or multiple XML files. I believe XML libraries also offer similar pretty-printing features, e.g. lxml in Python. Hope this helps.
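As a minimal sketch of library-based pretty printing, the following uses Python's standard-library `xml.etree.ElementTree` (its `indent` function requires Python 3.9 or later) rather than lxml, so that it runs without extra installs. lxml offers a comparable option via `etree.tostring(..., pretty_print=True)`. The function name and sample document are made up for illustration and are not part of the GG to TaxPub service.

```python
import xml.etree.ElementTree as ET


def pretty_print_xml(xml_text, indent="  "):
    """Re-serialize xml_text with one element per line and nested indentation."""
    root = ET.fromstring(xml_text)
    # indent() only rewrites text/tail that is empty or whitespace-only,
    # so character data such as "Example" below is left unchanged.
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document, serialized on a single line.
sample = "<article><front><title>Example</title></front><body><p>Text</p></body></article>"

pretty = pretty_print_xml(sample)
print(pretty)
```

Because indentation adds whitespace-only text nodes, this is safe for XML-aware consumers but can change the output of tools that treat the file as plain text.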

gsautter commented 2 years ago

@myrmoteras @tcatapano I have now built a pretty-printing output writer that sits between the XSL Transformer output and the client-bound output. I am currently integrating it into the code of the web front-end servlets.

gsautter commented 2 years ago

Pretty printing is deployed now, see https://tb.plazi.org/GgServer/taxPubL1/0384433845052F6CFE68FD53FDA7FD8D (and any other treatment).