plazi / ggxml2taxpub

Conversion of GoldenGATE XML to JATS/TaxPub at treatment level

run tp conversion on phytochemistry articles #72

Open tcatapano opened 7 months ago

tcatapano commented 7 months ago

Starting with the sample file Phytochemistry.103.67-75.pdf.xml

tcatapano commented 7 months ago

Problem with TP result:

Two consecutive treatments in the source are being merged into a single treatment, with the two nomenclature sections placed one after another at the start of the treatment: see

https://github.com/plazi/ggxml2taxpub/blob/8f605eab9dd119dea1f287c67768d1aa6dde6b45/level1/articles/non-tax/Phytochemistry.103.67-75.pdf_tp.xml#L1854C2-L1861C2

        <tp:mixed-nomenclature> 2.1. Floral scent composition of <tp:taxon-name>Nymphaea subg.
               Hydrocallis</tp:taxon-name>
         </tp:mixed-nomenclature>
         <tp:mixed-nomenclature> 2.2. Floral scent variations within
               <tp:taxon-name>Nymphaea</tp:taxon-name> and <tp:taxon-name>Victoria</tp:taxon-name>
         </tp:mixed-nomenclature>
         <tp:treatment-sec sec-type="description">
            <p> The six species and two subspecies of <tp:taxon-name> Nymphaea subg. Hydrocallis

tcatapano commented 7 months ago

I think the problem is here: https://github.com/plazi/ggxml2taxpub/blob/8f605eab9dd119dea1f287c67768d1aa6dde6b45/xslt/gg2tp_l1.xsl#L8-L13

this should be

<xsl:apply-templates select=".//subSubSection[@type = 'nomenclature']"/>
<xsl:apply-templates select=".//subSubSection[not(@type = 'nomenclature')]"/>

so that it iterates over the descendant subSubSection elements of the current treatment rather than over every subSubSection in the document.
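
For anyone following along, here is a rough Python sketch of the scoping difference. It assumes the earlier select matched subSubSection elements across the whole document, as the comment above suggests; the original XSLT lines are not reproduced here. The sample XML and the function names are made up for illustration and are much simpler than real GoldenGATE output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified input: two consecutive treatments.
SAMPLE = """
<document>
  <treatment>
    <subSubSection type="nomenclature">2.1. Floral scent composition</subSubSection>
    <subSubSection type="description">Text of 2.1</subSubSection>
  </treatment>
  <treatment>
    <subSubSection type="nomenclature">2.2. Floral scent variations</subSubSection>
    <subSubSection type="description">Text of 2.2</subSubSection>
  </treatment>
</document>
"""

root = ET.fromstring(SAMPLE)


def nomenclature_document_wide(doc_root):
    """Collect nomenclature sections from the whole document (the suspected bug)."""
    return [
        sec.text
        for sec in doc_root.iter("subSubSection")
        if sec.get("type") == "nomenclature"
    ]


def nomenclature_in_treatment(treatment):
    """Collect nomenclature sections below one treatment only (the './/' form)."""
    return [
        sec.text
        for sec in treatment.findall(".//subSubSection[@type='nomenclature']")
    ]


first_treatment = root.find("treatment")

# Document-wide selection pulls in the heading of the second treatment as well.
print(nomenclature_document_wide(root))
# Context-relative selection stays inside the current treatment.
print(nomenclature_in_treatment(first_treatment))
```

The document-wide version returns both nomenclature headings for the first treatment, which matches the merged output shown earlier; the context-relative version returns only the heading that belongs to it.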

tcatapano commented 7 months ago

The XPath fix worked; the resulting file is now valid. Next: run over a larger sample set.

tcatapano commented 7 months ago

Conversion of the full batch leaves most files valid. The remaining errors are in:

  22 Phytochemistry.157.168-174.pdf_tp.xml
   3 Phytochemistry.189.112824.pdf_tp.xml
   2 Phytochemistry.187.112776.pdf_tp.xml
   2 Phytochemistry.186.112741.pdf_tp.xml
   1 Phytochemistry.193.112970.pdf_tp.xml
   1 Phytochemistry.191.112908.pdf_tp.xml
   1 Phytochemistry.163.196-197.pdf_tp.xml
   1 Phytochemistry.157.158-167.pdf_tp.xml
   1 Phytochemistry.153.58-63.pdf_tp.xml

see: https://github.com/plazi/ggxml2taxpub/blob/master/errs/phytochemistry_errors_20240211_frq.txt:

https://github.com/plazi/ggxml2taxpub/blob/7c4116159486b97d2f0329eaf2c0edb514451bd5/errs/phytochemistry_errors_20240211_frq.txt#L1-L7
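
The listing above looks like a per-file frequency count. A minimal sketch of how such a tally could be produced from a validation log follows. The log format (file name, then a colon, then the message) and the sample lines are assumptions made for illustration, not the actual validator output; `tally_errors` is a hypothetical helper.

```python
from collections import Counter


def tally_errors(lines):
    """Count error lines per file and return (file, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed format: "<file name>:<line>: <message>"
        filename = line.split(":", 1)[0]
        counts[filename] += 1
    return counts.most_common()


# Made-up log lines, only to exercise the function.
sample_log = [
    "a_tp.xml:10: element not allowed here",
    "b_tp.xml:42: missing required attribute",
    "a_tp.xml:57: element not allowed here",
]

for filename, n in tally_errors(sample_log):
    # Same layout as the listing in this thread: count, then file name.
    print(f"{n:4d} {filename}")
```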