Open tcatapano opened 7 months ago
Problem with TP result:
Two consecutive treatments in the source are being merged into a single treatment, with the two nomenclature sections placed one after another at the start of the treatment: see
<tp:mixed-nomenclature> 2.1. Floral scent composition of <tp:taxon-name>Nymphaea subg.
Hydrocallis</tp:taxon-name>
</tp:mixed-nomenclature>
<tp:mixed-nomenclature> 2.2. Floral scent variations within
<tp:taxon-name>Nymphaea</tp:taxon-name> and <tp:taxon-name>Victoria</tp:taxon-name>
</tp:mixed-nomenclature>
<tp:treatment-sec sec-type="description">
<p> The six species and two subspecies of <tp:taxon-name> Nymphaea subg. Hydrocallis
I think the problem is here: https://github.com/plazi/ggxml2taxpub/blob/8f605eab9dd119dea1f287c67768d1aa6dde6b45/xslt/gg2tp_l1.xsl#L8-L13
This should be:
<xsl:apply-templates select=".//subSubSection[@type = 'nomenclature']"/>
<xsl:apply-templates select=".//subSubSection[not(@type = 'nomenclature')]"/>
so that it iterates over the descendant subSubSection elements of the current treatment, not over those in the document as a whole.
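To illustrate why the scope of the selection matters, here is a minimal Python sketch using only the standard library. The input document is a hypothetical, heavily simplified stand-in for the real GoldenGate XML, and the function names are made up for this example. It assumes the original stylesheet selected subSubSection elements across the whole document; I have not reproduced the actual lines from gg2tp_l1.xsl.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input: two consecutive treatments,
# each with its own nomenclature and description subsections.
DOC = """
<document>
  <treatment id="t1">
    <subSubSection type="nomenclature">2.1</subSubSection>
    <subSubSection type="description">desc 1</subSubSection>
  </treatment>
  <treatment id="t2">
    <subSubSection type="nomenclature">2.2</subSubSection>
    <subSubSection type="description">desc 2</subSubSection>
  </treatment>
</document>
"""

root = ET.fromstring(DOC)


def nomenclature_document_wide(root):
    # Document-wide selection: picks up the nomenclature sections of
    # every treatment, which is how two of them end up in one output.
    return [s.text for s in root.iter("subSubSection")
            if s.get("type") == "nomenclature"]


def nomenclature_in_treatment(treatment):
    # Relative selection (".//"): only descendants of this treatment.
    return [s.text for s in
            treatment.findall(".//subSubSection[@type='nomenclature']")]


def other_sections_in_treatment(treatment):
    # ElementTree has no not(...) predicate, so filter in Python instead.
    return [s.text for s in treatment.findall(".//subSubSection")
            if s.get("type") != "nomenclature"]


first = root.find("treatment")
print(nomenclature_document_wide(root))
print(nomenclature_in_treatment(first))
print(other_sections_in_treatment(first))
```

With the document-wide selection, the first treatment receives both nomenclature sections. The relative selection keeps each treatment's sections to itself.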
The XPath fix worked; the resulting file is now valid. Next step: run it over a larger sample set.
Conversion on full batch results in most files being valid. Errors are in:
22 Phytochemistry.157.168-174.pdf_tp.xml
3 Phytochemistry.189.112824.pdf_tp.xml
2 Phytochemistry.187.112776.pdf_tp.xml
2 Phytochemistry.186.112741.pdf_tp.xml
1 Phytochemistry.193.112970.pdf_tp.xml
1 Phytochemistry.191.112908.pdf_tp.xml
1 Phytochemistry.163.196-197.pdf_tp.xml
1 Phytochemistry.157.158-167.pdf_tp.xml
1 Phytochemistry.153.58-63.pdf_tp.xml
see: https://github.com/plazi/ggxml2taxpub/blob/master/errs/phytochemistry_errors_20240211_frq.txt:
starting with sample file
Phytochemistry.103.67-75.pdf.xml
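A per-file error tally like the one listed above could be produced with a short script. This is only a sketch: it assumes a hypothetical validator log in which each line has the form `filename: message`, which may not match the actual log format used for this batch, and `error_frequencies` is a name invented for this example.

```python
from collections import Counter


def error_frequencies(lines):
    """Count validation errors per file, most frequent first.

    Assumes each non-empty line looks like 'filename: message'
    (a hypothetical format, not necessarily the real validator output).
    """
    counts = Counter(
        line.split(":", 1)[0].strip()
        for line in lines
        if line.strip()
    )
    return counts.most_common()


# Made-up sample log lines for illustration only.
sample = [
    "a_tp.xml: element not allowed here",
    "b_tp.xml: missing required attribute",
    "a_tp.xml: element not allowed here",
]
for name, count in error_frequencies(sample):
    print(count, name)
```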