Open pamichel opened 3 years ago
@pamichel: many of these are technically valid, although not optimally encoded (e.g., an xref containing an xref). There are some more significant errors, such as nested treatments, although they occur infrequently in the sample. Nevertheless, these problems introduce noise and friction into the task at hand, so they should be addressed. I'd recommend two approaches:
Hello Terry,
I ran a similar program to analyze the structure of the 251 XML files generated in the folder intermediate. I am still getting some nested tags: dupl xref 4, dupl tp:treatment-sec 43, dupl tp:taxon-treatment 44, dupl named-content 238.
As an example of how to read my log file, the line
path 202 1 ['named-content'] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml']
says that this XPath (/tp:taxon-treatment/.../named-content) is found 202 times and that it contains nested 'named-content' elements, for instance in 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'.
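For readers who want to reproduce this kind of line, here is a minimal sketch of how such a report could be built. It is not the program that produced the log. It assumes that the first number counts the files in which the path occurs (a guess, based on no path count in the log exceeding 251) and that the second number is the count of element names repeated along the path. The function names, the sample XML and the file name `example_tp.xml` are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace URI as it appears in the log; "tp" is the prefix used in the paths.
TP_NS = "http://www.plazi.org/taxpub"


def short_tag(tag):
    """Convert ElementTree's '{uri}local' form to 'tp:local', or strip other namespaces."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return f"tp:{local}" if uri == TP_NS else local
    return tag


def paths_in_tree(root):
    """Return the set of distinct element paths for root and all its descendants."""
    found = set()

    def walk(elem, parents):
        steps = parents + [short_tag(elem.tag)]
        found.add("/" + "/".join(steps))
        for child in elem:
            walk(child, steps)

    walk(root, [])
    return found


def nested_duplicates(path):
    """Return the sorted element names that occur more than once along a path."""
    counts = Counter(step for step in path.split("/") if step)
    return sorted(name for name, n in counts.items() if n > 1)


def summarize(docs):
    """docs maps a file name to its parsed root; yield log-style lines, rarest path first."""
    files_by_path = defaultdict(list)
    for name, root in docs.items():
        for path in paths_in_tree(root):
            files_by_path[path].append(name)
    ordered = sorted(files_by_path.items(), key=lambda kv: (len(kv[1]), kv[0]))
    for path, names in ordered:
        dups = nested_duplicates(path)
        yield f"path {len(names)} {len(dups)} {dups} {path} {names[:1]}"


# Hypothetical document with one named-content nested inside another.
SAMPLE = (
    '<tp:taxon-treatment xmlns:tp="http://www.plazi.org/taxpub">'
    "<tp:treatment-sec><p><tp:material-citation>"
    "<named-content><named-content/></named-content>"
    "</tp:material-citation></p></tp:treatment-sec>"
    "</tp:taxon-treatment>"
)

docs = {"example_tp.xml": ET.fromstring(SAMPLE)}
lines = list(summarize(docs))
for line in lines:
    print(line)
```

For real input, `docs` would be filled with `ET.parse(path).getroot()` for each file in the folder.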
I don't know if these remaining nested elements are expected or not. Cheers, Pierre-André
processing 20 / 253 processing 40 / 253 processing 60 / 253 processing 80 / 253 processing 100 / 253 processing 120 / 253 processing 140 / 253 processing 160 / 253 processing 180 / 253 processing 200 / 253 processing 220 / 253 processing 240 / 253 processed 251 / 253 ------ list of distinct XML paths found: count, duplicated nested element(s), sample file(s) ------ path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:material-citation ['153987C4E22DB657D9D254001679FC47_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:material-citation/named-content ['153987C4E22DB657D9D254001679FC47_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/sec/tp:treatment-sec ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/sec/p ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/sec/tp:treatment-sec/p/xref ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-name ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/sec/tp:treatment-sec/p ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/sec/tp:treatment-sec/p/tp:taxon-name ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/sec ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/sec/title ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 1 1 ['named-content'] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content/named-content/named-content ['03BD87E66B1EFF44AF89FEACFD103F2D_tp.xml'] path 1 0 [] /tp:taxon-treatment/p/table/tr/td/xref ['03CA87A4D360803079234572FAAC616E_tp.xml'] path 1 0 [] /tp:taxon-treatment/sec ['03DBDB11C177FFBDFF54FB48FA0FB2D5_tp.xml'] path 1 0 
[] /tp:taxon-treatment/sec/tp:treatment-sec ['03DBDB11C177FFBDFF54FB48FA0FB2D5_tp.xml'] path 1 0 [] /tp:taxon-treatment/sec/tp:treatment-sec/p/tp:taxon-name ['03DBDB11C177FFBDFF54FB48FA0FB2D5_tp.xml'] path 1 0 [] /tp:taxon-treatment/sec/tp:treatment-sec/p/xref ['03DBDB11C177FFBDFF54FB48FA0FB2D5_tp.xml'] path 1 0 [] /tp:taxon-treatment/sec/tp:treatment-sec/p ['03DBDB11C177FFBDFF54FB48FA0FB2D5_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/p/table/tp:taxon-name ['BC53327AFFB16C2A25CFF91BD226C58C_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/tp:material-citation/named-content ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/xref ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 2 ['named-content', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/tp:material-citation/named-content/named-content ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/tp:material-citation ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/fig/caption/p ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/fig/graphic ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/fig ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/fig/caption ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 2 ['tp:treatment-sec', 'xref'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/xref/xref ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation/tp:nomenclature-citation-list 
['03A6D248FFBF8B65DEC1F9FAFD78D1AA_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation/tp:nomenclature-citation-list/tp:taxon-name ['03A6D248FFBF8B65DEC1F9FAFD78D1AA_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/sup ['5B678789FFC11458A09D0CE9FB5FFAB4_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/p ['153987C4E23CB65DD9D250F41712FEDB_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/p/tp:taxon-name ['153987C4E23CB65DD9D250F41712FEDB_tp.xml'] path 1 0 [] /tp:taxon-treatment/tp:treatment-sec/p/table/tr/td/xref ['039CAA64605AFFD1FF633A7CD796FC96_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation/tp:taxon-name ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/fig/caption ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/fig ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/fig/caption/p/tp:taxon-name ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/fig/graphic ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 3 ['named-content', 'tp:taxon-treatment', 'tp:treatment-sec'] 
/tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content/named-content ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation/xref ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/fig/caption/p ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 1 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation ['A0103577FF8DFFB8FC09D2F27EA806E7_tp.xml'] path 2 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation/xref ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p/tp:taxon-name ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:nomenclature/tp:taxon-name/object-id ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/ref-list ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] 
/tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:nomenclature ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 3 ['named-content', 'tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-meta ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:nomenclature/tp:taxon-name ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:nomenclature/xref ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/ref-list/ref/mixed-citation ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/ref-list/ref ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/tp:taxon-name ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:treatment-sec/p ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 1 ['tp:taxon-treatment'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-meta/mixed-citation ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] 
/tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/xref ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 2 ['tp:taxon-treatment', 'tp:treatment-sec'] /tp:taxon-treatment/tp:treatment-sec/tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation ['D03F87E4D11F505BFF42DAAC4DA5AD18_tp.xml'] path 2 0 [] /tp:taxon-treatment/tp:treatment-sec/xref ['03A0721BFFB3FF94FF42BC8409CAFD60_tp.xml'] path 3 1 ['xref'] /tp:taxon-treatment/tp:treatment-sec/p/xref/xref ['153987C4E22DB657D9D254001679FC47_tp.xml'] path 3 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation/tp:taxon-name ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 4 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] path 4 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/caption/p/sup ['8545EB45EC18A008FF3F8A1FFD5A8555_tp.xml'] path 8 0 [] /tp:taxon-treatment/p/table/tr/td/tp:taxon-name ['335587B2C4761C2BFF28FC88FDFEFD14_tp.xml'] path 9 0 [] /tp:taxon-treatment/tp:treatment-sec/p/bold ['03B787E5BA48C124FDD11C76763203AB_tp.xml'] path 10 0 [] /tp:taxon-treatment/p ['335587B2C4761C2BFF28FC88FDFEFD14_tp.xml'] path 10 0 [] /tp:taxon-treatment/p/table ['335587B2C4761C2BFF28FC88FDFEFD14_tp.xml'] path 10 0 [] /tp:taxon-treatment/p/table/tr/td ['335587B2C4761C2BFF28FC88FDFEFD14_tp.xml'] path 10 0 [] /tp:taxon-treatment/p/table/tr ['335587B2C4761C2BFF28FC88FDFEFD14_tp.xml'] path 11 0 [] /tp:taxon-treatment/fig/caption/p/xref ['03D77A7DFFDBF70DFD92D8F5B5DDFB77_tp.xml'] path 16 0 [] /tp:taxon-treatment/tp:treatment-sec/p/table/tr/td/tp:taxon-name ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 18 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/caption/p/xref ['03D77A7DFFDBF70DFD92D8F5B5DDFB77_tp.xml'] path 24 0 [] /tp:taxon-treatment/tp:treatment-sec/title ['03B787E5BA48C124FDD11C76763203AB_tp.xml'] path 30 0 [] /tp:taxon-treatment/tp:treatment-sec/p/table/tr ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 30 0 [] 
/tp:taxon-treatment/tp:treatment-sec/p/table ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 30 0 [] /tp:taxon-treatment/tp:treatment-sec/p/table/tr/td ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 31 1 ['named-content'] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content/named-content ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 43 0 [] /tp:taxon-treatment/tp:treatment-sec/p/sup ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] path 60 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/xref ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] path 84 0 [] /tp:taxon-treatment/fig/caption/p/tp:taxon-name ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 87 0 [] /tp:taxon-treatment/fig/caption/p ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 87 0 [] /tp:taxon-treatment/fig/graphic ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 87 0 [] /tp:taxon-treatment/fig ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 87 0 [] /tp:taxon-treatment/fig/caption ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 115 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/caption/p/tp:taxon-name ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 117 0 [] /tp:taxon-treatment/tp:treatment-sec/fig ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 117 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/graphic ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 117 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/caption/p ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 117 0 [] /tp:taxon-treatment/tp:treatment-sec/fig/caption ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] path 118 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation/tp:taxon-name ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 121 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation/xref ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] path 164 0 [] 
/tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:taxon-name ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 200 0 [] /tp:taxon-treatment/tp:treatment-sec/p/tp:taxon-name ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 202 1 ['named-content'] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content/named-content ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] path 225 0 [] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation/named-content ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] path 225 0 [] /tp:taxon-treatment/tp:treatment-sec/p/tp:material-citation ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] path 231 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list/tp:nomenclature-citation ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 247 0 [] /tp:taxon-treatment/tp:treatment-sec/tp:nomenclature-citation-list ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 250 0 [] /tp:taxon-treatment/tp:treatment-meta ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 250 0 [] /tp:taxon-treatment/ref-list ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 250 0 [] /tp:taxon-treatment/ref-list/ref/mixed-citation ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 250 0 [] /tp:taxon-treatment/ref-list/ref ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 250 0 [] /tp:taxon-treatment/tp:treatment-meta/mixed-citation ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:treatment-sec/p ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:treatment-sec ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:nomenclature ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:nomenclature/xref ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:nomenclature/tp:taxon-name/object-id ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:treatment-sec/p/xref 
['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 1 [''] /{http://www.plazi.org/taxpub}taxon-treatment ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] path 251 0 [] /tp:taxon-treatment/tp:nomenclature/tp:taxon-name ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] ------ number of distinct XML paths ------ path set :117 ------ list of duplicacted nested XML elements: occurence count ------ dupl xref 4 dupl tp:treatment-sec 43 dupl tp:taxon-treatment 44 dupl named-content 238 dupl 251 ------ number of duplicated nested XML elements dup dict : 5 ------ list of XML elements found: count with sample files ------ elem 9 bold ['03B787E5BA48C124FDD11C76763203AB_tp.xml', '03C787B8FFA7FFE3B55B56E6FD4BD7BE_tp.xml', '03C787B8FFBAFFEFB6F55694FB4BD172_tp.xml'] elem 12 sec ['03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml', '03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml', '03BC87C5FFD63974A363FBBDFC72FDF1_tp.xml'] elem 25 title ['03B787E5BA48C124FDD11C76763203AB_tp.xml', '03A6D248FFEF8B2ADD59FE65FBC4D4D9_tp.xml', '03A6D248FFD18B05DEDFFD70FB78D369_tp.xml'] elem 48 sup ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '335587B2C4011C56FF28FEDAFEFFFC7C_tp.xml', '03B787E5BA48C124FDD11C76763203AB_tp.xml'] elem 66 td ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '361087A7FFC9FFB255ABF90750ABCDEC_tp.xml'] elem 106 tr ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] elem 147 table ['3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml', '3660577CFFA1FF8EE5B2952BFE6AF847_tp.xml'] elem 206 graphic ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '335587B2C4371C6FFF28FB1EFA78F87C_tp.xml'] elem 251 {http: ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 251 www.plazi.org ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 
'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 251 taxpub}taxon-treatment ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 253 object-id ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 484 tp:nomenclature-citation ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 504 tp:treatment-meta ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 504 ref ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 504 mixed-citation ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 645 caption ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 696 tp:material-citation ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 734 xref ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 739 named-content ['C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml', 'C24787C5BF59FFFCFF74F3CD6847633A_tp.xml'] elem 756 ref-list ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 952 tp:nomenclature-citation-list ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 1012 tp:nomenclature 
['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 1057 fig ['03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml', '03007E45FF82FF803D89FCF9F2F9B998_tp.xml'] elem 1226 tp:taxon-name ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 2068 p ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 3502 tp:treatment-sec ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] elem 6250 tp:taxon-treatment ['03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml', '03E587FAFFABFFAEFF2A5B052E7F7733_tp.xml'] ------ list of XML elements found with their attributes and occurence count ------ elem /tp:taxon-treatment attribs: {} elem /tp:treatment-meta attribs: {} elem /mixed-citation attribs: {} elem /tp:nomenclature attribs: {} elem /tp:taxon-name attribs: {} elem /object-id attribs: {} elem /xref attribs: {'rid': 5122} elem /tp:treatment-sec attribs: {'sec-type': 1606} elem /tp:nomenclature-citation-list attribs: {} elem /p attribs: {} elem /tp:nomenclature-citation attribs: {} elem /ref-list attribs: {} elem /ref attribs: {'id': 2284} elem /sup attribs: {} elem /tp:material-citation attribs: {} elem /named-content attribs: {'content-type': 24237} elem /fig attribs: {'id': 481, 'fig-type': 481, 'position': 481} elem /caption attribs: {} elem /graphic attribs: {'xlink:href': 481} elem /table attribs: {'id': 69} elem /tr attribs: {} elem /td attribs: {} elem /title attribs: {} elem /bold attribs: {} elem /sec attribs: {} ------ Number of distinct XML elements ------ elem set :28
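A side note on the log itself: the entries `{http:`, `www.plazi.org` and `taxpub}taxon-treatment` in the element list, and the `dupl` entry with an empty name and count 251, look consistent with the root path `/{http://www.plazi.org/taxpub}taxon-treatment` being split on every `/`, including the slashes inside the namespace URI. This is only a guess about the cause. The sketch below shows the effect and one possible namespace-aware split; `split_path` is a hypothetical helper, not part of the original program.

```python
def split_path(path):
    """Split a '/'-separated element path, keeping '{namespace-uri}local' names whole."""
    steps, current, depth = [], "", 0
    for ch in path:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "/" and depth == 0:
            # A slash outside braces separates two path steps.
            if current:
                steps.append(current)
            current = ""
        else:
            current += ch
    if current:
        steps.append(current)
    return steps


root_path = "/{http://www.plazi.org/taxpub}taxon-treatment"

# Naive split: the URI's own slashes produce bogus steps and two empty strings.
naive = root_path.split("/")

# Namespace-aware split: a single step for the root element.
aware = split_path(root_path)

print(naive)
print(aware)
```

With a split like this, the root element would be counted once under its real name, and the empty-name duplicate would not appear.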