jhpoelen closed this issue 1 year ago.
As far as I understand, taxonIDs are meant to identify specific taxa.
And in the TPT mammal host taxonomy, it appears that a few taxonIDs are reused across many taxon names.
When generating a frequency table of distinct taxonID values, you'd expect each taxonID to appear only once.
However, when running
```
curl --silent -L https://raw.githubusercontent.com/njdowdy/tpt-taxonomy/main/host_files/Mammalia-standardized-v2.csv \
  | mlr --csv cut -f taxonID \
  | sort \
  | uniq -c \
  | sort -nr
```
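the results (shown below) indicate that `220` is used 6,369 times, and `180` (1,328 times), `140` (170 times), and `100` (27 times) are also used more than once. In other words, every row in the file carries one of only four taxonID values.

```
   6369 220
   1328 180
    170 140
     27 100
      1 taxonID
```

(The final `1 taxonID` line is the CSV header emitted by `mlr`, not a data row.)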
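For anyone without Miller (`mlr`) installed, the same frequency table can be sketched with Python's standard library. This is a minimal sketch, not the project's tooling: `taxon_id_frequencies` is a hypothetical helper, and the sample rows (including the `scientificName` column) are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def taxon_id_frequencies(lines: Iterable[str], id_col: str = "taxonID") -> List[Tuple[str, int]]:
    """Count occurrences of each taxonID, most frequent first.

    Roughly equivalent to `cut -f taxonID | sort | uniq -c | sort -nr`, except
    that csv.DictReader consumes the header row, so "taxonID" itself is not counted.
    """
    counts = Counter(row[id_col] for row in csv.DictReader(lines))
    # most_common() with no argument returns all values sorted by descending count
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real file.
    sample = io.StringIO(
        "taxonID,scientificName\n"
        "220,Mus musculus\n"
        "220,Rattus rattus\n"
        "180,Mus\n"
    )
    for taxon_id, n in taxon_id_frequencies(sample):
        print(n, taxon_id)
```

To point it at the actual file, one option is to wrap the HTTP response as text, e.g. `io.TextIOWrapper(urllib.request.urlopen(url), encoding="utf-8", newline="")`. Any taxonID with a count above 1 is being reused.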
@njdowdy @EMTuckerLab curious to hear your ideas on the taxonID assignment of the TPT mammal host taxonomy.
I've created a patch in https://github.com/njdowdy/tpt-taxonomy/issues/18; please review it and accept it if you agree with the changes.
Resolved via https://github.com/njdowdy/tpt-taxonomy/pull/20
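As a regression check after a fix like this, a minimal Python sketch could flag every taxonID shared by more than one distinct name. It assumes the name column is called `scientificName` (the Darwin Core term); that header has not been confirmed for `Mammalia-standardized-v2.csv`, so `name_col` is a parameter. The function name `reused_taxon_ids` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def reused_taxon_ids(
    lines: Iterable[str],
    id_col: str = "taxonID",
    # Assumption: the name column is "scientificName" (Darwin Core);
    # pass a different name_col if the file uses another header.
    name_col: str = "scientificName",
) -> Dict[str, List[str]]:
    """Return {taxonID: sorted names} for each taxonID shared by 2+ distinct names."""
    names_by_id = defaultdict(set)
    for row in csv.DictReader(lines):
        names_by_id[row[id_col]].add(row[name_col])
    return {
        tid: sorted(names)
        for tid, names in names_by_id.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Hypothetical rows illustrating the reported problem.
    sample = io.StringIO(
        "taxonID,scientificName\n"
        "220,Mus musculus\n"
        "220,Rattus rattus\n"
        "1001,Mus\n"
    )
    problems = reused_taxon_ids(sample)
    print(problems)
    # A regression check would require this to be empty:
    # assert not problems, problems
```

This only catches IDs shared across different names; exact duplicate rows (same ID, same name) would still need the plain frequency count.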