peterjc closed this 5 years ago
Looks like I was wrong, we do have new entries, including:
And, there has been a change in how hybrids are listed:
$ grep 324745 new_taxdump_2019-01-01/names.dmp
324745 | Phytophthora medicaginis x cryptogea | | scientific name |
1324745 | Vibrio sp. EF1C-CB167 | | scientific name |
2324745 | Ceratopogonidae sp. BBDCN479-10 | | scientific name |
$ grep 324745 new_taxdump_2019-09-01/names.dmp
324745 | Phytophthora medicaginis x Phytophthora cryptogea | | scientific name |
1324745 | Vibrio sp. EF1C-CB167 | | scientific name |
2324745 | Ceratopogonidae sp. BBDCN479-10 | | scientific name |
The old style worked better with our parser and loading, which split it into genus Phytophthora and species medicaginis x cryptogea. We will probably have to refactor the load-tax code to handle the new style.
Or, stop splitting the text into genus+species (removing the genus from the species field), and leave the genus in the species field (usually redundant)?
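To make the first option concrete, here is a rough sketch of a split that gives the same genus and species for both hybrid naming styles. It is not the existing load-tax code; the function name `split_genus_species` is made up for illustration, and it assumes hybrids are always written with a lower-case " x " separator and that only a repeat of the first genus should be dropped.

```python
def split_genus_species(name):
    """Split a scientific name into (genus, species), tolerating hybrids.

    Hypothetical helper for illustration only. Assumes the genus is the
    first word and that hybrid parents are joined with " x ".
    """
    genus, _, rest = name.partition(" ")
    prefix = genus + " "
    # For the new style, drop the repeated genus from each hybrid parent.
    parents = [
        part[len(prefix):] if part.startswith(prefix) else part
        for part in rest.split(" x ")
    ]
    return genus, " x ".join(parents)


old_style = "Phytophthora medicaginis x cryptogea"
new_style = "Phytophthora medicaginis x Phytophthora cryptogea"
print(split_genus_species(old_style))
print(split_genus_species(new_style))
```

With this approach a hybrid between two different genera would keep the second genus in the species field, which may or may not be what we want.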
Also need to think about how this matches the names used in NCBI FASTA files on import.
We're not currently importing the NCBI format files at species level, but still need to look at handling of hybrids with either naming style...
Closed via #179, but will want to review this again at some point...
Just checked, and the latest NCBI taxonomy dump (September 2019) is essentially unchanged from Jan 2019 for the oomycetes:
Used https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py to generate the filtered names.dmp files.
However, we will want to review this periodically. Is it worth including in the continuous integration tests (with a monthly cron job, say), to flag when there is a relevant change in the taxonomy?
i.e. Instead of fetching a fixed version, could build against the latest taxonomy, and check the output from the import commands?
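As a starting point for such a check, something like the following could compare the scientific names in two filtered names.dmp files and report anything added, removed, or renamed. This is only a sketch: `load_scientific_names` and `diff_names` are hypothetical names, fetching the latest dump is not covered, and it assumes pipe-separated fields in the order tax ID, name, unique name, name class.

```python
def load_scientific_names(lines):
    """Map tax ID to scientific name from names.dmp style lines."""
    names = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        # Only keep the primary name for each taxon.
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def diff_names(old, new):
    """Return (added, removed, renamed) between two taxid-to-name dicts."""
    added = {taxid: new[taxid] for taxid in new.keys() - old.keys()}
    removed = {taxid: old[taxid] for taxid in old.keys() - new.keys()}
    renamed = {
        taxid: (old[taxid], new[taxid])
        for taxid in old.keys() & new.keys()
        if old[taxid] != new[taxid]
    }
    return added, removed, renamed


old_lines = [
    "324745 | Phytophthora medicaginis x cryptogea | | scientific name |",
    "1324745 | Vibrio sp. EF1C-CB167 | | scientific name |",
]
new_lines = [
    "324745 | Phytophthora medicaginis x Phytophthora cryptogea | | scientific name |",
    "1324745 | Vibrio sp. EF1C-CB167 | | scientific name |",
]
added, removed, renamed = diff_names(
    load_scientific_names(old_lines), load_scientific_names(new_lines)
)
print(added, removed, renamed)
```

A cron-triggered CI job could fail (or just warn) whenever any of the three dictionaries is non-empty for the taxa we care about.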