peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
https://thapbi-pict.readthedocs.io/
MIT License

Duplicate synonym warnings (Pythium/Globisporangium basionyms) #285

Open peterjc opened 3 years ago

peterjc commented 3 years ago

Are these false positives, or do they stem from my misunderstanding of exactly how the NCBI uses the names.dmp file? Examples triggered by the 2021-01-01 taxdump:

WARNING: Synonym Pythium macrosporum duplicated?
WARNING: Synonym Pythium okanoganense duplicated?
WARNING: Synonym Pythium rostratum duplicated?

We have recorded Pythium macrosporum NCBI:txid181666 both as a species and as an alias of Globisporangium macrosporum NCBI:txid2711120:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=181666 Pythium macrosporum
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2711120 Globisporangium macrosporum

$ grep macrosporum database/ITS1_DB.sql
INSERT INTO taxonomy VALUES(618,181666,'Pythium','macrosporum');
INSERT INTO taxonomy VALUES(630,1077224,'Pythium','aff. macrosporum');
INSERT INTO taxonomy VALUES(1103,2711120,'Globisporangium','macrosporum');
INSERT INTO synonym VALUES(722,1103,'Pythium macrosporum');

We have treated Pythium okanoganense NCBI:txid182683 both as a species and as an alias of Globisporangium okanoganense NCBI:txid2711121:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=182683 Pythium okanoganense
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2711121 Globisporangium okanoganense

$ grep okanoganense database/ITS1_DB.sql
INSERT INTO taxonomy VALUES(617,182683,'Pythium','okanoganense');
INSERT INTO taxonomy VALUES(1104,2711121,'Globisporangium','okanoganense');
INSERT INTO synonym VALUES(723,1104,'Pythium okanoganense');

We have treated Pythium rostratum NCBI:txid82947 both as a species and as an alias of Globisporangium rostratum NCBI:txid2711123:

https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=82947 Pythium rostratum
https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2711123 Globisporangium rostratum

$ grep rostratum database/ITS1_DB.sql
INSERT INTO taxonomy VALUES(508,82947,'Pythium','rostratum');
INSERT INTO taxonomy VALUES(573,2603267,'Pythium','aff. rostratum');
INSERT INTO taxonomy VALUES(1106,2711123,'Globisporangium','rostratum');
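
The same clash can be looked for directly in the database. The following is a minimal sketch, not the tool's own check: the column names (`id`, `ncbi_taxid`, `genus`, `species`, `taxonomy_id`, `name`) are assumptions inferred from the INSERT statements quoted above and may not match the real schema. It lists every synonym whose text is also recorded as a genus plus species entry in its own right.

```python
import sqlite3


def clashing_synonyms(conn):
    """Return (synonym, taxid where it is a species, taxid it is a synonym of)."""
    sql = """
        SELECT s.name, t_own.ncbi_taxid, t_target.ncbi_taxid
        FROM synonym AS s
        JOIN taxonomy AS t_target ON t_target.id = s.taxonomy_id
        JOIN taxonomy AS t_own
             ON t_own.genus || ' ' || t_own.species = s.name
        ORDER BY s.name
    """
    return conn.execute(sql).fetchall()


# In-memory database; the column names are hypothetical, the rows are those
# quoted in this issue.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxonomy (
        id INTEGER PRIMARY KEY, ncbi_taxid INTEGER, genus TEXT, species TEXT);
    CREATE TABLE synonym (
        id INTEGER PRIMARY KEY, taxonomy_id INTEGER, name TEXT);
    INSERT INTO taxonomy VALUES(618,181666,'Pythium','macrosporum');
    INSERT INTO taxonomy VALUES(630,1077224,'Pythium','aff. macrosporum');
    INSERT INTO taxonomy VALUES(1103,2711120,'Globisporangium','macrosporum');
    INSERT INTO synonym VALUES(722,1103,'Pythium macrosporum');
    INSERT INTO taxonomy VALUES(617,182683,'Pythium','okanoganense');
    INSERT INTO taxonomy VALUES(1104,2711121,'Globisporangium','okanoganense');
    INSERT INTO synonym VALUES(723,1104,'Pythium okanoganense');
    INSERT INTO taxonomy VALUES(508,82947,'Pythium','rostratum');
    INSERT INTO taxonomy VALUES(573,2603267,'Pythium','aff. rostratum');
    INSERT INTO taxonomy VALUES(1106,2711123,'Globisporangium','rostratum');
    """
)
clashes = clashing_synonyms(conn)
for name, species_taxid, target_taxid in clashes:
    print(f"{name}: species txid{species_taxid}, synonym of txid{target_taxid}")
```

On the rows quoted here this reports Pythium macrosporum and Pythium okanoganense. Pythium rostratum is not reported only because no synonym row for it appears in the grep output above.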

The NCBI web-interface reports these aliases as basionyms, where the Globisporangium species was based on the older Pythium description:

Globisporangium rostratum (E.J. Butler) Uzuhashi, Tojo & Kakish., 2010 basionym: Pythium rostratum E.J. Butler, 1907

Globisporangium okanoganense (P.E. Lipps) Uzuhashi, Tojo & Kakish., 2010 in [Uzuhashi S et al. (2010)] basionym: Pythium okanoganense P.E. Lipps, 1981

Superficially it looks like I should be ignoring the older Pythium entries, but they remain live in the NCBI taxonomy at some level.

ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/taxdump_readme.txt

names.dmp
---------
Taxonomy names file has these fields:

    tax_id                  -- the id of node associated with this name
    name_txt                -- name itself
    unique name             -- the unique variant of this name if name not unique
    name class              -- (synonym, common name, ...)

e.g.

$ grep "rostratum>" names.dmp
82947   |   Pythium rostratum   |   Pythium rostratum <Pythium rostratum>   |   scientific name |
2711123 |   Pythium rostratum   |   Pythium rostratum <Globisporangium rostratum>   |   synonym |

It seems I may need to take a non-empty unique name field (column 3 of names.dmp) into account, but in essence I need to decide whether Pythium rostratum should be treated as a species or as a synonym of Globisporangium rostratum here.
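
To see how widespread this is, a minimal sketch like the following could scan names.dmp for names that are the scientific name of one taxid and a synonym under another. It assumes the fields are separated by tab, pipe, tab and that each row ends with tab, pipe (the excerpts above show the tabs as spaces). The function names are hypothetical, and keeping a single taxid per scientific name is a simplification.

```python
from collections import defaultdict


def parse_names(lines):
    """Yield (taxid, name, unique_name, name_class) tuples from names.dmp lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the end-of-row marker
        fields = line.split("\t|\t")
        if len(fields) == 4:
            taxid, name, unique_name, name_class = fields
            yield int(taxid), name, unique_name, name_class


def species_also_synonyms(records):
    """Find names that are a scientific name and also a synonym elsewhere.

    Returns {name: (taxid as scientific name, [taxids listing it as synonym])}.
    """
    as_species = {}
    as_synonym = defaultdict(set)
    for taxid, name, _unique_name, name_class in records:
        if name_class == "scientific name":
            as_species[name] = taxid
        elif name_class == "synonym":
            as_synonym[name].add(taxid)
    clashes = {}
    for name, taxids in as_synonym.items():
        if name in as_species:
            others = taxids - {as_species[name]}
            if others:
                clashes[name] = (as_species[name], sorted(others))
    return clashes


# The rostratum rows from the 2021-01-01 taxdump, as quoted in this issue.
SAMPLE = [
    "82947\t|\tPythium rostratum\t|\tPythium rostratum <Pythium rostratum>\t|\tscientific name\t|\n",
    "2711123\t|\tGlobisporangium rostratum\t|\t\t|\tscientific name\t|\n",
    "2711123\t|\tPythium rostratum\t|\tPythium rostratum <Globisporangium rostratum>\t|\tsynonym\t|\n",
]
clashes = species_also_synonyms(parse_names(SAMPLE))
print(clashes)
```

For a real run, `SAMPLE` would be replaced by the open names.dmp file handle.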

peterjc commented 3 years ago

Note: this is currently not a priority, as we have no sequences from these three (6?) species anyway.

peterjc commented 3 years ago

Uzuhashi, Tojo & Kakish (2010) Phylogeny of the genus Pythium and description of new genera https://doi.org/10.1007/S10267-010-0046-7

Phylogeny of the genus Pythium is analyzed based on sequences of the large subunit ribosomal DNA D1/D2 region and cytochrome oxidase II gene region of Pythium isolates and comprehensive species of related taxa belonging to the Oomycetes. The phylogenetic trees show that the genus Pythium is a highly divergent group and divided into five well- or moderately supported monophyletic clades. Each clade is characterized by sporangial morphology such as globose, ovoid, elongated, or filamentous shapes. Based on phylogeny and morphology, the genus Pythium (s. str.) is emended, and four new genera, Ovatisporangium, Globisporangium, Elongisporangium, and Pilasporangium, are described and segregated from Pythium s. lato.

Does seem like we should treat the Pythium names as aliases of the newer Globisporangium species.

peterjc commented 3 years ago

Looking at one example:

$ grep -E $'^(82947|2711123)\t' names.dmp
82947   |   Pythium rostratum   |   Pythium rostratum <Pythium rostratum>   |   scientific name |
2711123 |   Globisporangium rostratum (E.J. Butler) Uzuhashi, Tojo & Kakish., 2010  |       |   authority   |
2711123 |   Globisporangium rostratum   |       |   scientific name |
2711123 |   Pythium rostratum E.J. Butler, 1907 |       |   authority   |
2711123 |   Pythium rostratum   |   Pythium rostratum <Globisporangium rostratum>   |   synonym |

i.e.

82947   | Pythium rostratum                | Pythium rostratum <Pythium rostratum>         | scientific name |
2711123 | Globisporangium rostratum (E...  |                                               | authority       |
2711123 | Globisporangium rostratum        |                                               | scientific name |
2711123 | Pythium rostratum E.J. Butler... |                                               | authority       |
2711123 | Pythium rostratum                | Pythium rostratum <Globisporangium rostratum> | synonym         |

Need more examples, but we could do something with column 3 (the unique name field).
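
One possible use of column 3, sketched under the assumption that the `name <owner>` pattern seen in this single example holds generally (this has not been checked against more of names.dmp): pull out the text in the trailing angle brackets and compare it with the name itself. The helper name `owner_of` is hypothetical.

```python
import re

# Matches "some name <disambiguating text>" as seen in the unique name field.
UNIQUE_NAME = re.compile(r"^(?P<name>.+) <(?P<owner>.+)>$")


def owner_of(unique_name):
    """Return the text inside the trailing <...>, or None if there is none."""
    match = UNIQUE_NAME.match(unique_name)
    return match.group("owner") if match else None


# (taxid, name, unique name, name class) for the rows quoted above.
rows = [
    (82947, "Pythium rostratum",
     "Pythium rostratum <Pythium rostratum>", "scientific name"),
    (2711123, "Globisporangium rostratum", "", "scientific name"),
    (2711123, "Pythium rostratum",
     "Pythium rostratum <Globisporangium rostratum>", "synonym"),
]
owners = [(taxid, name, owner_of(unique)) for taxid, name, unique, _cls in rows]
for taxid, name, owner in owners:
    print(taxid, name, "->", owner)
```

In this example the bracketed text equals the name for the live Pythium entry and names the Globisporangium species for the synonym row, so it distinguishes the two uses of the same string. Which of the two should win is still the open policy decision.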

peterjc commented 3 years ago

Note that as of v0.9.6 we use only Peronosporales & Pythiales, which is done via a narrower NCBI search (#340), but we still load the taxonomy tree from NCBI taxid 4762 Oomycota down (i.e. all Oomycetes).

Also, Pythium and Globisporangium are both Pythiales anyway.