pathoplexus / curation_reports

Curation reports for Pathoplexus

Correct geo_loc_name for PP_00000UB.2 (Ebolavirus Sudan) #2

Open pvanheus opened 1 week ago

pvanheus commented 1 week ago

Describe the possible issue

This is a sequence ingested from NCBI. In NCBI it has accession KU182912.1 and isolate name "Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676". In this build of Ebolavirus Sudan sequences it clearly clusters with other sequences from the Gulu, Uganda outbreak of 2000.

The NCBI record, however, states geo_loc_name="Sudan". This is certainly incorrect. The sequence was deposited in 2015, many years after the outbreak, and the authors likely made a mistake with the metadata. All attempts to contact the original sequence authors have, thus far, failed.
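The mismatch described above can be checked mechanically. Below is a minimal sketch, not part of any Pathoplexus or NCBI tooling. It assumes the isolate name follows the five-field, slash-delimited layout seen in this record, and it uses a hypothetical `LOCALITY_TO_COUNTRY` lookup that contains only the one locality relevant here (Gulu is a city in Uganda). The function names are illustrative as well.

```python
# Hypothetical lookup from a locality token in the isolate name to the
# country it lies in. Only the entry needed for this report is included.
LOCALITY_TO_COUNTRY = {"Gulu": "Uganda"}


def parse_isolate_name(isolate):
    """Split a slash-delimited isolate name into labelled fields.

    Assumes the layout virus/host/country code/year/strain, as in
    "Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676".
    """
    parts = isolate.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected isolate name layout: {isolate!r}")
    virus, host, country_code, year, strain = parts
    # The strain field starts with a locality token, e.g. "Gulu-200011676".
    locality = strain.split("-", 1)[0]
    return {
        "virus": virus,
        "host": host,
        "country_code": country_code,
        "year": year,
        "strain": strain,
        "locality": locality,
    }


def suggest_country(isolate, geo_loc_name):
    """Return the country implied by the locality token if it conflicts
    with geo_loc_name, otherwise None (also None for unknown localities)."""
    fields = parse_isolate_name(isolate)
    expected = LOCALITY_TO_COUNTRY.get(fields["locality"])
    if expected is None:
        return None
    # geo_loc_name may carry a region after a colon ("Country: Region").
    recorded = geo_loc_name.split(":", 1)[0].strip()
    return expected if recorded != expected else None


isolate = "Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676"
print(suggest_country(isolate, "Sudan"))
```

For this record the sketch flags the recorded country and suggests Uganda. It keys only on the locality token, so it is a prompt for manual review rather than evidence on its own. The phylogenetic placement and outbreak history remain the actual basis for the correction.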

Evidence of the problem

The phylogeny below shows that Ebolavirus Sudan sequences fall into two clades, each restricted to a single country (Uganda or South Sudan).

[image: Ebolavirus Sudan phylogeny]

The sequence in question is labeled Gulu-200011676 in the phylogeny.

There was no Ebolavirus Sudan outbreak in (South) Sudan in 2000, the year listed as the collection date for this sequence in GenBank. See the list of Ebola outbreaks from the US CDC.

Suggested change

The geo_loc_country should be changed to Uganda.

Full list of affected sequences

PP_00000UB.2

emily-smith1 commented 21 hours ago

I agree with Peter's suggested change, based on the tree topology and on the US CDC not listing any Ebola outbreaks outside of Uganda in 2000. Note that this sequence is now listed in Pathoplexus under accession PP_00000UB.3.