I agree with Peter's suggested change, based on the tree topology and on the US CDC not listing any Ebola outbreaks outside of Uganda in 2000. Note that this sequence is now listed in Pathoplexus under accession PP_00000UB.3.
Describe the possible issue
This is a sequence ingested from NCBI. In NCBI it has accession KU182912.1 and isolate name "Sudan virus/H. sapiens-tc/SDN/2000/Gulu-200011676". In this build of Ebolavirus Sudan sequences it clearly clusters with other sequences from the Gulu, Uganda outbreak of 2000.
The NCBI record, however, states geo_loc_name="Sudan". This is certainly incorrect. The sequence was deposited in 2015, many years after the outbreak, and the authors likely made a mistake with the metadata. All attempts to contact the original sequence authors have, thus far, failed.
Evidence of the problem
The phylogeny below shows that Ebolavirus Sudan has two clades, each restricted to a single country (Uganda or South Sudan).
The sequence in question is labeled Gulu-200011676 in the phylogeny.
There was no Ebolavirus Sudan outbreak in (South) Sudan in 2000, the year listed as the collection date for this sequence in GenBank. See the list of Ebola outbreaks from the US CDC.
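The outbreak-list argument above could be run as an automated consistency check. This is a minimal sketch only: the record field names (`accession`, `country`, `year`) and the `KNOWN_OUTBREAKS` table are hypothetical, and the table holds just the one outbreak named in this issue rather than the full CDC list.

```python
# Hypothetical outbreak table of (country, year) pairs.
# Only the 2000 Gulu, Uganda outbreak is taken from this issue;
# a real check would load the complete US CDC outbreak list.
KNOWN_OUTBREAKS = {("Uganda", 2000)}


def flag_inconsistent(records, known_outbreaks=KNOWN_OUTBREAKS):
    """Return accessions whose (country, collection year) match no known outbreak."""
    flagged = []
    for rec in records:
        if (rec["country"], rec["year"]) not in known_outbreaks:
            flagged.append(rec["accession"])
    return flagged


# Metadata as currently stated in the NCBI record (field names are assumptions).
records = [
    {"accession": "KU182912.1", "country": "Sudan", "year": 2000},
]

print(flag_inconsistent(records))
```

A record flagged this way is only a candidate for review; the phylogenetic placement is still needed to decide what the correct country is.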
Suggested change
The geo_loc_country should be changed to Uganda.
Full list of affected sequences
PP_00000UB.2