Open batson opened 2 years ago
Good catch. It looks like the GenBank IDs are coming from hits where a Serratus sequence was a centroid in the clustering that went into palmDB.
One way or another, it's necessary to do a BLAST/DIAMOND search against nr
(instructions: https://github.com/ababaian/serratus/wiki/DIAMOND-nr) to deplete knowns as a filtering step. This will also catch cases where the virus has been described since the snapshot that went into palmDB (i.e. since Jan 2021).
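As a rough sketch of what that depletion step could look like once the search has run: the snippet below assumes the hits were written in the standard 12-column BLAST/DIAMOND tabular format (outfmt 6). The function names, the example IDs, and the 90% identity cutoff are made up for illustration and are not taken from the wiki instructions.

```python
import csv
import io

# Standard 12-column BLAST/DIAMOND tabular (outfmt 6) field order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def known_queries(tsv_handle, min_pident=90.0):
    """Return query IDs with at least one nr hit at or above min_pident.

    min_pident is an illustrative cutoff (assumption), not a project setting.
    """
    known = set()
    reader = csv.DictReader(tsv_handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) >= min_pident:
            known.add(row["qseqid"])
    return known


def deplete_knowns(query_ids, tsv_handle, min_pident=90.0):
    """Keep only queries without a qualifying nr hit, preserving input order."""
    known = known_queries(tsv_handle, min_pident)
    return [q for q in query_ids if q not in known]


# Toy hits table with hypothetical IDs: query_A has a perfect nr hit,
# query_B only a weak one, and query_C has no hit at all.
toy_hits = (
    "query_A\tsubj1\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "query_B\tsubj2\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-5\t40\n"
)
novel = deplete_knowns(["query_A", "query_B", "query_C"], io.StringIO(toy_hits))
print(novel)
```

Queries with no row in the hits table are kept, since no hit against nr means nothing known to deplete.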
The fix will be to update the GenBank accession for each representative sOTU where any sequence in the cluster is in GenBank. Keeping the issue open as a TODO.
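A minimal sketch of that relabelling, assuming the clustering is available as a mapping from each representative to its member sequence IDs, plus a lookup of which sequences have GenBank accessions. Both data structures, the function name, and the "first match wins" rule are assumptions for illustration, not the actual palmDB layout.

```python
def label_representatives(clusters, genbank_accessions):
    """Map each representative sOTU to a GenBank accession from its cluster.

    clusters: dict of representative ID -> list of member sequence IDs
    genbank_accessions: dict of sequence ID -> GenBank accession, covering
        only the sequences that are in GenBank
    Representatives whose clusters contain no GenBank sequence are omitted.
    """
    labels = {}
    for rep, members in clusters.items():
        # Check the representative itself first, then the other members.
        for seq_id in [rep, *members]:
            if seq_id in genbank_accessions:
                labels[rep] = genbank_accessions[seq_id]
                break  # assumption: the first accession found is good enough
    return labels
```

If a cluster contains several GenBank sequences, a real implementation would probably want a rule for choosing among them (for example, the member closest to the centroid) rather than taking the first one found.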
Describe the bug
Some OTUs in the Orthomyxovirus tree have good BLAST hits but are not labelled with the corresponding GenBank species name.
For example, palmID_u19687 is Wellfleet Bay virus (100% sequence identity), which was submitted in 2018.
Compare u25189|Quaranfil quaranjavirus.