serratus-bio / serratus.io

Front-end code for Serratus project website
https://serratus.io
GNU Affero General Public License v3.0

`trees and alignments` genbank species labels for some OTUs missing #197

Open batson opened 2 years ago

batson commented 2 years ago

Describe the bug

Some OTUs in the Orthomyxovirus tree have good BLAST hits but are not labelled with the corresponding GenBank species name.

For example, palmID_u19687 is Wellfleet Bay virus (100% sequence ID), which was submitted in 2018.

Compare u25189|Quaranfil quaranjavirus.

Screenshots

Screen Shot 2022-09-14 at 9 46 26 AM
Screen Shot 2022-09-14 at 9 53 21 AM

ababaian commented 2 years ago

Good catch. It looks like the GenBank IDs are coming only from hits where a Serratus sequence was a centroid in the clustering that went into palmDB.

One way or another it's necessary to do a BLAST/DIAMOND search against nr (instructions: https://github.com/ababaian/serratus/wiki/DIAMOND-nr) to deplete knowns as a filtering step. This will also catch errors where the virus has since been described (since Jan 2021), i.e. after the snapshot that went into palmDB.
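As a rough illustration of the "deplete knowns" step, here is a minimal sketch that assumes the search results are saved in the default 12-column tabular layout (BLAST/DIAMOND output format 6). The function name `best_known_hits` and the 90% identity cutoff are assumptions made for this example, not something the Serratus pipeline is known to use.

```python
import csv
import io

# Default 12-column tabular layout (BLAST/DIAMOND output format 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_known_hits(tsv_text, min_identity=90.0):
    """Return {query_id: (subject_id, pident)} for the highest-bitscore hit
    per query whose percent identity is at least min_identity.

    min_identity is a hypothetical cutoff chosen for illustration only.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])
        bitscore = float(row["bitscore"])
        if pident < min_identity:
            continue  # too divergent to count as a "known"
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[2]:
            best[row["qseqid"]] = (row["sseqid"], pident, bitscore)
    # Drop the bitscore, which was only needed to rank hits.
    return {q: (s, p) for q, (s, p, _) in best.items()}
```

Queries that come back from this function would be treated as knowns and labelled or filtered; queries with no qualifying hit stay as candidate novel sequences.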

The fix will be to update the GenBank accession for each representative sOTU where any sequence in the cluster is in GenBank. Keeping the issue open as a TODO.
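The proposed fix could look something like the sketch below. It assumes two inputs that are hypothetical here: a mapping from each representative sOTU to its cluster members, and a mapping from sequence IDs to GenBank accessions. `label_sotus` and both data shapes are illustrative, not the actual palmDB or Serratus data model.

```python
def label_sotus(clusters, genbank_hits):
    """Assign a GenBank accession to each representative sOTU.

    clusters:     {representative_id: [member_id, ...]}  (hypothetical shape)
    genbank_hits: {sequence_id: genbank_accession}       (hypothetical shape)

    Returns {representative_id: accession or None}. The representative's own
    accession is preferred; otherwise the first member with one is used.
    """
    labels = {}
    for rep, members in clusters.items():
        # Check the representative first, then the remaining members in order.
        candidates = [rep] + [m for m in members if m != rep]
        labels[rep] = next(
            (genbank_hits[c] for c in candidates if c in genbank_hits),
            None,
        )
    return labels
```

With this approach a cluster gets a label even when the GenBank sequence is a non-centroid member, which is the case described in this issue.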