virusseq / portal-ui

Canadian VirusSeq Data Portal
https://virusseq-dataportal.ca/
GNU Affero General Public License v3.0

Data problem: sequence without metadata #424

Closed scottcain closed 8 months ago

scottcain commented 9 months ago

This problem came up recently, possibly in association with the migration (not sure). About 150 sequences are present in the FASTA file but don't have associated metadata in the database. The list of sequences was identified by Karen Fang at DNAStack, and information about the sequences was provided by Molly Pratt at PHAC. Her most recent comment about these sequences was:

Since these are older samples (I believe they were uploaded by Nithu in 2021) I don't have access to the
original upload file, but I was able to pull information from our database for the fasta headers above. Note
that the formatting of virus names has been updated for some samples; in the attached file they appear
as they currently exist in GISAID. Also, some samples have been removed from GISAID and have `NULL`
Accessions; these could probably be removed from the VirusSeq Portal as well.

The CSV file is attached: ab_virusseq_2021_IDlinkage.csv
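
For anyone wanting to reproduce this kind of check, here is a minimal sketch of how sequences without metadata could be listed. It is not the method Karen or Molly used, and it makes assumptions: the function names and the toy sample IDs are hypothetical, the sequence ID is taken to be the first whitespace-delimited token of each FASTA header, and the metadata IDs are assumed to be available as a plain list (for example, exported from the database).

```python
import io


def fasta_ids(handle):
    """Yield the ID (first whitespace-delimited token) of each FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def sequences_without_metadata(fasta_handle, metadata_ids):
    """Return a sorted list of FASTA IDs that have no matching metadata ID."""
    known = set(metadata_ids)
    return sorted({sid for sid in fasta_ids(fasta_handle) if sid not in known})


# Hypothetical toy data, not taken from the portal.
example_fasta = io.StringIO(">sample_A\nACGT\n>sample_B\nTTGA\n>sample_C\nGGCC\n")
example_metadata_ids = ["sample_A", "sample_C"]

orphans = sequences_without_metadata(example_fasta, example_metadata_ids)
print(orphans)  # prints ['sample_B']
```

With real data, the `io.StringIO` object would be replaced by an open file handle on the FASTA export. Because virus names have been reformatted for some samples, an exact string match like this could over-report; a linkage file such as the attached CSV would be needed to map old names to current ones.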

scottcain commented 9 months ago

We are reasonably sure this was fixed. A typo introduced into the data store a few weeks ago caused this issue, and it has been rectified as of today.

leoraba commented 8 months ago

Closing issue.