theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[TheiaCoV_FASTA_Batch] FASTA header not matching sample name #261

Closed: sage-wright closed this issue 8 months ago

sage-wright commented 9 months ago

:bug:

:pencil: Describe the Issue

Nextclade output uses each sequence's FASTA header line to identify its results. These headers don't always match the name of the FASTA file, which is what the workflow uses as the sample name. Because of this, results don't get properly populated to the Terra table, and no workflow failure occurs to flag the problem.
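A minimal sketch of the mismatch follows. It assumes that results are keyed by the FASTA header, as in the `seqName` column of Nextclade's TSV output. It also assumes that the table row is keyed by a sample name derived from the FASTA file name. The file name, the header, the result values, and the `sample_name_from_path` helper are all hypothetical.

```python
from pathlib import Path


def sample_name_from_path(fasta_path: str) -> str:
    """Hypothetical helper: derive the sample name from the FASTA file name."""
    return Path(fasta_path).stem


def header_of(fasta_text: str) -> str:
    """Return the first FASTA header, without the leading '>'."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            return line[1:].strip()
    raise ValueError("no FASTA header found")


# Hypothetical GISAID-style input whose header differs from its file name.
fasta_path = "sample01.fasta"
fasta_text = ">hCoV-19/USA/XX-12345/2021\nACGT\n"

# Results keyed by the FASTA header (assumed to mirror Nextclade's seqName).
results = {header_of(fasta_text): {"clade": "example"}}

# The lookup uses the file-derived sample name, so it silently finds nothing.
sample_name = sample_name_from_path(fasta_path)
row = results.get(sample_name)
print(sample_name, row)
```

Nothing in the sketch raises an error; the lookup just comes back empty. That matches the report that results go missing without any workflow failure.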

:repeat: How to Reproduce

:fishing_pole_and_fish: Expected Behavior

Either (a) rename the FASTA header to the samplename, or (b) translate the FASTA header to the samplename elsewhere.
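Both options can be sketched as follows. This is not the workflow's actual implementation. The function names, the example header, and the `header_to_sample` mapping are hypothetical. Option (a) assumes single-sequence FASTA files, as in a per-sample workflow.

```python
def rename_headers(fasta_text: str, samplename: str) -> str:
    """Option (a): replace every header line with the sample name.

    Assumes one sequence per file; with several records, every header
    would be given the same name.
    """
    out = []
    for line in fasta_text.splitlines():
        out.append(f">{samplename}" if line.startswith(">") else line)
    return "\n".join(out) + "\n"


def translate_results(results: dict, header_to_sample: dict) -> dict:
    """Option (b): re-key results from FASTA header to sample name.

    Headers without a mapping are kept unchanged.
    """
    return {header_to_sample.get(h, h): r for h, r in results.items()}


header = "hCoV-19/USA/XX-12345/2021"
fasta_text = f">{header}\nACGT\n"

# Option (a): fix the input before it reaches Nextclade.
renamed = rename_headers(fasta_text, "sample01")
print(renamed)

# Option (b): leave the input alone and map the output back to sample names.
results = {header: {"clade": "example"}}
translated = translate_results(results, {header: "sample01"})
print(translated)
```

Option (a) changes the input, so every downstream tool sees the sample name. Option (b) keeps the original headers intact, but it requires carrying a header-to-sample mapping alongside the results.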

:floppy_disk: Version Information

:information_source: Additional Information

cimendes commented 8 months ago

Tested the current version on main with a set of 5 GISAID sequences whose FASTA headers don't match their sample_ids: https://job-manager.dsde-prod.broadinstitute.org/jobs/05f84977-c2fe-4e0f-9395-a8bf1ebd26c3

It was a failure, as expected.