nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License
1.35k stars, 403 forks

ValueError: Fasta file appears to have sequences of different lengths! #1079

Closed tibitoy closed 1 year ago

tibitoy commented 1 year ago

I'm trying to run nextstrain build . --configfile with my genomic surveillance YAML, but the following error appears after it runs for a bit:

Traceback (most recent call last):
  File "/home4/tibitoy/ncov/scripts/get_distance_to_focal_set.py", line 174, in <module>
    context_seqs_dict = calculate_snp_matrix(seqs, consensus=ref, chunk_size=chunk_size)
  File "/home4/tibitoy/ncov/scripts/get_distance_to_focal_set.py", line 73, in calculate_snp_matrix
    raise ValueError('Fasta file appears to have sequences of different lengths!')
ValueError: Fasta file appears to have sequences of different lengths!

For reference, I am using the typical reference data/metadata (https://data.nextstrain.org/files/ncov/open/reference) and my background data is for North America (ncov_north-america.tar.gz). Not sure why this is happening. Any suggestions? YAML file is in my public repository under genomic-surveillance.yaml if helpful.

victorlin commented 1 year ago

Hi @tibitoy,

I don't see a public repository under your GitHub account with a genomic-surveillance.yaml file. Can you provide a link?

I can't reproduce the issue with today's reference FASTA/metadata, so I'll need some more information to help you here.

tibitoy commented 1 year ago

Thanks for your response! I've made it public now.

-- Temitope Ibitoye PhD Student, Environmental, Water Resources, and Coastal Engineering North Carolina State University

joverlee521 commented 1 year ago

Hi @tibitoy, thanks for making your config file public.

Looking at your genomic-surveillance.yaml file, I think you might need to change line 10 from

aligned: data/ncov_north-america.tar.gz

to

sequences: data/ncov_north-america.tar.gz

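I'm assuming that you were following the Nextstrain guide on how to get contextual sequences. I believe these contextual sequences from GISAID are not aligned, so declaring them under aligned sends them straight to a step that expects every sequence to have the same length, which is why they are causing the error.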
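
If you want to confirm this before re-running the build, a quick length check on the FASTA can help. The following is only a minimal sketch and is not part of the ncov workflow: the function names are made up for illustration, and it assumes you have already extracted a plain-text FASTA from the tarball. It counts how many sequences there are of each length; an aligned file should show exactly one length.

```python
from collections import Counter


def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA text lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            # Sequence lines may be wrapped, so lengths are summed per record.
            length += len(line)
    if name is not None:
        yield name, length


def length_summary(lines):
    """Return a Counter mapping sequence length to number of records."""
    return Counter(length for _, length in fasta_lengths(lines))


def looks_aligned(lines):
    """True if every record has the same length (hypothetical helper)."""
    return len(length_summary(lines)) <= 1
```

To use it, open the extracted FASTA as a text file and pass the file handle to length_summary. If the result contains more than one length, the file is not aligned and should be listed under sequences rather than aligned.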

victorlin commented 1 year ago

Closing due to inactivity. @tibitoy please re-open if this is still an issue for you.