ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

IndexError: list index out of range #67

Closed mattbird567 closed 1 year ago

mattbird567 commented 1 year ago

Hi,

I'm trying to simulate some specific SNPs in my reference bacterium. I have attached my VCF file (as a .txt, since GitHub won't let me upload a .vcf), which is causing the error, although I'm not sure why. I have identified that it has something to do with the first entry in the VCF file (the SNP at position 2155168). I have triple-checked that the position, REF and ALT are all correct, and as far as I can see they are, so I'm not sure why I am getting this error. Any help would be great.

python gen_reads.py -r TB_ref.fasta -R 147 -v TB_vcf.vcf --pe 300 30 -o test_data

Traceback (most recent call last):
  File "/mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/TB/programs/NEAT/gen_reads.py", line 896, in <module>
    main()
  File "/mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/TB/programs/NEAT/gen_reads.py", line 305, in main
    (sample_names, input_variants) = parse_vcf(input_vcf, ploidy=ploids)
  File "/mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/TB/programs/NEAT/source/vcf_func.py", line 105, in parse_vcf
    pl_out = parse_line(splt, col_dict, col_samp)
  File "/mnt/c/Users/Matt/Desktop/UKHSA/Projects/Current/TB/programs/NEAT/source/vcf_func.py", line 15, in parse_line
    reference_allele = vcf_line[col_dict['REF']]
IndexError: list index out of range

vcf.txt

joshfactorial commented 1 year ago

You'll need to make sure that the columns are separated by tab characters, not spaces. That's the first thing that jumps out at me. Without tabs, the parser will not split the line correctly, leading to the index error.
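
For anyone hitting the same error, here is a rough sketch of how one might check a VCF for this problem before running gen_reads.py. It assumes, as the reply above suggests, that each line is split on tab characters, so a space-separated record collapses into a single field and indexing the REF column fails. The function names `find_untabbed_lines` and `retab`, the contig name `chr1`, and the alleles in the example are made up for illustration and are not part of NEAT or of the attached file; only the position 2155168 comes from this thread.

```python
def find_untabbed_lines(lines, min_columns=8):
    """Return (line_number, column_count) for lines with too few tab-separated fields.

    min_columns=8 is the number of fixed VCF columns (CHROM through INFO).
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        # Skip blank lines and '##' meta-information lines,
        # which are not tab-separated records.
        if not stripped or stripped.startswith("##"):
            continue
        columns = stripped.split("\t")
        if len(columns) < min_columns:
            bad.append((number, len(columns)))
    return bad


def retab(line):
    """Replace runs of whitespace with single tabs in a header or data line.

    Assumption: no field contains a space that must be preserved.
    '##' meta-information lines are returned unchanged.
    """
    stripped = line.rstrip("\r\n")
    if not stripped or stripped.startswith("##"):
        return stripped
    return "\t".join(stripped.split())


# Hypothetical example: the third line uses spaces instead of tabs.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1 2155168 . C G . PASS .",
    "chr1\t2155200\t.\tA\tT\t.\tPASS\t.",
]

print(find_untabbed_lines(example))  # [(3, 1)]
print(retab(example[2]).split("\t"))
```

To check a real file, something like `find_untabbed_lines(open("TB_vcf.vcf"))` should work, since a file object yields its lines one at a time. Text editors that convert tabs to spaces on save are a common way for this to creep in, so it is worth re-checking after any manual edit.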