igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

ValueError: start out of range (-261) #51

Closed ctuni closed 3 years ago

ctuni commented 3 years ago

Hi!

I'm trying to use igv-reports to visualize .vcf files of SARS-CoV-2. I'm running into an error which I think arises from the fact that the SARS-CoV-2 genome does not have chromosomes. I tried the minimal command create_report sample1.vcf.gz sarscov2.fa (where sample1.vcf.gz is the compressed VCF file and sarscov2.fa is the genome in FASTA format), and received the following error output:

Traceback (most recent call last):
  File "/home/ctuni/miniconda3/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/home/ctuni/miniconda3/lib/python3.8/site-packages/igv_reports/fasta.py", line 23, in get_data
    slice_seq = fasta.fetch(chr, start, end)
  File "pysam/cfaidx.pyx", line 266, in pysam.cfaidx.FastaFile.fetch
  File "pysam/cutils.pyx", line 221, in pysam.cutils.parse_region
ValueError: start out of range (-261)

I am sure I am doing something wrong but I can't see what it is. If someone could point me in the right direction it would be appreciated, thank you very much!

jrobinso commented 3 years ago

@ctuni is the fasta file indexed (do you have a ".fai" file)? If you could zip and attach the fasta file I will see if I can reproduce the error. Please also show us the command line you used. The lack of chromosomes should not matter; technically it's not chromosomes but sequences as defined in the fasta that are queried.
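For anyone wanting to check the index question before running create_report, here is a minimal sketch. It assumes the usual samtools faidx convention that the index sits next to the FASTA with ".fai" appended, and that the .fai file is tab-separated with the sequence name and length in the first two columns. The function names has_fasta_index and sequence_lengths are made up for illustration and are not part of igv-reports.

```python
from pathlib import Path


def has_fasta_index(fasta_path):
    """Return True if '<fasta_path>.fai' exists next to the FASTA file.

    Assumes the samtools faidx naming convention (FASTA name + '.fai').
    """
    return Path(str(fasta_path) + ".fai").is_file()


def sequence_lengths(fai_path):
    """Map each sequence name in a .fai index to its length.

    These names are what a region query must use, whether or not
    they correspond to chromosomes.
    """
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths
```

The names returned by sequence_lengths should match the CHROM column of the VCF; a mismatch there is a separate problem from the one reported in this thread.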

ctuni commented 3 years ago

Hi @jrobinso , thank you for your help!

I see that in the first message I did not use the .fai file, instead I used the .fa file, which I understand is the cause of the first error message. I have both the .fa and the .fai file, which I created using samtools faidx sarscov2.fa. I have also used tabix to index the .vcf.gz file I have, resulting in a .vcf.gz.tbi (created with the command tabix -f -p vcf sample.vcf.gz). I have tried a series of combinations of file extensions, and I think that the correct one is the following:

create_report sample.vcf.gz.tbi sarscov2.fa.fai, and I received the error:

Traceback (most recent call last):
  File "/home/ctuni/anaconda3/envs/myenv/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/home/ctuni/anaconda3/envs/myenv/lib/python3.7/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/ctuni/anaconda3/envs/myenv/lib/python3.7/site-packages/igv_reports/report.py", line 26, in create_report
    table_json = table.to_JSON()
UnboundLocalError: local variable 'table' referenced before assignment

Again, thank you very much for your help, and sorry if this problem I have should not be reported here, and for any trouble I might be causing! Attached: sarscov_genomes.tar.gz

jrobinso commented 3 years ago

@ctuni Thanks for the report, it is helpful to get these bug reports. Is it possible to share the sample.vcf.gz file as well? Or really all I need to know is the locations of the first few variants.
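If sharing the file is not possible, the positions of the first few variants can be pulled out with a few lines of Python. This is only a sketch: the function name first_variant_positions is hypothetical, and it assumes a standard tab-separated VCF with CHROM and POS in the first two columns. bgzip output is gzip-compatible, so the standard gzip module can read it.

```python
import gzip


def first_variant_positions(vcf_path, limit=5):
    """Return (CHROM, POS) for the first `limit` records of a VCF file."""
    # Compressed files are read with gzip; anything else as plain text.
    compressed = str(vcf_path).endswith((".gz", ".bgz"))
    opener = gzip.open if compressed else open
    positions = []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos = line.split("\t", 2)[:2]
            positions.append((chrom, int(pos)))
            if len(positions) >= limit:
                break
    return positions
```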

ctuni commented 3 years ago

@jrobinso Of course! Here it is: sample.vcf.gz

Thank you again for your help!

jrobinso commented 3 years ago

@ctuni Thanks for the test data. I can confirm this is an igv-reports bug; it has actually been fixed but not released. It will be released and pushed to PyPI by the end of the week. I will leave this open until that time.

In the meantime, you can work around the problem by limiting the "flanking window" to be no greater than the position of the first variant. In this example case that is position 241, so the following should work:

create_report sarscov_genomes/sample.vcf.gz sarscov_genomes/sarscov2.fa --flanking 240 --tracks sarscov_genomes/sample.vcf.gz
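
To see why a variant near the start of the sequence can produce a negative start, here is a minimal sketch of the window arithmetic. It is not the igv-reports code, and the exact formula the tool uses may differ; the names region_around, seq_length and clamp are illustrative. The point is only that subtracting a flanking window larger than the variant position gives a negative coordinate unless the start is clamped to the beginning of the sequence.

```python
def region_around(pos, flanking, seq_length=None, clamp=True):
    """Return a 0-based, half-open (start, end) window around a 1-based position.

    Illustrative arithmetic only; igv-reports may compute its window differently.
    """
    start = (pos - 1) - flanking
    end = pos + flanking
    if clamp:
        # Never start before the first base of the sequence.
        start = max(start, 0)
        if seq_length is not None:
            # Never run past the last base either.
            end = min(end, seq_length)
    return start, end
```

With this arithmetic, a variant at position 241 and a flanking value of 240 gives a start of 0 even without clamping, which is the idea behind the workaround; any larger flanking value gives a negative start unless it is clamped.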

ctuni commented 3 years ago

@jrobinso thank you for your help! It did not occur to me to try that workaround but I can confirm it works perfectly!

Thank you again for your invaluable help!

charlesfoster commented 3 years ago

Hi @jrobinso,

I'm having the same issues. I'm also working with SARS-CoV-2. My command:

create_report sample.vcf.bgz NC_045512.fasta

Output:

Traceback (most recent call last):
  File "/home/cfos/miniconda3/envs/igv_reports/bin/create_report", line 10, in <module>
    sys.exit(main())
  File "/home/cfos/miniconda3/envs/igv_reports/lib/python3.7/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/cfos/miniconda3/envs/igv_reports/lib/python3.7/site-packages/igv_reports/report.py", line 26, in create_report
    table_json = table.to_JSON()
UnboundLocalError: local variable 'table' referenced before assignment

The fasta file is indexed with samtools faidx, and the vcf file is compressed with bgzip and indexed with tabix. The first variant is at position 14, but adding --flanking 13 gives me the same error. As a test, I deleted all the beginning variants, only leaving the variant at position 241. I then used create_report sample.vcf.bgz NC_045512.fasta --flanking 240, and all finished well.

Any ideas on how to fix the problem when variants occur earlier in the sequence? If it helps, I installed the program from anaconda/miniconda via mamba install igv-reports.

jrobinso commented 3 years ago

@charlesfoster Apologies, I had forgotten to update the release. Could you try installing igv-reports again, and verify that the version is 1.0.3? I know nothing about "mamba" or miniconda for that matter, but the latest version in PyPI is 1.0.3 and should have a fix for the originally posted issue. Whether that is your issue or not I can't be sure.

charlesfoster commented 3 years ago

@jrobinso thanks for the quick reply. I uninstalled the version hosted by Anaconda (https://anaconda.org/bioconda/igv-reports), then reinstalled using pip. The version is 1.0.3.

When I use the modified sample where the first variant is at position 241, the program finishes with no issue even without specifying the flanking parameter:

create_report sample.vcf.gz NC_045512.fasta --tracks sample.vcf.gz sample.primertrim.sorted.bam

However, when I use the original vcf file with a variant at position 14, the program fails with the aforementioned 'UnboundLocalError: local variable 'table' referenced before assignment'.

Here is the problematic vcf file: sample_with_early_variant.vcf.gz

The reference is just a fasta file downloaded from: https://www.ncbi.nlm.nih.gov/nuccore/1798174254

Thanks!

edit: one other thing that might be worth adding into the readme is that the input VCF file has to end with either '.vcf' or '.vcf.gz'. I was just getting the 'UnboundLocalError: local variable 'table' referenced before assignment' error with another sample, despite it not having any 'early' variants, until I realised after reading report.py that the '.vcf.bgz' extension was causing the problem.
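
A quick pre-flight check for the file name can save a confusing traceback. The sketch below only encodes what is described in this thread (names ending in '.vcf' or '.vcf.gz' are accepted, '.vcf.bgz' is not); it does not reproduce the actual check in report.py, and both function names are hypothetical.

```python
def is_supported_vcf_name(path):
    """True if the name ends with '.vcf' or '.vcf.gz', the endings named in this thread."""
    return str(path).endswith((".vcf", ".vcf.gz"))


def suggested_vcf_name(path):
    """Suggest a '.vcf.gz' name for a '.vcf.bgz' file; other names are returned unchanged."""
    path = str(path)
    if path.endswith(".vcf.bgz"):
        return path[: -len(".bgz")] + ".gz"
    return path
```

If the file is renamed, the tabix index needs to follow it (sample.vcf.gz.tbi for sample.vcf.gz), either by renaming the index as well or by running tabix again.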

jrobinso commented 3 years ago

OK thanks for the test data. I will look into this tomorrow, looks like another bug.

yeemey commented 3 years ago

> @charlesfoster Apologies, I had forgotten to update the release. Could you try installing igv-reports again, and verify that the version is 1.0.3? I know nothing about "mamba" or miniconda for that matter, but the latest version in PyPI is 1.0.3 and should have a fix for the originally posted issue. Whether that is your issue or not I can't be sure.

Thanks for the fix! I ran into the same issue as OP on v1.0.2, and upgrading to 1.0.3 with pip install igv-reports --upgrade resolved it.

jrobinso commented 3 years ago

This is a long and rambling thread but I think the issue is resolved. I added a note about the ".bgz" extension; this is not something I control, it's imposed by the "pysam" module.

BTW, after releasing 1.0.3 I noticed 2 issues with igv.js: (1) the cytobands weren't rendered correctly in igv-reports generated html, and (2) the center line was not exactly centered on the variant. Both of these were fixed in igv.js release 2.8.6, and I re-released igv-reports (1.0.4) to use the latest igv.js. You don't actually have to update igv-reports; you can just change the script include for igv.js in the variant template (or generated reports) to be

    <script src="https://cdn.jsdelivr.net/npm/igv@2.8.6/dist/igv.min.js"></script>
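
For a batch of already generated reports, the include could be rewritten with a small script rather than by hand. This is a sketch under one loud assumption: that the existing report loads igv.js from a jsDelivr URL of the same shape as the one above, differing only in the version. If a report embeds igv.js some other way, the pattern will not match and the HTML is returned unchanged. The names IGV_SRC and pin_igv_version are illustrative.

```python
import re

# Assumption: the include looks like
# https://cdn.jsdelivr.net/npm/igv@<version>/dist/igv.min.js
IGV_SRC = re.compile(
    r"(https://cdn\.jsdelivr\.net/npm/igv@)[^/\"']+(/dist/igv\.min\.js)"
)


def pin_igv_version(html, version="2.8.6"):
    """Return `html` with the igv.js CDN version replaced by `version`."""
    return IGV_SRC.sub(lambda m: m.group(1) + version + m.group(2), html)
```

Reading a report, passing its text through pin_igv_version and writing it back is enough; it is worth keeping a copy of the original file until the updated report has been checked in a browser.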