igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

Not able to find reference on s3.amazon #45

Closed DanielAndreasen closed 3 years ago

DanielAndreasen commented 3 years ago

When I run either of the two examples I get the following error:

create_report examples/junctions/Introns.38.bed \
    https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
    --type junction \
    --ideogram examples/junctions/cytoBandIdeo.txt \
    --output junctions.html \
    --track-config examples/junctions/tracks.json \
    --info-columns TCGA GTEx variant_name \
    --title "Sample A"

Traceback (most recent call last):
  File "/home/daniel/miniconda3/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/home/daniel/miniconda3/lib/python3.8/site-packages/igv_reports/fasta.py", line 21, in get_data
    fasta = pysam.FastaFile(fasta_file)
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file `https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa` not found

helgathorv commented 3 years ago

Are you able to access the hg38.fa file if you directly enter the link into your web browser?

DanielAndreasen commented 3 years ago

Yes I am. I just tried running the command again, with the same error. I'm running from the newest master branch.

helgathorv commented 3 years ago

That is strange. Is there anything about the environment where you're running create_report that would prevent access to the file at Amazon? Since you do have access to the file via the browser, one workaround would be to download it and use the local version.

jrobinso commented 3 years ago

In that case I suggest you download the fasta (and its associated index) and use them as local files. I can't reproduce your problem here. To run the example you referenced download https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa and https://s3.dualstack.us-east-1.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai. Save them locally and use the full path to the fasta file.
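
A minimal sketch of that download step in Python, assuming the index sits next to the FASTA with a `.fai` suffix, as in the two URLs above. The helper names `fasta_and_index_urls` and `download_reference` and the `dest_dir` parameter are hypothetical and not part of igv-reports.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def fasta_and_index_urls(fasta_url):
    """Return the FASTA URL and the URL of its .fai index (assumed to sit beside it)."""
    return fasta_url, fasta_url + ".fai"


def download_reference(fasta_url, dest_dir="."):
    """Download the FASTA and its index into dest_dir; return the local FASTA path."""
    os.makedirs(dest_dir, exist_ok=True)
    local_fasta = None
    for url in fasta_and_index_urls(fasta_url):
        local_path = os.path.join(dest_dir, os.path.basename(urlsplit(url).path))
        if not os.path.exists(local_path):  # skip files that are already present
            urllib.request.urlretrieve(url, local_path)
        if local_fasta is None:  # the first URL is the FASTA itself
            local_fasta = os.path.abspath(local_path)
    return local_fasta
```

The returned path can then be passed to create_report in place of the URL. The FASTA is large, so the download may take a while.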

jrobinso commented 3 years ago

The other thing you might try, Python being Python, is to create a fresh environment and install into it. You might have an old version of pysam that does not recognize URLs.

I'm closing this as it can't be reproduced.

DanielAndreasen commented 3 years ago

Sorry to see this issue closed so fast. I tried making a new conda environment, installed pip, and then igv-reports. I'm using python 3.8.6, and pysam 0.16.0.1 (the latest version as of right now), and still it doesn't work.

However, if I download the reference genome and its index I can make it run.

Just out of curiosity, which versions of Python and pysam are you using?
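
For comparing environments, a small sketch that prints the interpreter and package versions in one go. It assumes the packages were installed under the distribution names `pysam` and `igv-reports` (the names shown by pip), and the function name `report_versions` is made up for illustration.

```python
import sys
from importlib import metadata  # available in Python 3.8+


def report_versions(packages=("pysam", "igv-reports")):
    """Return a dict mapping 'python' and each package name to its installed version."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running this inside the same environment (or container) that runs create_report gives the versions that are actually in use there.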

jrobinso commented 3 years ago

I'll re-open if you describe a problem I can reproduce. Closing just takes it off my active list; I don't mean to preclude discussion and questions. I use 3.7.2 with this project.

DanielAndreasen commented 3 years ago

And which version of pysam do you use?

jrobinso commented 3 years ago

We have tried 0.15.3 and 0.16.0.1

stevekm commented 3 years ago

I am getting the same error. You can replicate it with this Dockerfile;

FROM continuumio/miniconda3:4.5.4

RUN conda install bioconda::pysam==0.15.3

# need to install igv-reports from Git because the pip version is outdated and lacks some critical bug fixes; https://github.com/igvteam/igv-reports/issues/47
RUN git clone https://github.com/igvteam/igv-reports.git && \
    cd igv-reports && \
    git checkout 7e12305 && \
    pip install -r requirements.txt && \
    python setup.py install

ADD test.sh /test.sh

with this test script test.sh;

#!/bin/bash
set -x 
create_report \
/igv-reports/examples/variants/variants.vcf.gz \
https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
--ideogram examples/variants/cytoBandIdeo.txt \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--tracks \
/igv-reports/examples/variants/variants.vcf.gz \
/igv-reports/examples/variants/recalibrated.bam \
/igv-reports/examples/variants/refGene.sort.bed.gz \
--output igvjs_viewer.test.html

Running it;

$ docker run --rm -it igv-reports-1.0.1 bash
root@729dc29e270b:/# ./test.sh
+ create_report /igv-reports/examples/variants/variants.vcf.gz https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa --ideogram examples/variants/cytoBandIdeo.txt --flanking 1000 --info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC --tracks /igv-reports/examples/variants/variants.vcf.gz /igv-reports/examples/variants/recalibrated.bam /igv-reports/examples/variants/refGene.sort.bed.gz --output igvjs_viewer.test.html
Traceback (most recent call last):
  File "/opt/conda/bin/create_report", line 33, in <module>
    sys.exit(load_entry_point('igv-reports==1.0.1', 'console_scripts', 'create_report')())
  File "/opt/conda/lib/python3.7/site-packages/igv_reports-1.0.1-py3.7.egg/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/opt/conda/lib/python3.7/site-packages/igv_reports-1.0.1-py3.7.egg/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/opt/conda/lib/python3.7/site-packages/igv_reports-1.0.1-py3.7.egg/igv_reports/fasta.py", line 21, in get_data
    fasta = pysam.FastaFile(fasta_file)
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file `https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa` not found

Note that I am using a git clone of the repo in there and not installing from pip, for the reason mentioned in the Dockerfile comment.

stevekm commented 3 years ago

I also get the same error if I install igv-reports from pip;

Dockerfile;

FROM continuumio/miniconda3:4.5.4
RUN conda install bioconda::pysam==0.15.3 conda-forge::unzip
RUN pip install igv-reports
RUN wget https://s3.amazonaws.com/igv.org.test/reports/examples.zip && unzip examples.zip
ADD test.sh /test.sh

test.sh

#!/bin/bash
set -x

create_report \
/examples/variants/variants.vcf.gz \
https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
--ideogram examples/variants/cytoBandIdeo.txt \
--flanking 1000 \
--info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--tracks \
/examples/variants/variants.vcf.gz \
/examples/variants/recalibrated.bam \
/examples/variants/refGene.sort.bed.gz \
--output igvjs_viewer.test.html

result;

$ docker run --rm -it igv-reports-1.0.1 bash
root@342a8774e36b:/# ./test.sh
+ create_report /examples/variants/variants.vcf.gz https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa --ideogram examples/variants/cytoBandIdeo.txt --flanking 1000 --info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC --tracks /examples/variants/variants.vcf.gz /examples/variants/recalibrated.bam /examples/variants/refGene.sort.bed.gz --output igvjs_viewer.test.html
Traceback (most recent call last):
  File "/opt/conda/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/igv_reports/report.py", line 234, in main
    create_report(args)
  File "/opt/conda/lib/python3.7/site-packages/igv_reports/report.py", line 87, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/opt/conda/lib/python3.7/site-packages/igv_reports/fasta.py", line 21, in get_data
    fasta = pysam.FastaFile(fasta_file)
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file `https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa` not found

root@342a8774e36b:/# ls -l /examples/variants/variants.vcf.gz
-rw-r--r-- 1 root root 7329 Jun 15  2020 /examples/variants/variants.vcf.gz

root@342a8774e36b:/# pip freeze
certifi==2020.12.5
chardet==4.0.0
idna==2.10
igv-reports==1.0.1
intervaltree==3.1.0
pysam==0.15.3
requests==2.25.1
sortedcontainers==2.3.0
urllib3==1.26.3

jrobinso commented 3 years ago

@stevekm OK I will look into it. The examples should work of course, but as a workaround you can download that fasta and reference it as a local file.

stevekm commented 3 years ago

Yes, I tried that and it works. The issue is that I want to include a test script like this with my container builds to check that it's working, in which case it's very helpful to be able to load the reference genome from the URL as shown. Hope there's a solution possible :)

jrobinso commented 3 years ago

@stevekm I will try to get to this next week. This is not a very active project compared to IGV and igv.js, and it's in a different language (Python), so I will have to clear some time. However, I do recall trying to reproduce this before without success. I do not use Docker and don't know what effect it would have, but it's the common thread between the OP and your report.

stevekm commented 3 years ago

On a side note, I found out that my conda installation in those Dockerfiles was slightly wrong. It should be this:

RUN conda install python=3.6.5 bioconda::pysam==0.15.3

to avoid the error described here, where upgrading the base Python version breaks conda: https://stackoverflow.com/questions/19825250/after-anaconda-installation-conda-command-fails-with-importerror-no-module-na

This does not change the pysam error seen here, but it does get in the way of trying to debug it.

jrobinso commented 3 years ago

@stevekm I just pushed 1.0.2 in response to an earlier report from you that the pip package was not in sync with GitHub. There is some possibility, albeit slight, that this might resolve this issue.

jrobinso commented 3 years ago

Looking at the stack trace, the error is in pysam, which is trying to open the URL as a local file:

File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file `https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa` not found

This in turn indicates an error in the htslib "isremote" function. I do not know why you would see this error; I do not see it with a clean pip install, but it has something to do with the pysam dependency.
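
To illustrate the failure mode described above, here is a simplified Python sketch. It is not pysam's or htslib's actual code: it only assumes that a path is treated as local unless its scheme is in a known set of remote schemes, and that a local path which does not exist raises the same "not found" error. The names `looks_remote`, `open_check` and `remote_schemes` are hypothetical.

```python
import os
from urllib.parse import urlsplit


def looks_remote(path, remote_schemes=("http", "https", "ftp")):
    """Return True if the path's scheme is one of the recognised remote schemes."""
    return urlsplit(path).scheme.lower() in remote_schemes


def open_check(path, remote_schemes=("http", "https", "ftp")):
    """Mimic the pre-open check: unrecognised URLs fall through to a local lookup."""
    if looks_remote(path, remote_schemes):
        return "remote"
    if not os.path.exists(path):
        # A URL whose scheme was not recognised ends up here and is
        # reported as a missing local file.
        raise OSError(f"file `{path}` not found")
    return "local"
```

With `remote_schemes=("http",)`, calling `open_check` on the https URL from the report raises `OSError` with a "not found" message, which matches the shape of the traceback even though the file is reachable in a browser.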

jrobinso commented 3 years ago

Per the notes above, this error was not reproducible but could have been caused by incorrect or out-of-date files pushed to PyPI as part of release 1.0.1.

yeemey commented 2 years ago

I ran into the same error with a Singularity image of igv-reports, with additional error information:

$ singularity exec -B $PWD singularity/igv-reports-1.0.4.sif create_report igv-reports/examples/variants/variants.vcf.gz https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa
[E::easy_errno] Libcurl reported error 77 (Problem with the SSL CA cert (path? access rights?))
[E::fai_load3_core] Failed to open FASTA index https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa.fai: Input/output error
Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 271, in main
    create_report(args)
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/report.py", line 105, in create_report
    data = fasta.get_data(args.fasta, region)
  File "/usr/local/lib/python3.9/dist-packages/igv_reports/fasta.py", line 21, in get_data
    fasta = pysam.FastaFile(fasta_file)
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 183, in pysam.libcfaidx.FastaFile._open
OSError: error when opening file `https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa`

I'm using pysam 0.18.0

jrobinso commented 2 years ago

This looks like a libcurl bug affecting SSL certificates; there is nothing wrong with the certificate for that file. A workaround is to use an http URL (instead of https). Specifically:

http://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa

I don't know anything about singularity, but here is a Colab notebook with that example (working). I will update the readme to use http: https://colab.research.google.com/drive/1JJvyDm0r_Lyhmuk27zEwfkE0J1gV0wzp?usp=sharing
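
If the FASTA URL is built in a script, the same workaround can be applied programmatically. A minimal sketch, assuming the server also serves the file over plain http as the S3 URL above does; the helper name `to_http` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Return url with an https scheme swapped for http; anything else is unchanged."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        return url
    return urlunsplit(parts._replace(scheme="http"))
```

Plain http involves no certificate check at all, which is why it sidesteps the CA certificate error. Local paths and URLs that are already http pass through untouched.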

yeemey commented 2 years ago

Thank you, this workaround solved my issue.