igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

Load file from private Google Cloud bucket #67

Closed by DevangThakkar 2 years ago

DevangThakkar commented 2 years ago

Hi,

I was wondering whether it is possible to load a BAM file from a Google Cloud bucket. I tried loading a public BAM (the example code with only the BAM location replaced), and that didn't seem to work. I understand that igv.js is able to load from private Google Cloud Storage if we provide it with the requisite credentials. Would it be possible to extend that to igv-reports as well?

> create_report test/data/variants/variants.vcf.gz \
http://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
--ideogram test/data/hg38/cytoBandIdeo.txt \
--flanking 1000 --info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--samples reads_1_fastq --sample-columns DP GQ \
--tracks test/data/variants/variants.vcf.gz gs://genomics-public-data/NA12878.chr20.sample.bam test/data/hg38/refGene.txt.gz \
--output examples/example_vcf.html

[E::hts_open_format] Failed to open file gs://genomics-public-data/NA12878.chr20.sample.bam
Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 345, in main
    create_report(args)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 84, in create_report
    reader = utils.getreader(config, None, args.fasta)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/utils.py", line 13, in getreader
    return bam.BamReader(path)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/bam.py", line 11, in __init__
    header = pysam.view(*args)
  File "/usr/local/lib/python3.6/dist-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools view: failed to open "gs://genomics-public-data/NA12878.chr20.sample.bam" for reading: Protocol not supported\n'

jrobinso commented 2 years ago

@DevangThakkar Can you do this in Python using pysam? If you can, igv-reports could be modified to do this. See the file igv_reports/bam.py; this is where alignments are read.

The "gs" protocol will likely not be recognized by pysam. However, mapping the "gs" protocol to "https" is a simple matter of parsing the bucket and object name from the gs: URL, then adding the parameter "alt=media". In JavaScript this looks like:

    `https://storage.googleapis.com/storage/v1/b/${bucket}/o/${object}?alt=media`
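
A rough Python equivalent of that mapping might look like the sketch below. The helper name `gs_to_https` is hypothetical and is not part of igv-reports. The sketch also assumes the object name should be percent-encoded (including any `/`) before it is placed in the URL path, and it does nothing about authentication.

```python
from urllib.parse import quote

GCS_PREFIX = "gs://"


def gs_to_https(url):
    """Map a gs:// URL to an https:// media-download URL (hypothetical helper).

    URLs that do not start with gs:// are returned unchanged.
    """
    if not url.startswith(GCS_PREFIX):
        return url

    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket, _, obj = url[len(GCS_PREFIX):].partition("/")
    if not bucket or not obj:
        raise ValueError("Expected gs://<bucket>/<object>, got: " + url)

    # Assumption: the object name is percent-encoded, slashes included.
    encoded = quote(obj, safe="")
    return (
        "https://storage.googleapis.com/storage/v1/b/"
        + bucket + "/o/" + encoded + "?alt=media"
    )


if __name__ == "__main__":
    print(gs_to_https("gs://genomics-public-data/NA12878.chr20.sample.bam"))
```

For a private bucket the resulting URL would still need an access token, so this only covers the protocol mapping.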

jrobinso commented 2 years ago

@DevangThakkar Have you had a chance to experiment with pysam? The gs -> https mapping is trivial; the challenge here is doing OAuth in Python. I'm curious what you have in mind for "passing credentials". You cannot, of course, just pass a username and password. Did you have an access token in mind? I'm not sure how you would do that securely.

DevangThakkar commented 2 years ago

Hi @jrobinso, I was actually able to figure this out! Support for OAuth was added to htslib via an environment variable, so igv-reports also works as long as htslib is able to access the file (see #390). This issue can be closed.
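
For anyone landing here later, this is roughly how the environment-variable approach could be wired up from Python. It is only a sketch under some assumptions: that the variable htslib reads is `GCS_OAUTH_TOKEN`, that the htslib bundled with pysam was built with libcurl and GCS support, and that a valid access token has already been obtained (for example from `gcloud auth print-access-token`). The function names `build_env` and `run_create_report` are hypothetical.

```python
import os
import subprocess


def build_env(token, base_env=None):
    """Return a copy of the environment with the token set for htslib.

    Assumption: htslib picks up the OAuth token from GCS_OAUTH_TOKEN.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GCS_OAUTH_TOKEN"] = token
    return env


def run_create_report(report_args, token):
    """Run the create_report CLI with the token visible to htslib only.

    The token is passed through the child process environment, so it never
    appears on the command line.
    """
    return subprocess.run(
        ["create_report", *report_args],
        env=build_env(token),
        check=True,
    )
```

A call would then look something like `run_create_report(["variants.vcf.gz", "--tracks", "gs://my-bucket/sample.bam", ...], token)`, where the bucket path is a placeholder.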

jrobinso commented 2 years ago

Ahh yes, perfect.