BAM files size limitation

igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.

MIT License


BAM files size limitation #23

Closed binfyun closed 5 years ago

binfyun commented 5 years ago

Hi, it seems there is a limit on how many reads can be loaded into the HTML output. After a couple of tries, it looks like:

1. The tumor reads and the matched normal reads (beneath the tumor read window) cannot be displayed at the same time.
2. Not all mutations' reads are displayed in the read window.

Thanks

jrobinso commented 5 years ago

There's no limit that I'm aware of, and I don't understand your points 1 and 2. You can load as many tracks as you like (e.g. 1 for tumor and 1 for normal).

Please provide more details, and ideally a test case to reproduce problems you are having. Or at the minimum screenshots.

binfyun commented 5 years ago

Hi @jrobinso, thank you for the reply

So this is what I ran: "create_report 35439742-T_Matched_annovar.vcf_processed.vcf.gz Homo_sapiens_assembly19.fasta --ideogram cytoBand.txt --flanking 1000 --infoColumns GENE,COSMIC_ID --tracks 35439742-T_Matched_annovar.vcf_processed.vcf.gz,35439742-T_bc276_IMPACTv6-CLIN-20190168_L000_mrg_cl_aln_srt_MD_IR_BR.bam"

dummy vcf file attached: Matched_annovar.vcf_processed_subset.txt

The BAM was processed (post-BQSR and indexed); its size is 3.6 GB, which is the limitation I was referring to.

igv_viewer.html was generated but without reads in the window: (screenshot attached)

Thanks again.

jrobinso commented 5 years ago

It's impossible for me to help with this without an example case I can run.

What happens if you load the bam into IGV desktop and go to the locations specified in your VCF?

jrobinso commented 5 years ago

BTW the bam file I'm testing on is ~300 GB

binfyun commented 5 years ago

Attached screenshot for KRAS 12:25398281 in IGV: Screen Shot 2019-03-20 at 12 34 30 AM

I had attached a subset of the variants ( Matched_annovar.vcf_processed_subset.txt ) I used from the above example. Would this be sufficient?

Thanks.

jrobinso commented 5 years ago

We're not communicating. I need a command line I can run; the VCF is not enough. If you can create an example you can zip up, I will look into it. Include the command line you used in the zip, as a .txt or .sh file. First unzip it yourself and ensure that it runs without error. Is that clear enough? You can use samtools to extract a small portion of your BAM for this.
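One way to prepare such a subset is sketched below. It is only an illustration, not something posted in this thread: the helper names `variant_regions` and `subset_commands` and the file names are hypothetical. It assumes the VCF is coordinate-sorted and the source BAM is indexed, and it only builds the `samtools view` and `samtools index` command lines rather than running them.

```python
import shlex


def variant_regions(vcf_lines, flanking=1000):
    """Return samtools-style regions (contig:start-end) around each VCF record."""
    regions = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos = line.split("\t")[:2]
        start = max(1, int(pos) - flanking)  # region coordinates are 1-based
        end = int(pos) + flanking
        region = f"{chrom}:{start}-{end}"
        if region not in regions:
            regions.append(region)
    return regions


def subset_commands(bam_in, bam_out, regions):
    """Build (but do not run) the commands that write and index the subset BAM."""
    return [
        ["samtools", "view", "-b", "-o", bam_out, bam_in, *regions],
        ["samtools", "index", bam_out],
    ]


if __name__ == "__main__":
    # Hypothetical record; only the first two VCF columns are used here.
    records = ["##fileformat=VCFv4.2", "12\t25398281\t.\tC\tT"]
    regions = variant_regions(records)
    for cmd in subset_commands("tumor.bam", "tumor_subset.bam", regions):
        print(shlex.join(cmd))
```

The printed commands can be pasted into the .sh file that goes into the zip, so the report command and the data it needs travel together.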

jrobinso commented 5 years ago

BTW, I see references to some Broad file paths in your VCF. I have access to the Broad filesystem if your files are there.

binfyun commented 5 years ago

If you have everything set up and ready to go, can't you just run the following command on your end? (Ignore the ideogram and additional tracks for now; just provide the reference genome and BAM file.)

create_report "the VCF subset I attached" "your hg19 reference genome" --flanking 1000 --infoColumns GENE,COSMIC_ID --tracks "your BAM file"

jrobinso commented 5 years ago

Are there any errors in your console?

jrobinso commented 5 years ago

Are you sure your bam sequence names are "1, 2, 3,..." etc and not "chr1, chr2, chr3..."? They need to match the vcf sequence names.
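A quick way to check this is to compare the contig names in the VCF records against the `@SQ` lines of the BAM header (the text printed by `samtools view -H`). The sketch below is an illustration with hypothetical helper names, not part of igv-reports; it works on plain text, so the header has to be captured first.

```python
def vcf_contigs(vcf_lines):
    """Collect the CHROM values of all data records in a VCF."""
    return {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def bam_contigs(header_text):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_from_bam(vcf_lines, header_text):
    """Return the VCF contig names that have no exact match in the BAM header."""
    return sorted(vcf_contigs(vcf_lines) - bam_contigs(header_text))
```

An empty result means every contig named in the VCF also appears in the BAM header; a result such as `["1", "12"]` against a header that lists `chr1` and `chr12` points to the naming mismatch described above.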

jrobinso commented 5 years ago

The following runs without error and produces a report with alignments at every variant

create_report Matched_annovar.vcf_processed_subset.vcf https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/b37/human_g1k_v37.fasta --flanking 1000 --infoColumns GENE,COSMIC_ID --tracks http://1000genomes.s3.amazonaws.com/phase3/data/HG01879/alignment/HG01879.mapped.ILLUMINA.bwa.ACB.low_coverage.20120522.bam

BTW, in the next release the infoColumns and tracks parameters are changing to be POSIX-compliant; the above would be

create_report Matched_annovar.vcf_processed_subset.vcf https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/b37/human_g1k_v37.fasta --flanking 1000 --info-columns GENE COSMIC_ID --tracks http://1000genomes.s3.amazonaws.com/phase3/data/HG01879/alignment/HG01879.mapped.ILLUMINA.bwa.ACB.low_coverage.20120522.bam
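To illustrate how space-separated list options of this kind are typically parsed, here is a minimal `argparse` sketch. It is a stand-in written for this example, not the igv-reports source: the positional names `sites` and `fasta` and the default values are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the shape of the new-style command line."""
    parser = argparse.ArgumentParser(prog="create_report")
    parser.add_argument("sites")   # assumed name for the VCF argument
    parser.add_argument("fasta")   # assumed name for the reference argument
    parser.add_argument("--flanking", type=int, default=1000)  # default is arbitrary
    # nargs="+" collects one or more space-separated values into a list
    parser.add_argument("--info-columns", nargs="+", default=[])
    parser.add_argument("--tracks", nargs="+", default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["subset.vcf", "ref.fasta", "--info-columns", "GENE", "COSMIC_ID",
         "--tracks", "tumor.bam", "normal.bam"]
    )
    print(args.info_columns, args.tracks)
```

With this style, each list option consumes values until the next option flag, so several tracks (for example a tumor BAM and a normal BAM) are given as separate words instead of one comma-joined string.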

binfyun commented 5 years ago

Are you sure your bam sequence names are "1, 2, 3,..." etc and not "chr1, chr2, chr3..."? They need to match the vcf sequence names.

Yes they are

jrobinso commented 5 years ago

Since your command works with the BAM noted above, I don't know what else to do with this without access to your BAM.

jrobinso commented 5 years ago

Here's the resulting report.
igvjs_viewer.html.zip

jrobinso commented 5 years ago

I'm going to close this, since the answer to the original question is no: there is no BAM file size limitation. If you are able to create a test case I can reproduce, please open a new issue.