igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.
MIT License

HTML report too large #82

Closed abridgeland closed 1 year ago

abridgeland commented 1 year ago

Hi all, I have been using igv-reports and I like it so far. However, the HTML files become too large when I use a BAM file that has not been subsetted. Any suggestions on how to solve this? We would like to incorporate this tool into our analysis pipeline and would like a report that shows the entire length of the BAM file. Thanks!

jrobinso commented 1 year ago

If I understand you correctly you are trying to encode an entire BAM file as plain text HTML? How large is it?

jrobinso commented 1 year ago

I don't really understand your use case. Could you describe it in a little more detail?

abridgeland commented 1 year ago

Yes, you are correct. It is about 40M

jrobinso commented 1 year ago

That's probably not possible, and I think you have actually verified that. This is not what igv-reports is designed for. If you can describe in more detail what you are trying to do, I might be able to suggest another solution.

abridgeland commented 1 year ago

We would like a report similar to the attached screenshot, but ideally displaying the entire BAM. The goal is to give the scientists in charge of our studies either a file or a link where they can easily see the variant call results, the overall coverage, and the read mappings across the span of the reference. Would modifying the code to host the report online, instead of producing a standalone HTML file, help remedy the issue? Or would it be easier to find another tool?

[Screenshot: igv_report_example]

jrobinso commented 1 year ago

I think you're on the right track. You would need to host the BAM file and modify the report to read from it directly. This would be a substantial change to the Python report generator: you would basically keep the upper table but replace the lower part with an embedded igv.js instance that reads the BAM file directly from a server.

abridgeland commented 1 year ago

Great! Thank you for your insight and quick responses.

jrobinso commented 1 year ago

For this application I would suggest just embedding igv.js directly and creating your own table. You really don't need igv-reports unless you want it to generate the HTML for the table, which is not complicated.

jrobinso commented 1 year ago

If you're able to embed igv.js in a page and display your BAM file, modifying igv-reports to generate HTML that works with this externally referenced BAM file should be pretty simple. Let me know if you are successful with the first step. You'll find lots of examples in the igv.js repository (https://github.com/igvteam/igv.js), but you might start with this simple page.

<!DOCTYPE html>
<html lang="en">
<head>
    <title>IGV</title>
</head>

<body>

<button id="log-state">Log Session</button>

<div id="igvDiv" style="padding-top: 50px;padding-bottom: 20px; height: auto"></div>

<script type="module">

    import igv from "https://cdn.jsdelivr.net/npm/igv@2.15.5/dist/igv.esm.min.js"

    const config =
        {
            genome: "hg19",
            locus:
                [
                    "chr1:155,153,822-155,155,105"
                ],
            tracks:
                [
                    {
                        type: "alignment",
                        url: "https://1000genomes.s3.amazonaws.com/phase3/data/NA19625/exome_alignment/NA19625.mapped.ILLUMINA.bwa.ASW.exome.20120522.bam",
                        indexURL: "https://1000genomes.s3.amazonaws.com/phase3/data/NA19625/exome_alignment/NA19625.mapped.ILLUMINA.bwa.ASW.exome.20120522.bam.bai",
                        name: "NA19625"
                    }
                ]
        }

    igv.createBrowser(document.getElementById('igvDiv'), config)
        .then(browser => {
            document.getElementById("log-state").addEventListener("click", () => console.log(browser.toJSON()))
        })

</script>
</body>
</html>
jrobinso commented 1 year ago

Hi @abridgeland, we have discussed this use case and think it might be useful to a broader community. So we are implementing a command line flag, tentatively called no-encode, which will create the tracks by reference to the supplied URLs. The BAM and other tracks will not be encoded in the page but referenced by URL, as shown in the igv.js example above. The HTML and table will be generated and function as they do now, with the exception that the report will of course depend on the BAM and other files it was created from.

To use this option your data files will need to be reachable by URL for your intended audience. More details later when the implementation is complete.

I am re-opening this as a marker for this new option.

abridgeland commented 1 year ago

Wow this is great news! Thank you.

jrobinso commented 1 year ago

You can test this now by installing from the development branch. It will be several weeks to a month before we release this, as we have some other releases to manage first.

To test, install from the development branch:

pip install git+https://github.com/igvteam/igv-reports.git@noembed

For test data you can use the VCF file in test/data/variants.vcf.gz. Note the --no-embed flag.

create_report test/data/variants.vcf.gz --genome hg38 --no-embed --tracks https://igv-genepattern-org.s3.amazonaws.com/test/reports/variants.vcf.gz https://igv-genepattern-org.s3.amazonaws.com/test/reports/recalibrated.bam --output example_noembed.html

abridgeland commented 1 year ago

Hello, I was able to successfully install the tool and run the test data but when I run my own data, I receive the errors below.
I used this command:

create_report http://10.121.64.3/snp100-x3.sorted.rg.mutect2.vcf.gz http://10.121.64.3/ref_seq.fasta --no-embed --tracks http://10.121.64.3/snp100-x3.sorted.rg.mutect2.vcf.gz http://10.121.64.3/snp100-x3.sorted_100_to_200.bam --output test_4_noembed.html

Access to XMLHttpRequest at 'http://10.121.64.3/ref_seq.fasta.fai' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
10.121.64.3/ref_seq.fasta.fai:1 Failed to load resource: net::ERR_FAILED
test_4_noembed.html:1 Uncaught (in promise) Error accessing resource: http://10.121.64.3/ref_seq.fasta.fai Status: 0
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/circular-view.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/dom.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/igv-ui.css.map: System error: net::ERR_FILE_NOT_FOUND

jrobinso commented 1 year ago

You need to configure your data servers for "CORS" requests. There are some pointers here: https://github.com/igvteam/igv.js/wiki/Data-Server-Requirements.

You can ignore the "FILE_NOT_FOUND" for ".map" file errors.

jrobinso commented 1 year ago

An alternative for "CORS" errors is to serve your HTML page from the same host as the data (10.121.64.3 in this case). Cross-origin rules (CORS, Cross-Origin Resource Sharing) come into play when a page from one host tries to access data on another.

I don't know what type of server 10.121.64.3 is, but enabling CORS is very common, and a web search for your server type should help if our wiki doesn't. Basically you need to add CORS headers to the responses.
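
As a rough illustration of the rule described above, here is a minimal Python sketch of the decision a browser makes. It is a simplification, not the browser's actual algorithm: it ignores credentials and preflight requests, and the function names and the report path are hypothetical. A page opened from disk (file://) has the origin 'null', which is why the error above mentions origin 'null'.

```python
from urllib.parse import urlsplit


def origin_of(url):
    """Return the origin of a URL roughly as a browser would label it."""
    parts = urlsplit(url)
    # Pages opened from disk (file://) get the opaque origin "null".
    if parts.scheme not in ("http", "https"):
        return "null"
    # Default ports are not normalised in this sketch.
    return f"{parts.scheme}://{parts.netloc}"


def cors_allows(response_headers, origin):
    """True if Access-Control-Allow-Origin admits the given origin."""
    # Header names are case-insensitive, so normalise them first.
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed is not None and allowed in ("*", origin)


def request_permitted(page_url, data_url, response_headers):
    """True if a page at page_url may read the response from data_url."""
    page_origin = origin_of(page_url)
    # Same origin: CORS is not involved at all.
    if page_origin != "null" and page_origin == origin_of(data_url):
        return True
    return cors_allows(response_headers, page_origin)


data = "http://10.121.64.3/ref_seq.fasta.fai"
# Report opened from a local file (hypothetical path), server sends no CORS header.
print(request_permitted("file:///C:/reports/report.html", data, {}))  # False
# Same report served from the data host itself.
print(request_permitted("http://10.121.64.3/report.html", data, {}))  # True
# Local report, server sends a permissive CORS header.
print(request_permitted("file:///C:/reports/report.html", data,
                        {"Access-Control-Allow-Origin": "*"}))  # True
```

The second and third calls correspond to the two remedies mentioned in this thread: serving the report from the data host, or adding CORS headers to the data server's responses.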

abridgeland commented 1 year ago

Hello, enabling CORS did resolve that error, but we are still receiving a couple of error messages. We are hosting the data through Apache on a Linux server.

featureFileReader.js:69 Warning: index file not specified. The entire vcf file will be loaded.
wh @ featureFileReader.js:69
bamSource.js:66 Warning: no indexURL specified for http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam. Guessing undefined
dd @ bamSource.js:66
trackViewport.js:217 Error accessing resource: http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam Status: 0
loadFeatures @ trackViewport.js:217
10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam:1 Failed to load resource: net::ERR_CONTENT_DECODING_FAILED
trackViewport.js:217 Error accessing resource: http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam Status: 0
loadFeatures @ trackViewport.js:217
10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam:1 Failed to load resource: net::ERR_CONTENT_DECODING_FAILED
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/igv-ui.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/dom.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/circular-view.css.map: System error: net::ERR_FILE_NOT_FOUND

jrobinso commented 1 year ago

Could you zip up and attach the report file (HTML) that is produced, along with the command line used to generate the report?

abridgeland commented 1 year ago

vc_test_0801.zip

Here is the command line:

create_report http://10.121.64.3/norm_call_mv.200925_VCtest_S2_M06717_FRT67N.vcf.gz http://10.121.64.3/ref_seq.fasta --no-embed --tracks http://10.121.64.3/norm_call_mv.200925_VCtest_S2_M06717_FRT67N.vcf.gz http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam --output vc_test_0801.html

jrobinso commented 1 year ago

Do index files for the vcf and bam exist, i.e. these files?

http://10.121.64.3/norm_call_mv.200925_VCtest_S2_M06717_FRT67N.vcf.gz.tbi
http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam.bai

You might need to use the --track-config option to specify these index files, instead of the --tracks option. The json file that is the argument to --track-config for your example is attached, assuming the index files exist.
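
The attached tracks.json is not reproduced in this thread, so the following Python sketch only illustrates what such a file might contain. It assumes the file is a JSON array of igv.js track objects using the `name`, `url` and `indexURL` properties seen in the igv.js example earlier, and that the index files follow the usual `.bam.bai` and `.vcf.gz.tbi` naming. The helper names are hypothetical.

```python
import json

# Assumed naming conventions for index files sitting next to the data files.
INDEX_SUFFIXES = {".bam": ".bai", ".vcf.gz": ".tbi"}


def guess_index_url(url):
    """Return the conventional index URL for a data URL, or None if unknown."""
    for data_suffix, index_suffix in INDEX_SUFFIXES.items():
        if url.endswith(data_suffix):
            return url + index_suffix
    return None


def build_track_config(urls):
    """Build a list of igv.js-style track objects with explicit index URLs."""
    tracks = []
    for url in urls:
        # Use the file name as the track label.
        track = {"name": url.rsplit("/", 1)[-1], "url": url}
        index_url = guess_index_url(url)
        if index_url is not None:
            track["indexURL"] = index_url
        tracks.append(track)
    return tracks


urls = [
    "http://10.121.64.3/norm_call_mv.200925_VCtest_S2_M06717_FRT67N.vcf.gz",
    "http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam",
]
print(json.dumps(build_track_config(urls), indent=2))
```

Saving the printed JSON as tracks.json and passing it with --track-config tracks.json in place of --tracks would then give igv.js explicit index URLs, provided the index files really exist at those locations.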

It's hard to say whether the BAM file error is caused by a missing index or some other server problem. Apache servers sometimes incorrectly insert a "Content-Encoding: gzip" header into the response for BAM files. This can be fixed through configuration; scroll to the bottom of this page for the note on "MIME" type. However, before doing this, try specifying the index with the --track-config option.

BTW the missing "map" file warnings can be ignored.

jrobinso commented 1 year ago

The tracks.json file for --track-config

tracks.json.zip

abridgeland commented 1 year ago

vc_test_with_config.zip

I tried using the tracks.json and still received a similar error. I double-checked, and the files do exist.

create_report http://10.121.64.3/snp100-x3.sorted.rg.mutect2.vcf.gz http://10.121.64.3/ref_seq.fasta --no-embed --track-config tracks.json --output vc_test_with_config.html

trackViewport.js:217 Error accessing resource: http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam Status: 0
10.121.64.3/alignmen…FRT67N.sorted.bam:1 Failed to load resource: net::ERR_CONTENT_DECODING_FAILED
trackViewport.js:217 Error accessing resource: http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam Status: 0
10.121.64.3/alignmen…FRT67N.sorted.bam:1 Failed to load resource: net::ERR_CONTENT_DECODING_FAILED
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/igv-ui.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/dom.css.map: System error: net::ERR_FILE_NOT_FOUND
DevTools failed to load source map: Could not load content for file:///C:/Users/M325560/Documents/projects/bam_visualization/circular-view.css.map: System error: net::ERR_FILE_NOT_FOUND
trackViewport.js:217 Error accessing resource: http://10.121.64.3/alignment.200925_VCtest_S2_M06717_FRT67N.sorted.bam Status: 0
10.121.64.3/alignmen…FRT67N.sorted.bam:1 Failed to load resource: net::ERR_CONTENT_DECODING_FAILED

jrobinso commented 1 year ago

Did you configure the BAM MIME type in Apache? I suspect your server is improperly inserting "Content-Encoding: gzip" headers into the response, but that might not be the whole issue. The "net::ERR_CONTENT_DECODING_FAILED" message is coming from your browser (Chrome?), before IGV even sees the data.

I'm attaching my test case in an archive; the command line is in "readme.txt". Try that. To confirm it's something related to the server, I could host some test files for you on our server, but I don't think we would learn anything that the attached sample case won't tell us.

I think the next step is to configure the BAM MIME type as described in the link I sent, to prevent Apache from adding the encoding header.
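
One way to check for the header outside the browser is a short Python script. This is only a sketch under stated assumptions: the function names are hypothetical, the checks cover just the two problems discussed in this thread, and the small range request only approximates how igv.js reads a BAM.

```python
import urllib.request


def diagnose_bam_headers(headers):
    """Return a list of likely problems in the response headers for a BAM file."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    if "gzip" in h.get("content-encoding", ""):
        problems.append("Content-Encoding: gzip is set; the browser will try "
                        "to decompress the BAM before IGV sees it")
    if "access-control-allow-origin" not in h:
        problems.append("no Access-Control-Allow-Origin header; cross-origin "
                        "requests from the report will be blocked")
    return problems


def fetch_headers(url):
    """Fetch response headers for the first 100 bytes of a remote file."""
    request = urllib.request.Request(
        url,
        # Advertise gzip support, as a browser would, so that a misconfigured
        # server has the chance to add the unwanted encoding header.
        headers={"Range": "bytes=0-99", "Accept-Encoding": "gzip"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


# Example with headers typed in by hand; no network access is needed here.
bad = {"Content-Type": "application/x-gzip", "Content-Encoding": "gzip"}
for problem in diagnose_bam_headers(bad):
    print("-", problem)
```

Against a live server, `diagnose_bam_headers(fetch_headers(url))` with the URL of the BAM file would run the same checks on the real response.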

issue_82.zip

jrobinso commented 1 year ago

BTW you can examine the response headers using Chrome's developer tools on the network tab. A screenshot is below. You should not see "Content-Encoding: gzip" in the response headers for your BAM file.

[Screenshot: Screen Shot 2023-08-04 at 10 06 37 AM]

abridgeland commented 1 year ago

Hmmm, I was using Edge and the response headers appear to be empty.

[Attached image]

jrobinso commented 1 year ago

Well, that is very bizarre; it's not possible to have no response headers. Try Chrome and see if you see any. Were you able to run the example I sent you?

abridgeland commented 1 year ago

Strangely, I am receiving a CORS error in Chrome even though the headers are configured. And yes, I was able to run the example without any issue. I might try hosting the files through Google Cloud to see if I have more luck.