igvteam / igv

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
https://igv.org
MIT License
625 stars 378 forks source link

Errors when scrolling CRAM files loaded from GCloud #933

Closed gevro closed 3 years ago

gevro commented 3 years ago

Hi, Issue #884 is still happening in IGV 2.9.3. See attached.

To replicate, load a CRAM from GCloud and scroll several times using the mouse to the left or right.

Screen Shot 2021-03-12 at 4 28 52 PM Screen Shot 2021-03-12 at 4 28 48 PM
jrobinso commented 3 years ago

I can't reproduce this, could you share or point me to a CRAM that exhibits this problem, and also reference genome used and the the exact genomic region where you see it (particularly the second).

The second message is informational, it is telling you that the reference sequence you are using does not match the reference sequence the CRAM was aligned against. This can happen in certain regions even, for exampke, with different fastas based on GRCh38 due to different treatment of some regions. For example use of "N" vs ambiguity codes. To view a CRAM you need to use the fasta used to produce the CRAM for a reference.

gevro commented 3 years ago

Just replicated with this public NA12878 CRAM (hg38): gs://broad-public-datasets/NA12878/NA12878.cram

I started at the gene DONSON. You need to quickly scroll left and right 30-40 times. I did something like 7-8 times in the left direction and 7-8 times in the right direction. It seems random when it throws the error. Not a particular location.

gevro commented 3 years ago

Note, you may need to scroll 50-100 times to see the error. And sometimes it takes > 100 scrolls to see it. See below another output of the error.

Screen Shot 2021-03-13 at 11 58 28 AM

Based on the randomness of when the error happens, I suspect an issue in the data transfer from GCloud.

jrobinso commented 3 years ago

@gevro Have you verified that the fasta you are using for a reference is the fasta used to align that cram? What fasta are you using (or what genome have you selected in IGV). That is a 1KG genome, so you will for sure see that error at certain regions of the genome if you select "hg38" as your reference.

jrobinso commented 3 years ago

For most 1KG genomes, or I should say some 1KG genomes, selecting this reference in IGV will work best. However, again, you need the actual fasta to decode 100% of the regions in a CRAM file.

Screen Shot 2021-03-14 at 11 09 44 AM
gevro commented 3 years ago

Error still happens even with the hg38 gatk reference. See below. I also downloaded the above CRAM (gs://broad-public-datasets/NA12878/NA12878.cram) and confirmed it was aligned to hg38 gatk reference (/cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta).

Note that this error occurs anywhere in the genome after you do the left and right scrolling that I described.

Screen Shot 2021-03-14 at 5 25 06 PM Screen Shot 2021-03-14 at 5 25 10 PM
jrobinso commented 3 years ago

I just can't reproduce it. Could you attach your igv.log file? Also, do you have some reason to suspect GCloud has anything to do with the error?

gevro commented 3 years ago

Hi, The problem is definitely with GCloud CRAM. I cannot replicate the problem when I download the CRAM locally. But when I load the CRAM from GCloud, it is easy to replicate.

To replicate: 1) Load this file gs://broad-public-datasets/NA12878/NA12878.cram from GCloud with either hg38 or hg38-gatk 2) I am using Mac OS, IGV 2.9.3 3) Go to the gene locus DONSON. 4) Scroll quickly either left or right 5-20 times, alternate directions, or keep going in one direction, and keep doing this until the error occurs. It is random when it occurs. Sometimes quickly, sometimes takes 100-200 scrolls to happen.

igv.log file attached igv-1.log

jrobinso commented 3 years ago

Hi,

I tried to reproduce this by going to gene DONSON and scrolling left and right for several minutes. So I wrote a script to do the scrolling automatically and let it run through 100 different loci, again no erros. I don't doubt you are seeing this issue but I suspect its local, possibly a network error. I'm hopeful we can find it in your log. Unfortunately not in the log you sent.

The log indicates you are using version 2.8.13, which had a logging statement that shouldn't be there and is filling the log with uninformative statements. This looks like an old log, it is named "igv-1.log", it should be "igv.log". The last use date appears to be 11-20-2020. Please check the date of the log and send "igv.log". Even better delete the log, reproduce the issue, exit IGV and send the newly created one.

By GCloud I meant GCloud specficially. Just because you don't see it locally does not implicate GCloud, it could be network loading generally.

gevro commented 3 years ago

Log attached. I think these are the relevant lines. I agree it could be a network issue, but transport layer should be able to detect data corruption in transit, and we're on a robust internet connection.

java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:126) ~[htsjdk-2.23.0.jar:?]
INFO [2021-03-12T15:37:30,479] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.structure.CompressionHeader.<init>(CompressionHeader.java:90) ~[htsjdk-2.23.0.jar:?]
java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:126) ~[htsjdk-2.23.0.jar:?]
INFO [2021-03-12T15:37:31,072] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Input does not have a valid CRAM header.
java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:126) ~[htsjdk-2.23.0.jar:?]
INFO [2021-03-12T16:28:42,720] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Input does not have a valid CRAM header.
java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:126) ~[htsjdk-2.23.0.jar:?]
    at htsjdk.samtools.cram.structure.CompressionHeader.<init>(CompressionHeader.java:90) ~[htsjdk-2.23.0.jar:?]
INFO [2021-03-15T10:34:22,278] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Input does not have a valid CRAM header.
java.lang.RuntimeException: Input does not have a valid CRAM header.
    at htsjdk.samtools.cram.build.CramIO.readCramHeader(CramIO.java:126) ~[htsjdk-2.23.0.jar:?]
    at htsjdk.samtools.cram.structure.CompressionHeader.<init>(CompressionHeader.java:90) ~[htsjdk-2.23.0.jar:?]
INFO [2021-03-15T10:41:01,321] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Input does not have a valid CRAM header.
java.lang.RuntimeException: Block CRC32 mismatch, actual: 1e17998 expected: e17998b3
INFO [2021-03-12T15:37:31,073] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Block CRC32 mismatch, actual: 1e17998 expected: e17998b3
java.lang.RuntimeException: Block CRC32 mismatch, actual: 45ce7bd expected: 5ce7bd2d
INFO [2021-03-12T16:28:42,720] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Block CRC32 mismatch, actual: 45ce7bd expected: 5ce7bd2d
java.lang.RuntimeException: Block CRC32 mismatch, actual: 47983 expected: 71d15181
INFO [2021-03-12T16:28:54,191] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Block CRC32 mismatch, actual: 47983 expected: 71d15181
java.lang.RuntimeException: Block CRC32 mismatch, actual: 200dbb1 expected: 700141d4
INFO [2021-03-15T10:34:22,278] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Block CRC32 mismatch, actual: 200dbb1 expected: 700141d4
java.lang.RuntimeException: Block CRC32 mismatch, actual: 200caf0 expected: f4c060cf
INFO [2021-03-15T10:41:01,321] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.RuntimeException: Block CRC32 mismatch, actual: 200caf0 expected: f4c060cf

igv.log

jrobinso commented 3 years ago

Its not a question of data corruption, more likely the Google server returning something unexpected. For example its not uncommon for Google servers to return error messages in responses with status code 200. I wonder if you are hitting some rate quota limit, however if that were the case I should be hitting it as well.

Is it possible for you to test in another network environment, for example at home or elsewhere?

gevro commented 3 years ago

I just tested at home and it happened. This rules out a network issue and likely narrows down to an issue with Gcloud data transfer or in the receipt of data by IGV from GCloud. Note that to replicate it it can be erratic and sometimes it happens quickly and other times take 2-3 minutes of scrolling left and right for it to happen.

Screen Shot 2021-03-15 at 10 34 55 PM
jrobinso commented 3 years ago

Thanks for the report, I agree that eliminates the network environment. Did you use the same or a different computer?

I have tried repeatedly, even wrote a script to hit hundreds of loci in a very short time period. I doubt very much its GCloud. IGV is launched many thousands of time a day and a substantial % of that usage is loading bam and cram files from GCloud. So far this is an isolated report, when there are general issues we hear about them quickly. However I don't know how many people pan to different locations hundreds of times, possibly not many. Anyway if I can ever reproduce it I will be sure to investigate.

gevro commented 3 years ago

Ok thanks. I will update if I figure out any easier way to replicate.

jrobinso commented 3 years ago

@gevro It might be interesting to try another tool, such as Bamview if its still available or "IGB" to see if the problem occurs there. Or even igv-web (https://igv.org/app).

gevro commented 3 years ago

It doesn't happen on igv.org.

I noticed also that clicking on the chromosome ideogram to jump to a new location and then continue to scroll left/right rapidly seems to speed up occurrence of the bug.

Also, it seems that loading more tracks speeds up the bug.

For example, loading 6 copies of this: gs://broad-public-datasets/NA12878/NA12878.cram Then scrolling 20-30 times in one direction, jumping to a new location on the chromosome by clicking on the ideogram and scrolling again, jumping to a new location and scrolling again...

jrobinso commented 3 years ago

OK I'll try that. Jumping around like that while data is still loading will stress the system, perhaps I'm not being aggressive enough.

jrobinso commented 3 years ago

@gevro If you can give the snapshot build a try there are some changes that might fix this. No guarantees, I give it a 20% chance. Its a bit more complicated if you are on a mac, you have to download the build for "all platforms" and run from the command line: https://software.broadinstitute.org/software/igv/download_snapshot

I still can't make it happen, but my colleague did once. She can't however repeat it, so far. It might come down to the speed at which you can physically pan vs the speed of you computer, or some other combination of factors. I have a very fast computer and slow fingers. At any rate give the snapshot a try if you can.

gevro commented 3 years ago

The snapshot build loads alignments a bit slower when I scroll, but I cannot replicate the issue there.

When will the new snapshot become an official release?

jrobinso commented 3 years ago

@gevro The fix was to sychronize loading of the reference sequence to decode CRAMs. As you know CRAM files do not contain reference. Because of how the htsjdk implements the spec we need to load the sequence for an entire chromosome to decode any alignments on that chromosome. I conjectured that there was some kind of thread issue corrupting that load if attempted from different threads, which could be tiggered by rapid panning before the initial load completed.

I will see if there is a way to protect this without synchronization now that you have confirmed that this is the source of the issue, and will probably ask you to test again. When we are happy with a solution it can be released fairly quickly.

jrobinso commented 3 years ago

@gevro Are you using "multi locus" view? You didn't mention it but that would be an important consideration.

jrobinso commented 3 years ago

@gevro Timing with a stop watch (crude I know), I don't see any significant difference in speed. Have you compared speed between the two versions? I would not want to release a slower IGV unless there was no alternative.

jrobinso commented 3 years ago

@gevro I just pushed a change that should help with the speed, while maintaining the previous fix. If you can pull and test the snapshot and confirm it is still fixed we will schedule a release. Thanks for all your assistance.

gevro commented 3 years ago

Awesome you found the problem! The new snapshot also works fine. It is perhaps a tiny bit slower in loading when scrolling left/right than 2.9.3. But not by much. And no, I was not using 'multi locus view'. Thank you!!!

jrobinso commented 3 years ago

This has been released as 2.9.4. Here's hoping its fixed.