Improved support for PacBio reads

amwenger commented 8 years ago

IGV is a very useful viewer for PacBio long read data, but it could be even better with a few modifications. Ideas to improve support for PacBio data are:

1. Add a "quick consensus" mode that only shows mismatches at positions with a consistent alternative variant. The coverage track has logic that uses an allele threshold to determine whether to show variation at a position. Add a "quick consensus" option and apply the same logic to mismatches within a read. This would largely hide the "random errors" in long reads and make it possible to see haplotype structure by eye. It would look something like this:
2. Hide small indels at low zoom. Individual PacBio reads have random small deletion errors that show as a series of black dots at low zoom (see picture below). Hide deletions that occupy <3px at the current zoom.
3. Label the size of large insertions and deletions. Modify the "Flag insertions larger than N bases" option to be "Label insertions and deletions larger than N bases". "Large" insertions and deletions would show at all zoom levels. One idea for how to show the variant size is to use a filled upward-pointing trapezoid for insertions and a hollow downward-pointing trapezoid for deletions:
4. Show clipping information at the end of reads. Read clipping can indicate the presence of a structural variant. Show the number of clipped bases in a "cap" at the end of reads:
5. Show variation at low zoom to enable viewing haplotype structure. I would prefer to see SNVs and large indels even when zoomed out to 100kb+.
6. Add a "group by SNV" feature to provide a "quick phasing" of reads. A right click on a position would provide the option to group reads based on the basepair at that position. By selecting a position with a heterozygous SNV, the reads could be "phased" into haplotypes:
7. Color basepairs based on interpulse distances, which indicate methylation status. The interpulse distance is provided in the "ip" SAM annotation. Instead of the standard "gray" background color, this would show a different shade at each basepair as a function of IPD.
8. Improve performance. Rendering PacBio alignments in IGV is often slow, and it is currently not practical to render read information at low zoom. One cause is that PacBio reads have frequent indel errors that break the alignment into many, many CIGAR blocks. Some of the rendering logic is performed per CIGAR block, not per alignment. Thus, rendering PacBio reads is much more expensive than the equivalent coverage in Illumina reads, which are often 1-2 blocks per alignment. For example, 40-fold coverage over 1kb would be something like 40 PacBio reads, each broken into ~100 CIGAR blocks for a total of 4,000 CIGAR blocks; it would require 400 Illumina reads, each broken into ~1 CIGAR block for a total of 400 blocks. So, the estimated cost is 10x higher for PacBio reads when operations are performed per block and not per alignment.
9. Add a generic "Send read to URL" feature that is like the "Blat read sequence" option but supports user-defined URLs. Some data representations (e.g. a read vs. ref dotplot) are difficult to show within IGV but could be built as a separate web application. The user should be able to add new URLs and define which information is sent with a request: read name, read sequence, reference span, reference sequence, CIGAR string, and others.
10. Color / shade basepairs based on percent identity with the reference sequence in a sliding window (say +/-10bp). This would serve as a simple empirical base QV score and would identify low-quality regions of a read.
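
The sliding-window identity in the last idea can be sketched as below. This is only an illustration under stated assumptions: the per-base match flags are taken as given (how they are derived from the alignment is out of scope), windows are simply truncated at the read ends, and `WindowIdentity` and `identity` are hypothetical names, not IGV code.

```java
final class WindowIdentity {
    /**
     * For each base i, returns the fraction of matching bases in the window
     * [i - half, i + half], truncated at the ends of the read.
     */
    static double[] identity(boolean[] matches, int half) {
        int n = matches.length;
        // prefix[i] holds the number of matches among the first i bases.
        int[] prefix = new int[n + 1];
        for (int i = 0; i < n; i++) {
            prefix[i + 1] = prefix[i] + (matches[i] ? 1 : 0);
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(n - 1, i + half);
            out[i] = (double) (prefix[hi + 1] - prefix[lo]) / (hi - lo + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] matches = {true, true, false, true, true};
        // With half = 1 each window covers at most three bases.
        for (double v : identity(matches, 1)) {
            System.out.println(v);
        }
    }
}
```

A renderer could map each value to a shade, with `half = 10` corresponding to the +/-10bp window suggested above.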

I have a version of many of these changes in a personal fork. I am happy to clean them up and contribute them to the main project.

jrobinso commented 8 years ago

I don't understand (8), what do you mean by "operations are performed per block". Could you elaborate with cpu profiling data? The solution we applied earlier to this problem was to set a filter and ignore indels < some size, combining the adjacent blocks. I think the cpu cost is in the drawing operations, it doesn't matter if you loop through 40,000 blocks or 400 blocks if you draw the same elements you will incur the same cost. At least that is my recollection of previous profiling, some hard data is needed here and anywhere where performance is being discussed.
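
The filter mentioned above (ignore indels below some size and combine the adjacent blocks) can be sketched as follows. This is an illustration, not IGV's implementation: it assumes a block is a run of M/=/X operations, and `CigarBlockCount`, `countBlocks`, and `minIndelSize` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CigarBlockCount {
    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /**
     * Counts aligned blocks in a CIGAR string. Insertions and deletions shorter
     * than minIndelSize are ignored, so the blocks on either side are merged.
     */
    static int countBlocks(String cigar, int minIndelSize) {
        int blocks = 0;
        boolean inBlock = false;
        Matcher m = OP.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'M' || op == '=' || op == 'X') {
                if (!inBlock) {
                    blocks++;
                    inBlock = true;
                }
            } else if (op == 'N' || ((op == 'I' || op == 'D') && len >= minIndelSize)) {
                inBlock = false;
            }
            // S, H, P and filtered-out small indels leave the current block open.
        }
        return blocks;
    }

    public static void main(String[] args) {
        String cigar = "10M1D12M2I8M50D20M";
        System.out.println(countBlocks(cigar, 1)); // every indel splits: 4
        System.out.println(countBlocks(cigar, 3)); // indels under 3 bp ignored: 2
    }
}
```

Whether fewer blocks translates into less rendering time is exactly the open question here, so the sketch only shows the block counts, not a performance measurement.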

jrobinso commented 8 years ago

Which of these do you have solutions for in your personal fork?

pb-jchin commented 8 years ago

Hi, @jrobinso, for (9), you can see some code I modified from the BLAT code here: https://github.com/pb-jchin/igv/blob/ExtView/src/org/broad/igv/util/extview/ExtendViewClient.java (We also have to add corresponding menu items in some other files.)

Ideally, if there were a generic mini-language to send the metadata/data of the current view, selected read, and selected features, then we could pass the data to an external viewer to fetch extra information that might need some other database backend for visualization, without modifying the source code. If possible, IGV could take the HTTP response and display it (SVG or PNG data, etc.). Or the server could return a URL and IGV could launch a web browser pointing to that URL; that would be great too. I have some examples. If it is useful, I can make a screencast to show one.

bnbowman commented 8 years ago

@jrobinso I also use IGV regularly to analyze PacBio raw reads, and would greatly appreciate the suggested changes - Particularly (1) and (2)

jrobinso commented 8 years ago

@pb-jchin @amwenger Do you guys have coded solutions for (1) and (2)? If not, I'm going to proceed with my own. In general I'd like to get your contributions merged within the next few weeks. I'm planning to do some restructuring and simplification of the Alignment model, and merging later might be difficult.

amwenger commented 8 years ago

I do have solutions to 1 and 2 in https://github.com/amwenger/igv/tree/amw-pb-consensus-mode. I will prepare PRs soon.

jrobinso commented 8 years ago

@pb-jchin wrt (9), I can work with the code you have above to add a "Send read to URL" function; however, I think we should nail down what can be returned from the post a little more tightly. I suggest we defer this one a bit and concentrate on some of the others, as it's easy to add this at any time. My time is really limited, and with 9 items we need to prioritize.

jrobinso commented 8 years ago

@cwhelan if you guys have any input on the PacBio improvements (see items 1-9 above) chime in.

pb-jchin commented 8 years ago

@jrobinso Yes, for (9), if one wants to be more general about the communication between IGV and external toolsets, it does need some thinking about the design.

Here is what I think:

1. For SAM/BAM reads, the metadata/data set is well defined, so it is easier.
2. For HTTP request return processing, we can consider multiple levels of support:
   i. IGV doesn't need to catch the return; it is up to the server to ensure correct queries are caught.
   ii. IGV catches a simple return to give the user feedback that the HTTP request was sent, and displays the server's return message.
   iii. IGV catches an information-rich return (URL, images, etc.) and processes it accordingly.
3. For features, this is more complicated, as BED / GFF, etc., can contain many fields that are not strictly defined. This indeed needs careful thinking.

For 1. and 2.i or 2.ii, this should be easy. It is the same as the BLAT request, and we don't even need any complicated parsing of the returned information. For 2.iii and 3., yes, because of their complexity, they should have lower priority compared to the others.

jrobinso commented 8 years ago

Actually there is a lot of complex parsing of the server response from a blat request. The response to the post has to be handled, otherwise nothing will happen.

Do you have a working example in one of your branches?

pb-jchin commented 8 years ago

@jrobinso, yes. I mean you do need to parse the BLAT output. The easier thing to do is not to display the information inside IGV, so IGV does not need to parse the returned info.

The attached screenshots show how I use it now. This will enable many related applications that need more extensive backend database support, with IGV as the front end for navigation.

On the IGV side: [image: scr2016-07-14_11-03-09_am]

In a web browser: [image: scr2016-07-14_11-05-25_am]

jrobinso commented 8 years ago

So in this example the server is using some push technology, and IGV does not need to look at the response? How do you let IGV know the url, and syntax of the post body? The URL would probably be recorded in the prefs.properties file. Do you want to prepare a pull request for this one? We can pull it in and continue to refine it.

jrobinso commented 8 years ago

BTW ultimately I think we should support a structured response, probably json, that can encode either 2i, 2ii, or 2iii. The json could also encode any error or user messages the server wants to send. If the response is empty or not recognized IGV would do nothing, as in your "push" example.
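
A minimal sketch of that dispatch, assuming the JSON response has already been parsed into a map (the parsing itself is out of scope here). The field names `status`, `msg`, and `payload`, the `Action` values, and the class name are placeholders for illustration, not an agreed protocol or existing IGV code.

```java
import java.util.Map;

final class ExtViewResponse {
    enum Action { NONE, SHOW_MESSAGE, SHOW_ERROR, HANDLE_PAYLOAD }

    /** Decides what to do with a structured response; anything unrecognized is ignored. */
    static Action decide(Map<String, Object> response) {
        if (response == null || response.isEmpty()) {
            return Action.NONE; // empty response: do nothing, as in the "push" example
        }
        Object status = response.get("status");
        if ("ERROR".equals(status)) {
            return Action.SHOW_ERROR;
        }
        if (!"OK".equals(status)) {
            return Action.NONE; // unrecognized status: do nothing
        }
        if (response.get("payload") != null) {
            return Action.HANDLE_PAYLOAD; // e.g. a URL or image to display
        }
        if (response.get("msg") != null) {
            return Action.SHOW_MESSAGE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(Map.of()));
        System.out.println(decide(Map.of("status", "OK", "msg", "request received")));
        System.out.println(decide(Map.of("status", "ERROR", "msg", "unknown read")));
    }
}
```

Keeping the decision separate from the HTTP call means an empty body and an unknown status take the same "do nothing" path.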

pb-jchin commented 8 years ago

@jrobinso

The URL to the HTTP server is hardcoded in the example now. Let me spend some time understanding how to pull information from prefs.properties, and I will submit a cleaner PR after that. It will probably take about 1 week.

Yes, I use websocket to push update from the local server to the web page.

I think JSON is great for the response. Maybe something like this

minimum return

{"status": "OK|ERROR|others",  
 "msg":"some text message for IGV to show",  
 "payload": (other JSON objects for SVG/PNG/URL or instruction for IGV to display the objects etc.)}

jrobinso commented 8 years ago

You can leave it hardcoded for the PR, I will clean that up. If you want to use preferences the steps are

(1) add a key constant in PreferenceManager

(2) add your property to prefs.properties in the form key=value.
See your existing prefs.properties for examples

(3) access the property with PreferenceManager.getInstance().get(PreferenceManager.YOUR_KEY_CONSTANT)
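
The same pattern can be tried outside IGV with `java.util.Properties` as a stand-in. The sketch below does not use IGV's `PreferenceManager`; the key `EXT_VIEW_URL`, its value, and the class name are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PrefsSketch {
    // Step (1): a key constant (hypothetical name).
    static final String EXT_VIEW_URL = "EXT_VIEW_URL";

    /** Step (3): look up a key in key=value text, falling back to a default. */
    static String lookup(String prefsText, String key, String fallback) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(prefsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        // Step (2): the property as it would appear in a properties file.
        String prefs = "EXT_VIEW_URL=http://localhost:8080/extview\n";
        System.out.println(lookup(prefs, EXT_VIEW_URL, ""));
    }
}
```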

cwhelan commented 8 years ago

@jrobinso I have a couple of other possible improvements for dealing with long read alignments (whether they be PacBio or assembled contigs from short reads or something). Specifically, I'm interested in piecing together the primary and supplementary alignments of a long sequence. Some of the improvements to the tool tips in https://github.com/igvteam/igv/issues/272 will be very helpful for this, ie the display of left and right clipping. What would also be useful might be the ability to:

1) Color all the primary and supplementary alignments of the same read the same color. Essentially this would be similar to color by -> read name, but we'd want the ability to only select a single read at a time, of course.

2) Some sort of functionality like: select a read -> right click -> "go to next/previous alignment in read". This would read the alternate mapping locations in the SA tag of the read, figure out which one represents the next aligned chunk of the read (sorted by coordinates on the read, not on the reference), and allow you to jump to it (or back to the previous).
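
Ordering the alignments by read coordinate, as in 2), can be sketched as below. The sketch relies on the SA tag layout from the SAM specification (`rname,pos,strand,CIGAR,mapQ,NM;` per entry) and takes the leading clip (or the trailing clip for reverse-strand alignments) as the offset of the aligned chunk within the original read. `SaTagSketch`, `Segment`, and the example tag value are hypothetical, and the selected alignment itself (which is not listed in its own SA tag) would need to be added to the list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SaTagSketch {
    static final class Segment {
        final String chr;
        final int pos;
        final char strand;
        final int readStart; // offset of the aligned chunk in the original read

        Segment(String chr, int pos, char strand, int readStart) {
            this.chr = chr;
            this.pos = pos;
            this.strand = strand;
            this.readStart = readStart;
        }
    }

    private static final Pattern OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    /** Clipped bases preceding the aligned chunk, in original read orientation. */
    static int readStart(String cigar, char strand) {
        int leading = 0;
        int trailing = 0;
        boolean seenAligned = false;
        Matcher m = OP.matcher(cigar);
        while (m.find()) {
            int len = Integer.parseInt(m.group(1));
            char op = m.group(2).charAt(0);
            if (op == 'S' || op == 'H') {
                if (seenAligned) trailing += len; else leading += len;
            } else if (op != 'P') {
                seenAligned = true;
            }
        }
        // The CIGAR is written along the reference, so for a reverse-strand
        // alignment the trailing clip comes first in the original read.
        return strand == '-' ? trailing : leading;
    }

    /** Parses an SA tag value and sorts its entries by position on the read. */
    static List<Segment> parse(String saTag) {
        List<Segment> out = new ArrayList<>();
        for (String entry : saTag.split(";")) {
            if (entry.isEmpty()) continue;
            String[] f = entry.split(",");
            char strand = f[2].charAt(0);
            out.add(new Segment(f[0], Integer.parseInt(f[1]), strand, readStart(f[3], strand)));
        }
        out.sort(Comparator.comparingInt((Segment s) -> s.readStart));
        return out;
    }

    public static void main(String[] args) {
        String sa = "chr7,5000,-,30M70S,60,0;chr7,2000,+,40S30M30S,60,1;";
        for (Segment s : parse(sa)) {
            System.out.println(s.chr + ":" + s.pos + " strand " + s.strand + " read offset " + s.readStart);
        }
    }
}
```

"Next" and "previous" would then be the neighbors of the selected alignment in this sorted list.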

Let me know if those aren't clear. @SHuang-Broad do you have any more suggestions for this type of thing?

jrobinso commented 8 years ago

@cwhelan that is clear. I also intended to extend the "linked read" view, from the 10x prototype feature, for chimeric ("supplementary") alignments. Would that be useful? It would be helpful to have some test data with chimeric reads (i.e. supplementary reads with SA tags). Could you point me to some? You can email me directly on this.

SHuang-Broad commented 8 years ago

My $0.02:

1. Echoing @cwhelan's coloring suggestion, I think the soft-clipped bases could be displayed with a single color that will NOT "mix well" with the highlight color of reads/contigs.
2. Linking multiple alignment records for the same read/contig by color is good, but when there are many colors, the human eye is probably not going to be able to distinguish them. What could solve this problem is to hover over one alignment record and have its peers (other alignment records for the same read/contig) "blink" or be "boxed" (if not displayable in the current view, have a pop-up?).
3. It might be useful to have a feature similar to the "Allele Fraction" information on the coverage track for reads. Here, we are not looking for SNP allele fractions, but rather "the fraction of reads that are clipped at the same (or almost the same) position on the ref". Or fraction of reads that have strange insert size, pair orientation.
4. I do feel that we are probably asking for a "SV mode" in IGV, because there are many features that are quite valuable offered by IGV when viewing short variants but gives less signal in SV mode. When taking the broader view, details can be forgiven sometimes.

jrobinso commented 8 years ago

@SHuang-Broad good suggestions. RE (2), when playing with the 10x view I found myself wanting exactly what you suggest, having all linked reads light up when one is moused over. The colors there are emulating the equivalent "Loupe" view, but I'm not sure it's really useful. More than 5 or 6 colors are impossible to distinguish clearly.

I think an explicit "SV" mode might make sense. In this mode we might drop certain details (perhaps even the read sequence & SNPs) and just emphasize SV information, at much wider genomic ranges than we typically do.

amwenger commented 8 years ago

> Which of these do you have solutions for in your personal fork?

I have implementations for 1 (quick consensus), 2 (hide small indels), 3 (label large indels), 5 (variation at low zoom), 6 (group by SNV), and some ideas for 8 (performance) in my fork.

> I don't understand (8), what do you mean by "operations are performed per block". Could you elaborate with cpu profiling data? The solution we applied earlier to this problem was to set a filter and ignore indels < some size, combining the adjacent blocks. I think the cpu cost is in the drawing operations, it doesn't matter if you loop through 40,000 blocks or 400 blocks if you draw the same elements you will incur the same cost. At least that is my recollection of previous profiling, some hard data is needed here and anywhere where performance is being discussed.

Sorry for the opaque comment. To elaborate: The drawBases() method in sam/AlignmentRenderer.java is called once for each alignment block. It does two expensive operations that could be moved higher up the call stack and performed once per alignment (or even better once per render):

Create a graphics context on which to draw the bases: Graphics2D g = (Graphics2D) context.getGraphics().create(). The context could be created at a higher level and passed to drawBases(). This seems to be a very expensive operation.
Obtain the reference genome sequence against which to compare the read: genome.getSequence(chr, start, end). This currently allocates a new byte array, which, while less expensive than drawing, is still a costly operation.
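
The first point can be illustrated with a small standalone sketch that counts how many graphics contexts are created when drawing per block versus per alignment. It draws into a `BufferedImage` rather than IGV's render context, it does not measure time, and `ContextHoistSketch` and its methods are hypothetical names.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class ContextHoistSketch {
    static int contextsCreated = 0;

    static Graphics2D newContext(Graphics parent) {
        contextsCreated++;
        return (Graphics2D) parent.create();
    }

    /** Current pattern: a new context for every block. Each block is {x, width}. */
    static void drawPerBlock(Graphics parent, int[][] blocks) {
        for (int[] b : blocks) {
            Graphics2D g = newContext(parent);
            try {
                g.fillRect(b[0], 0, b[1], 10);
            } finally {
                g.dispose();
            }
        }
    }

    /** Hoisted pattern: one context per alignment, shared by all of its blocks. */
    static void drawPerAlignment(Graphics parent, int[][] blocks) {
        Graphics2D g = newContext(parent);
        try {
            for (int[] b : blocks) {
                g.fillRect(b[0], 0, b[1], 10);
            }
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(400, 20, BufferedImage.TYPE_INT_RGB);
        Graphics parent = img.createGraphics();
        int[][] blocks = new int[100][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new int[] {i * 4, 3};
        }
        drawPerBlock(parent, blocks);
        System.out.println("per block: " + contextsCreated);
        contextsCreated = 0;
        drawPerAlignment(parent, blocks);
        System.out.println("per alignment: " + contextsCreated);
        parent.dispose();
    }
}
```

For a read with ~100 blocks, the hoisted version creates one context instead of 100; how much time that saves would still need to be confirmed by profiling.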

How do you recommend running CPU profiling? I can do it if you point me to the right tool. In this case, the difference is stark enough that you can see it (and hear it from the CPU fan). I posted a sample BAM and a video of scrolling through hg38 chr7:114,319,594-114,323,597 with that BAM using two versions of IGV. Left (the one that lags) is the current IGV with if (2 > 1) { return; } added immediately after Graphics2D g = (Graphics2D) context.getGraphics().create(); in drawBases(). Right (the smooth one) has that if statement as the first line in drawBases().

In general, I think performance could be improved by creating fewer drawing contexts; perhaps they could be created and organized in a global singleton object.

amwenger commented 8 years ago

@cwhelan @SHuang-Broad

> Specifically, I'm interested in piecing together the primary and supplementary alignments of a long sequence. Some of the improvements to the tool tips in #272 will be very helpful for this, ie the display of left and right clipping.

I think this is a great idea. Connecting primary and supplementary alignments of a read does make it dramatically easier to see structural variants. One caveat to be aware of when connecting primary and supplementary alignments is that both of the alignments are extended locally to improve the alignment score. It is possible (and in fact common) that the primary and supplementary alignments reuse some of the same bases from the original read:

[image: primary / suppl]

In this example, a simple visualization that connects the alignments would imply that the read supports a deletion of block C. In fact, it supports a deletion of blocks B and C. One way to handle that for cases of only two alignment blocks (one primary and one supplementary) is to highlight the bases that are reused.

> It might be useful to have a feature similar to the "Allele Fraction" information on the coverage track for reads. Here, we are not looking for SNP allele fractions, but rather "the fraction of reads that are clipped at the same (or almost the same) position on the ref". Or fraction of reads that have strange insert size, pair orientation.

Interesting idea if we could define it properly. It is not too hard with Illumina reads, which should have sharp clipping boundaries. PacBio reads will require that the definition of the "same clipping" location be somewhat relaxed (e.g. +/- a few bp).
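
One possible relaxed definition, sketched under assumptions: clip positions count as "the same" if they all lie within +/- `tolerance` bp of a common point, and the reported fraction is the size of the largest such group divided by the number of reads in the window. `ClipFractionSketch` and its method names are hypothetical, and the positions in the example are made up.

```java
import java.util.Arrays;

final class ClipFractionSketch {
    /** Largest number of clip positions lying within +/- tolerance of a common point. */
    static int largestCluster(int[] clipPositions, int tolerance) {
        int[] sorted = clipPositions.clone();
        Arrays.sort(sorted);
        int best = 0;
        int start = 0;
        // Slide a window of width 2 * tolerance over the sorted positions.
        for (int end = 0; end < sorted.length; end++) {
            while (sorted[end] - sorted[start] > 2 * tolerance) {
                start++;
            }
            best = Math.max(best, end - start + 1);
        }
        return best;
    }

    /** Fraction of reads in the window that share (approximately) one clip position. */
    static double clippedFraction(int[] clipPositions, int tolerance, int readsInWindow) {
        if (readsInWindow <= 0) {
            return 0.0;
        }
        return (double) largestCluster(clipPositions, tolerance) / readsInWindow;
    }

    public static void main(String[] args) {
        int[] clips = {1000, 1001, 1003, 1500};
        System.out.println(clippedFraction(clips, 0, 10)); // sharp boundaries only
        System.out.println(clippedFraction(clips, 2, 10)); // relaxed by +/- 2 bp
    }
}
```

With `tolerance = 0` this reduces to the sharp-boundary case described for Illumina reads.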

I think simply having the gold tips (idea 4 in the first post in the issue) on individual reads will help a lot. That will make it possible to see at a glance whether there are many clipped alignments in a window and whether the clipping is from one direction or both.

> I do feel that we are probably asking for a "SV mode" in IGV, because there are many features that are quite valuable offered by IGV when viewing short variants but gives less signal in SV mode. When taking the broader view, details can be forgiven sometimes.

If it does not overwhelm, I think it is nice to maintain some of the small-scale information even at low zooms. In particular, it is nice to see haplotype structure and identify single nucleotide variants in/out of phase with structural variants. It would be hard to see that if structural variants were only visible at low zoom and single nucleotide variants were visible only at high zoom.

jrobinso commented 8 years ago

@amwenger Thanks for elaborating. In general graphics contexts are cached and reused, however not consistently. I agree this is an area that can be improved on. For profiling I use JProfiler, there are other tools, including some built into the JDK, but I don't know much about them.

cwhelan commented 8 years ago

@amwenger I absolutely agree on the complexity of overlapping supplementary alignments. There are also plenty of other weird cases -- for example, two different aligned chunks of the long read can end up overlapping on the reference, indicating a duplication or tandem repeat expansion. @jrobinso With regard to this, I've also been thinking about some sort of popup for each long read that would display all the supplementary alignments together in context. Ideally this would be a visual representation like in @amwenger's picture above, but even a list similar to what's currently in the "Blat read sequence" results popup would be helpful.

jrobinso commented 8 years ago

@amwenger @cwhelan A test bam with supplementary alignments, and a list of region(s) containing some of the interesting cases (e.g. overlapping supplementary alignments, alignments sharing bases, etc.) would be helpful. Actually, essential to make progress. Thanks.

jrobinso commented 8 years ago

@amwenger I created a separate ticket for the performance issue #284

MattBashton commented 6 years ago

Re point 8 (performance): with longer CIGAR strings (currently using MinION data with 1.5kb-3kb reads), performance is really poor. IGV just hangs at 100% CPU load for minutes on end before rendering anything. I've only got a BAM with 5k reads too, just at high depth for a few select areas. The hanging appears to be random: sometimes IGV works fine, other times it fails, about 50% of the time. I'm using version 2.4.5.

MattBashton commented 6 years ago

Also because reads are not paired it's difficult to track down the secondary or split reads to investigate translocations etc. So some of the utility you had with paired end reads is now missing.

jrobinso commented 6 years ago

@MattBashton Could you possibly supply a test bam file to reproduce this problem? I'm not experiencing that with the PacBio test data I have, but then I don't have anything with deep coverage. Just a small slice around some deep coverage would probably suffice. Also, maybe open a new issue for the second issue raised. I think there is a tag we could use to restore some or all of the paired-end functionality (jump to mate / view mate in split screen). I need to investigate but open a new ticket and we'll continue from there.

MattBashton commented 6 years ago

A quick samtools view should reveal there are about 4 main locations that most of the reads fall in. Swapping between those locations by pasting the coordinates into the search bar should trigger the issue, as should panning around. The hang appears to be a bit random: sometimes IGV is fine, other times it gets stuck, but it mostly occurs after viewing only a handful of locations. I produced these files via minimap2 then samtools 1.6: https://www.dropbox.com/s/jlowijdwjvt28x1/barcode01.bam?dl=0 https://www.dropbox.com/s/5i24fgt5kjyopze/barcode01.bam.bai?dl=0

jrobinso commented 6 years ago

Can you reproduce the issue with this example bam file? I can't so far. If you can produce it give me the genomic location or any other information that might be relevant. Also, look at igv.log in the igv folder (under user home) for stack traces, or just attach it here.

MattBashton commented 6 years ago

I'll try to pin down a set of coordinates and operations, and will also check the logs for a stack trace.

MattBashton commented 6 years ago

Ok I've now replicated this three times over.

I have set alignment downsampling off - this might be relevant!

I'm using hg38 from IGV's own list, assuming the built-in aliases handle my usage of GRCh38 from Ensembl as a ref here.

Jump to:

10:86078632

Zoom out twice. Sometimes the issue will trigger here, sometimes it won't. I think the issue might be with parsing the BAM.

Then jump to:

10:133667016

Zoom out again; you should now see the spinning blue ball freeze if you didn't get it after the first jump.

This is what I get in the log. All the freezes are caused by the same exception:

INFO [2017-12-16 13:18:38,111] [Main.java:154]  Startup  IGV Version 2.4.5 12/14/2017 01:18 AM
INFO [2017-12-16 13:18:38,112] [Main.java:155]  Java 1.8.0_152
INFO [2017-12-16 13:18:38,112] [DirectoryManager.java:76]  Fetching user directory... 
INFO [2017-12-16 13:18:38,200] [Main.java:156]  Default User Directory: /Users/bashton
INFO [2017-12-16 13:18:38,201] [Main.java:157]  OS: Mac OS X
INFO [2017-12-16 13:18:49,444] [GenomeManager.java:182]  Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-16 13:18:52,987] [GenomeComboBox.java:79]  Enter genome combo box
INFO [2017-12-16 13:18:53,006] [GenomeManager.java:271]  Genome loaded.  id= hg38
INFO [2017-12-16 13:18:53,164] [CommandListener.java:120]  Listening on port 60151
INFO [2017-12-16 13:19:00,609] [IGV.java:1383]  Loading 1 resources.
INFO [2017-12-16 13:19:00,610] [TrackLoader.java:126]  Loading resource, path /Users/bashton/Dropbox/LRCG/Test_IGV_BAM/barcode01.bam
INFO [2017-12-16 13:19:05,265] [HttpUtils.java:873]  Range-byte request succeeded
ERROR [2017-12-16 13:19:43,830] [DataPanel.java:252]  Error: 
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.remove(ArrayList.java:496)
        at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
        at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
        at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
        at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
        at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        ... 3 more
INFO [2017-12-16 13:21:19,583] [ShutdownThread.java:47]  Shutting down
INFO [2017-12-16 13:21:19,608] [ShutdownThread.java:47]  Shutting down

MattBashton commented 6 years ago

Just to add, I've now upgraded to minimap2.6, which appears to have slightly different SAM output (the header is now present and correct). However, the same issue occurs, with IGV freezing at 100% CPU usage after jumping to the second region. Eventually, after spamming the zoom out button, I finally got IGV to render the region, so it looks like the unresponsiveness can possibly be rescued. These files can be found here:

https://www.dropbox.com/s/41ea1x4rpexsc5d/mm2.6_test_L.bam?dl=0 https://www.dropbox.com/s/maty2ntpr1hrvsj/mm2.6_test_L.bam.bai?dl=0

The error in the log is as before:


INFO [2017-12-18 10:22:36,852] [Main.java:155]  Java 1.8.0_151
INFO [2017-12-18 10:22:36,853] [DirectoryManager.java:76]  Fetching user directory... 
INFO [2017-12-18 10:22:36,951] [Main.java:156]  Default User Directory: /Users/bashton
INFO [2017-12-18 10:22:36,951] [Main.java:157]  OS: Mac OS X
INFO [2017-12-18 10:22:45,780] [GenomeManager.java:182]  Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-18 10:22:50,016] [GenomeComboBox.java:79]  Enter genome combo box
INFO [2017-12-18 10:22:50,035] [GenomeManager.java:271]  Genome loaded.  id= hg38
INFO [2017-12-18 10:22:50,162] [CommandListener.java:120]  Listening on port 60151
INFO [2017-12-18 10:23:01,457] [IGV.java:1383]  Loading 1 resources.
INFO [2017-12-18 10:23:01,458] [TrackLoader.java:126]  Loading resource, path /Users/bashton/Desktop/mm2.6_test_L.bam
INFO [2017-12-18 10:23:56,245] [HttpUtils.java:873]  Range-byte request succeeded
ERROR [2017-12-18 10:24:18,699] [DataPanel.java:252]  Error: 
java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.remove(ArrayList.java:496)
        at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
        at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
        at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
        at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
        at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        ... 3 more

The input files are small and I'm reading them from SSD.

jrobinso commented 6 years ago

OK, thanks for the investigative work. I will try again. Sorry for the delay, many things happening in parallel right now.

On Mon, Dec 18, 2017 at 2:30 AM, Matthew Bashton notifications@github.com wrote:

Just to add I've now upgraded to minimap2.6 which appears to have slightly different SAM output (the header is now present and correct) however the same issue is occurs with IGV freezing up on 100% CPU usage after jumping to the second region, eventually after spamming the zoom out button I finally got IGV to render the region, so it looks like possibly the unresponsiveness can be rescued. These files can be found here:

https://www.dropbox.com/s/41ea1x4rpexsc5d/mm2.6_test_L.bam?dl=0 https://www.dropbox.com/s/maty2ntpr1hrvsj/mm2.6_test_L.bam.bai?dl=0

The error in the log is as before:

INFO [2017-12-18 10:22:36,852] [Main.java:155] Java 1.8.0_151 INFO [2017-12-18 10:22:36,853] [DirectoryManager.java:76] Fetching user directory... INFO [2017-12-18 10:22:36,951] [Main.java:156] Default User Directory: /Users/bashton INFO [2017-12-18 10:22:36,951] [Main.java:157] OS: Mac OS X INFO [2017-12-18 10:22:45,780] [GenomeManager.java:182] Loading genome: /Users/bashton/igv/genomes/hg38.genome INFO [2017-12-18 10:22:50,016] [GenomeComboBox.java:79] Enter genome combo box INFO [2017-12-18 10:22:50,035] [GenomeManager.java:271] Genome loaded. id= hg38 INFO [2017-12-18 10:22:50,162] [CommandListener.java:120] Listening on port 60151 INFO [2017-12-18 10:23:01,457] [IGV.java:1383] Loading 1 resources. INFO [2017-12-18 10:23:01,458] [TrackLoader.java:126] Loading resource, path /Users/bashton/Desktop/mm2.6_test_L.bam INFO [2017-12-18 10:23:56,245] [HttpUtils.java:873] Range-byte request succeeded ERROR [2017-12-18 10:24:18,699] [DataPanel.java:252] Error: java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at java.util.ArrayList.remove(ArrayList.java:496) at java.util.Collections$SynchronizedList.remove(Collections.java:2426) at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333) at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297) at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184) 
at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) ... 3 more```

The input files are small and I'm reading them from SSD.

jrobinso commented 6 years ago

BTW, setting downsampling off will absolutely freeze IGV on very deep coverage, which is why it is implemented, but PacBio reads are not especially deep, so it should have no effect. But to clarify: is downsampling off for all of these test cases?

On Sat, Dec 16, 2017 at 5:29 AM, Matthew Bashton notifications@github.com wrote:

Ok I've now replicated this three times over.

I have set alignment downsampling off - this might be relevant!

I'm using hg38 from IGV's own list, assuming the built-in aliases handle my usage of GRCh38 from Ensembl as a reference here.

Jump to:

10:86078632

Zoom out twice. Sometimes the issue will trigger here, sometimes it won't. I think the issue might be with parsing the BAM.

Then jump to:

10:133667016

And zoom out again. You should now get the spinning blue ball freeze if you didn't get it from the first jump.

This is what I get in the log. All the freezes are caused by the same exception:

INFO [2017-12-16 13:18:38,111] [Main.java:154] Startup IGV Version 2.4.5 12/14/2017 01:18 AM
INFO [2017-12-16 13:18:38,112] [Main.java:155] Java 1.8.0_152
INFO [2017-12-16 13:18:38,112] [DirectoryManager.java:76] Fetching user directory...
INFO [2017-12-16 13:18:38,200] [Main.java:156] Default User Directory: /Users/bashton
INFO [2017-12-16 13:18:38,201] [Main.java:157] OS: Mac OS X
INFO [2017-12-16 13:18:49,444] [GenomeManager.java:182] Loading genome: /Users/bashton/igv/genomes/hg38.genome
INFO [2017-12-16 13:18:52,987] [GenomeComboBox.java:79] Enter genome combo box
INFO [2017-12-16 13:18:53,006] [GenomeManager.java:271] Genome loaded. id= hg38
INFO [2017-12-16 13:18:53,164] [CommandListener.java:120] Listening on port 60151
INFO [2017-12-16 13:19:00,609] [IGV.java:1383] Loading 1 resources.
INFO [2017-12-16 13:19:00,610] [TrackLoader.java:126] Loading resource, path /Users/bashton/Dropbox/LRCG/Test_IGV_BAM/barcode01.bam
INFO [2017-12-16 13:19:05,265] [HttpUtils.java:873] Range-byte request succeeded
ERROR [2017-12-16 13:19:43,830] [DataPanel.java:252] Error: java.util.concurrent.CompletionException: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.remove(ArrayList.java:496)
    at java.util.Collections$SynchronizedList.remove(Collections.java:2426)
    at org.broad.igv.sam.AlignmentDataManager.trimCache(AlignmentDataManager.java:333)
    at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:297)
    at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:184)
    at org.broad.igv.ui.panel.DataPanel.lambda$load$3(DataPanel.java:225)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
    ... 3 more
INFO [2017-12-16 13:21:19,583] [ShutdownThread.java:47] Shutting down
INFO [2017-12-16 13:21:19,608] [ShutdownThread.java:47] Shutting down
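
For anyone trying to reproduce this, the two jumps in the steps above could also be scripted through IGV's batch command port (the log shows IGV listening on port 60151). The following is only a rough sketch: the class and method names (`IgvGotoSketch`, `gotoCommands`, `send`) are made up for illustration, it assumes the port is enabled and IGV is already running with the BAM loaded, and it only issues the `goto` commands. The zoom-out steps that seem to trigger the freeze would still be done by hand.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IGV: sends "goto" batch commands to a running IGV.
final class IgvGotoSketch {

    // Build one "goto <locus>" batch command per locus.
    static List<String> gotoCommands(String... loci) {
        List<String> commands = new ArrayList<>();
        for (String locus : loci) {
            commands.add("goto " + locus);
        }
        return commands;
    }

    // Send each command as one line and collect the one-line reply IGV sends back.
    static List<String> send(String host, int port, List<String> commands) throws IOException {
        List<String> replies = new ArrayList<>();
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            for (String command : commands) {
                out.write(command + "\n");
                out.flush();
                replies.add(in.readLine());
            }
        }
        return replies;
    }

    public static void main(String[] args) throws IOException {
        // Loci taken from the reproduction steps above; port taken from the log.
        List<String> commands = gotoCommands("10:86078632", "10:133667016");
        System.out.println(send("127.0.0.1", 60151, commands));
    }
}
```

Running `main` needs a live IGV session; `gotoCommands` on its own just builds the command strings.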


MattBashton commented 6 years ago

Hey, thanks for getting back to me. Yes, downsampling is off for these test cases. Some regions are indeed deep owing to the targeted nature of the experiment, but no deeper than I normally use with Illumina short reads, where I have no issues with IGV. My JVM is 8GB and I'm not anywhere near the limit on that either, if that helps.

winni2k commented 6 years ago

Hi all, sorry to post my bug report into this thread. A Google search for igv java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 brought me here...

I am also observing a java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 error when I switch between reference contigs on some long read data (technically assembly contigs). I have tried IGV 2.4.1 and 2.4.5, and the error seems to reproduce with different BAM files. Happy to post a minimal input example if this is an unknown class of errors.
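
The stack traces posted in this thread show the exception coming out of `ArrayList.remove`, reached through `Collections$SynchronizedList.remove` from `AlignmentDataManager.trimCache`, on a `CompletableFuture` worker thread. One common way to get "Index: 1, Size: 1" from a synchronized list is a check-then-act race: each call is locked individually, but another thread can shrink the list between the size check and the remove. The sketch below illustrates that general pattern only. It is not IGV's code, whether this is the actual cause here is an assumption, and all names in it (`TrimCacheSketch`, `trimRacy`, `trimAtomically`, `staleIndexFails`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is NOT IGV's AlignmentDataManager code.
final class TrimCacheSketch {

    // Racy pattern: size() and remove() are each synchronized on their own,
    // but another thread can shrink the list between the two calls.
    static <T> void trimRacy(List<T> cache, int maxSize) {
        while (cache.size() > maxSize) {
            cache.remove(cache.size() - 1);
        }
    }

    // Replays the bad interleaving on a single thread so the failure is deterministic.
    static boolean staleIndexFails() {
        List<String> cache = Collections.synchronizedList(
                new ArrayList<>(Arrays.asList("interval-a", "interval-b")));
        int staleIndex = cache.size() - 1; // "thread A" computes index 1
        cache.remove(0);                   // "thread B" trims first; size is now 1
        try {
            cache.remove(staleIndex);      // "thread A" resumes with index 1, size 1
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Atomic version: hold the synchronized list's own lock across check and remove.
    static <T> void trimAtomically(List<T> cache, int maxSize) {
        synchronized (cache) {
            while (cache.size() > maxSize) {
                cache.remove(cache.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("stale index fails: " + staleIndexFails());
        List<Integer> cache =
                Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3, 4)));
        trimAtomically(cache, 1);
        System.out.println("after atomic trim: " + cache);
    }
}
```

Synchronizing on the list returned by `Collections.synchronizedList` works because that wrapper uses itself as its lock, which is the same idiom the JDK documentation recommends for iterating over such a list.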

winni2k commented 6 years ago

It looks like my error is similar to #499

jrobinso commented 4 years ago

Hey all, this was opened as a discussion thread, for which it was really useful, but there are many disparate issues here, so it remains perpetually open. I am going to close it. If there is a specific issue not addressed that you think should be, please open an issue focused on that, along with steps to reproduce, including test data if applicable.
