igvteam / igv.js

Embeddable genomic visualization component based on the Integrative Genomics Viewer
MIT License

support for (bgzip+tabix) bedgraph files #320

Closed blajoie closed 7 years ago

blajoie commented 7 years ago

It seems as if tabix indexed bedgraph files are not currently supported/working...?

Loading an uncompressed/unindexed bedgraph works AOK (but loads all data) https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph

Loading in a bgzip+tabix bedgraph seems to hang.. https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz.tbi

The bedgraph.gz loads AOK when type='annotation', but a bed representation is not ideal. The load hangs when type is left absent (inferred as 'wig'), or throws an undefined type error when type is explicitly set to 'bedgraph'.

jrobinso commented 7 years ago

Hi, bedgraph tabix is not supported, bedgraph is a wig file so bigwig should be used for that. We could add support for tabixed bedgraph, and will, but it is inferior to bigwig because it won't have the zoom level data.

blajoie commented 7 years ago

Hi Jim,

I absolutely agree with you, in almost all cases bigwig > wig/bedgraph.

I was hoping to leverage the bedgraph format to encode population level data for visualization. e.g. at each coordinate/bin I will have multiple distinct values.

I don't want these values to be aggregated (mean/median/iqr), I'd rather keep them all to view the population distribution (as points), hence why bigwig is out. Ideally something like bigwig would support multiple values per position, and take that into account during the zoom-level compute.

I simply wanted to add tabix indexing to this until we come up with a cleaner solution (e.g. some custom-service-api). For the time being I will limit the visibility window for these data tracks (since there are no precomputed big-wig-like zoom-levels)

p.s. I've also enabled plotting wigs as points/lines in igv.js. Happy to pass that back to you as a PR.

jrobinso commented 7 years ago

In IGV desktop the bigwig zoom levels are not used if the plot type == points, that's a possible solution. Supporting tabix for bedgraph is easy to add as well, just not there yet.

On Fri, Feb 10, 2017 at 10:42 AM, Bryan Lajoie notifications@github.com wrote:

e.g. using uncompressed bedgraph https://s3-us-west-2.amazonaws.com/ilmn.igv-test/igv-bedgraph1.png

blajoie commented 7 years ago

Interesting, I will play a bit with bigwig/points + IGV desktop. That could indeed be a possible workaround for this. Unless it is also easy to enable tabix for bedgraph (would save the bedgraph->bigwig transformation)

jrobinso commented 7 years ago

It is easy to add tabix for bedgraph, I just might not get to it right away.

blajoie commented 7 years ago

no problem jim.

It looks like bigwig will not work for my use case (does not support overlapping regions). (Overlapping regions in bedGraph line 2 of test.bedgraph)
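
To check whether a given bedGraph file contains overlapping regions like the ones mentioned above, a minimal sketch follows. It assumes the standard four-column bedGraph layout (chrom, start, end, value) with half-open coordinates; `findBedGraphOverlaps` is a hypothetical helper for illustration and is not part of igv.js.

```javascript
// Hypothetical helper (not igv.js API): returns every bedGraph row that
// starts before the end of an earlier row on the same chromosome.
function findBedGraphOverlaps(text) {
    const rows = text
        .split("\n")
        .map((line) => line.trim())
        // Skip blank lines, comments, and track/browser header lines.
        .filter((line) => line && !/^(#|track|browser)/.test(line))
        .map((line) => {
            const [chr, start, end, value] = line.split(/\s+/);
            return { chr, start: Number(start), end: Number(end), value: Number(value) };
        });

    // Sort by chromosome name, then by start position.
    rows.sort((a, b) => (a.chr < b.chr ? -1 : a.chr > b.chr ? 1 : a.start - b.start));

    const overlapping = [];
    let currentChr = null;
    let maxEnd = -Infinity;
    for (const row of rows) {
        if (row.chr !== currentChr) {
            // New chromosome: reset the running maximum end.
            currentChr = row.chr;
            maxEnd = -Infinity;
        }
        // Half-open intervals: touching intervals (start === maxEnd) do not overlap.
        if (row.start < maxEnd) overlapping.push(row);
        maxEnd = Math.max(maxEnd, row.end);
    }
    return overlapping;
}
```

An empty result suggests the file could be converted to bigwig; a non-empty result means it has the kind of overlap that reportedly rules bigwig out. Rows that repeat the same interval with different values (the population use case) are reported as overlaps too.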

I don't see any other way to get this type of data into a standard genomic fileformat besides using bedgraph. No problem waiting for the feature add.

Thanks.

jrobinso commented 7 years ago

OK. Are overlapping features legal in bedgraph? It's good to know such files exist, or might exist.

blajoie commented 7 years ago

Technically no, I don't believe overlapping features are bg legal..

Perhaps something like the notion of a trackhub/multi wig from UCSC could be used. https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbHub.html#multiWig http://blog.openhelix.eu/wp-content/uploads/2014/05/original_announcement1-300x168.jpg (collection of bigwigs all plotted in the same track)

Though as far as I can tell IGV has no similar multi-track definition, correct?

jrobinso commented 7 years ago

IGV desktop has track overlays but not igv.js. I'm o.k. with illegal files, but some operations such as creating a "tdf" (equivalent to bigwig) might fail or be incorrect.

blajoie commented 7 years ago

yes absolutely. i am ok to use an illegal bg for the time being and then migrate to bigwig overlays once the feature is ported to igv.js (or leverage some sort of custom-service)

paul-shannon commented 7 years ago

Coming in on this late, maybe inaccurately: I find that bedgraph/wig bgzipped and tabixed works fine. Not knowing that it didn't :} I forged ahead and tried it last night, with this track configuration:

{
    name: "brain hint fp",
    type: "wig",
    format: "bedgraph",
    min: 0,
    max: 10,
    color: "#AA0000",
    indexed: true,
    url: "http://pshannon.systemsbiology.net/annotations/brain_hint.bed.gz",
    indexURL: "http://pshannon.systemsbiology.net/annotations/brain_hint.bed.gz.tbi"
}

I did discover that format: "bedGraph" does not work. But 'bedgraph' (all lower case) does.
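
Since the format string appears to be case-sensitive in the version tested here, one defensive option is to lowercase it before handing the config to igv.js. The sketch below is only an illustration: `normalizeTrackFormat` is a hypothetical helper, not an igv.js function, and whether later igv.js versions still need it is not established in this thread.

```javascript
// Hypothetical helper (not igv.js API): returns a copy of a track config
// with `format` lowercased, so "bedGraph" becomes "bedgraph".
function normalizeTrackFormat(config) {
    // Shallow copy so the caller's object is left untouched.
    const copy = Object.assign({}, config);
    if (typeof copy.format === "string") {
        copy.format = copy.format.toLowerCase();
    }
    return copy;
}

// Usage: the track below is an assumed example, with placeholder URLs.
const track = normalizeTrackFormat({
    name: "example",
    type: "wig",
    format: "bedGraph",
    url: "https://example.org/data.bedgraph.gz",
    indexURL: "https://example.org/data.bedgraph.gz.tbi"
});
```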

jrobinso commented 7 years ago

Well, how about that! I can see how that might work, but it's by accident. I'll leave this issue open until I can track down how/why this works, and ensure that it continues to do so.

blajoie commented 7 years ago

Hmm, this does not seem to work for me (current igv.js master). When leaving the type:'annotation', the bed.gz loads in AOK and leverages the tabix index, it however renders as a bed, not a wig.

If I set type:wig and format:bedgraph, then igv.js attempts to load the entire file and does not load an index into the featureParser obj. I will keep poking around a bit.

e.g.

format:"bedgraph"
height:250
indexURL:"<tbiurl>"
indexed:true
name:"test.bed.gz"
order:4
plotType:"rect"
sourceType:"file"
type:"wig"
url:"<bedurl>"
visibilityWindow:5000000

blajoie commented 7 years ago

In case this helps with testing:

(indexing does NOT work here, seems to hang?)

{
    'url':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz',
    'indexURL':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz.tbi',
    'type':'wig',
    'format':'bedgraph',
    'indexed':true,
    'name':'test-bg-tbi-aswig'
}

(indexing works, but loads as bed/annotation)

{
    'url':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz',
    'indexURL':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz.tbi',
    'type':'annotation',
    'indexed':true,
    'name':'test-bg-tbi-asbed'
}

(works with rawbg - loads as wig)

{
    'url':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph',
    'type':'wig',
    'format':'bedgraph',
    'name':'test-bg-aswig'
}

jrobinso commented 7 years ago

OK, thanks, we will "make it so" for option 1.

BTW you don't have to set the "indexed" property if supplying an indexURL. The purpose of that property is to tell igv.js it doesn't have to probe for an index (i.e. make an attempt to load a .tbi).
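
One way to read the relationship between the two properties is sketched below. This is an interpretation rather than igv.js source: `indexStrategy` and its return values are hypothetical, and the assumption that `indexed: false` is what suppresses probing is inferred from the comment above.

```javascript
// Hypothetical decision helper (not igv.js API), illustrating one reading
// of how `indexURL` and `indexed` interact.
function indexStrategy(config) {
    // An explicit index URL makes the `indexed` flag redundant.
    if (config.indexURL) return "load-index";
    // Assumption: `indexed: false` means "there is no index, do not probe".
    if (config.indexed === false) return "no-index";
    // Otherwise a loader would have to probe, e.g. by trying url + ".tbi".
    return "probe";
}
```

Under this reading, a config with `indexURL` behaves the same with or without `indexed: true`.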

I suspect it's hanging because it's ignoring the index and trying to load the whole thing (assuming it's large). Perhaps it appeared to work for Paul because his file was not so large.

blajoie commented 7 years ago

Thanks for the quick feature-add Jim!

EDIT - everything is working as intended.

If visibilityWindow is set, then only the requested interval is loaded. If visibilityWindow is not set, then the entire chromosome is loaded regardless of starting interval.

Setting visibilityWindow allows large bedGraph files to be used.
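
The behaviour described here (only the requested interval is loaded when visibilityWindow is set, otherwise the whole chromosome) can be sketched as a small pure function. It is modelled on the featureSource.js condition quoted later in this thread, but it is an illustrative reconstruction: `resolveQueryInterval` and its parameters are hypothetical names, not igv.js API.

```javascript
// Hypothetical sketch (not igv.js source): decide which interval to fetch
// for a file-backed track, given the visible interval.
function resolveQueryInterval(track, interval, chromosomeLength) {
    const noWindow = track.visibilityWindow === undefined || track.visibilityWindow <= 0;
    if (track.sourceType === "file" && noWindow) {
        // No visibility window: expand the request to the entire chromosome.
        return {
            chr: interval.chr,
            start: 0,
            end: chromosomeLength === undefined ? Number.MAX_VALUE : chromosomeLength
        };
    }
    // Visibility window set: request only the interval that was asked for.
    return { chr: interval.chr, start: interval.start, end: interval.end };
}
```

With a 5 Mb window, as used in this thread, a request for chr1:100-200 stays at 100-200; without a window, the same request grows to the full chromosome.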

cheers!

jrobinso commented 7 years ago

Visibility window should control this, will look into it.

On Fri, Feb 17, 2017 at 11:59 PM, Bryan Lajoie notifications@github.com wrote:

Thanks for the quick feature-add Jim.

I noticed that igv.js seems to be loading the entire chromosome's data (instead of just the interval).

I assume there is a good reason for this. Is the assumption that most legal bed/bedgraph files will be small, so that grabbing the entire chromosome once enables faster panning/zooming?

Lines 185-192 of featureSource.js seem to be controlling this. Simply commenting out the genomicInterval expansion solves the issue for me.

    // TODO -- reuse cached features that overlap new region

    if (self.sourceType === 'file' && (self.visibilityWindow === undefined || self.visibilityWindow <= 0)) {
        // Expand genomic interval to grab entire chromosome
        //genomicInterval.start = 0;
        var chromosome = igv.browser.genome.getChromosome(chr);
        //genomicInterval.end = (chromosome === undefined ? Number.MAX_VALUE : chromosome.bpLength);
        console.log(chr+","+genomicInterval.start+","+genomicInterval.end);
        console.log(genomicInterval);
    }

My intended use-case as described above would entail using large bedgraphs (>5gb) with >100mil points. Given the resolution, it never makes sense to draw too large an interval, so I am setting the visibilityWindow to 5MB. In the future we will move to multi-bigwig-layers.

Any side effect to making the above changes to featureSource.js?

blajoie commented 7 years ago

Jim - I think this is actually AOK. My initial testing did not have a visibilityWindow set, hence why it was loading the entire chr. Seems to be working exactly as intended now - my mistake!

blajoie commented 7 years ago

Jim - one comment on this.

Even if visibilityWindow is not set for a given track, it may not always be prudent to extend the request to the entire chromosome.

e.g. say I have a large bed file (50mb). Should a user deem it necessary, I would allow them to visualize the entire chromosome's data, even though the request/drawing will be slow. However the average user will only ever view small intervals.

In this case, I don't necessarily want to pull down the entire chromosome's data every time.

Would it not make sense to modify featureSource.js to never extend the request to the whole chr? (Or build in another config.param to only allow this logic for user-defined tracks, e.g. of small filesize.)

jrobinso commented 7 years ago

It's difficult to set a general rule; maybe there should be a default, but what should it be? The simplest thing is to make visibilityWindow a required property, since the creator of the config should know the appropriate value for the track.

blajoie commented 7 years ago

Agreed, it should be required, and the creator should know the appropriate limit for the track. Right now, a visibilityWindow of 0 and -1 are treated the same, correct?

What about something like:

visibilityWindow: -1 allow entire chromosome, always pull down the full chromosome (extend all requests)

visibilityWindow: 0 allow entire chromosome, do not extend requests (only serve current genomic interval)

visibilityWindow: N (>0) allow request up to N bp.
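
The three proposed cases can be written out as a small function to make the intended semantics concrete. This only illustrates the proposal in this comment; it is not how igv.js behaves, and `planRequest`, its parameters, and the shape of its return value are hypothetical.

```javascript
// Hypothetical sketch of the PROPOSED visibilityWindow semantics
// (not current igv.js behaviour).
function planRequest(visibilityWindow, interval, chromosomeLength) {
    if (visibilityWindow === -1) {
        // -1: any view is allowed, and requests are extended to the full chromosome.
        return { allowed: true, start: 0, end: chromosomeLength };
    }
    if (visibilityWindow === 0) {
        // 0: any view is allowed, but only the current interval is requested.
        return { allowed: true, start: interval.start, end: interval.end };
    }
    if (visibilityWindow > 0) {
        // N > 0: only views of at most N bp are served, and never extended.
        const width = interval.end - interval.start;
        return { allowed: width <= visibilityWindow, start: interval.start, end: interval.end };
    }
    // The proposal does not define other values, so reject them here.
    throw new RangeError("visibilityWindow must be -1, 0, or a positive number");
}
```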

jrobinso commented 7 years ago

Oh, I think I missed the subtlety about extending the request. Is that the main issue? Yes that's a bit of a bug actually, or a feature to get around a bug. The "tribble" index isn't fully implemented and can only read a whole chromosome. That was meant to be temporary but got forgotten.

So when that's settled the extension will not happen, you could open a new issue on this with your suggestion to keep it visible.

jimhavrilla commented 7 years ago

Did you really fix this issue? If I run:

{
    url: 'http://home.chpc.utah.edu/~u1021864/test.bedGraph.gz',
    indexURL: 'http://home.chpc.utah.edu/~u1021864/test.bedGraph.gz.tbi',
    name: 'CCRs',
    format: "bedgraph",
    type: "wig"
}

it still does not work for me.

Running 1.0.9.

jrobinso commented 7 years ago

Fix what issue, and what is not working? You'll need to be more descriptive; this is a long thread with a number of related issues discussed.

jimhavrilla commented 7 years ago

The issue referenced by blajoie above when he said:

In case this helps with testing:

(indexing does NOT work here, seems to hang?)

{ 'url':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz', 'indexURL':'https://s3-us-west-2.amazonaws.com/ilmn.igv-test/test.bedgraph.gz.tbi', 'type':'wig', 'format':'bedgraph', 'indexed':true, 'name':'test-bg-tbi-aswig' }

When I add my earlier example to the tracks field of the igvDiv it just crashes, it does not work. If I use an unindexed bedGraph it works fine after a while of loading but I want it to be faster and index it with tabix. Did you push a fix to bedGraph indexing so that indexing doesn't hang and works rapidly?

To clarify the working version for the track (unindexed) works like this: { url: 'http://home.chpc.utah.edu/~u1021864/test.bedGraph', name: 'CCRs' }

jimhavrilla commented 7 years ago

You mentioned you would "make it so" for option 1 (fixing bedGraph tabix indexing using wig type) I'm wondering if you did and maybe it's not on the latest web-release I got somehow?

jrobinso commented 7 years ago

That's in the master branch. Try the "beta" urls if you are not pulling and building yourself.

Jim

jimhavrilla commented 7 years ago

igv-beta.js or igv-all.js? neither seems to even create the browser for me. I guess I can try building myself.

jrobinso commented 7 years ago

igv-beta.js or igv-beta.min.js. Please define "not working..." Are you getting an error, don't see any features, or something else? If you can't resolve it I will probably need a test file to try to reproduce the problem myself.

jimhavrilla commented 7 years ago

I see no features.

jimhavrilla commented 7 years ago

[image]
^ version 1.0.9 (doesn't load CCRs track, crashes webpage trying to index)

[image]
^ igv-beta.js or igv-beta.min.js

If I use a non-indexed bedGraph (version 1.0.9):
[image]

jimhavrilla commented 7 years ago

to clarify I am using
