igvteam / igv

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
https://igv.org
MIT License
646 stars, 387 forks

Feature request: displaying GC content #1152

Open bricoletc opened 2 years ago

bricoletc commented 2 years ago

Thanks for the great tool!

Would it be possible/easy enough to add a track with the loaded genome's GC content? Would be useful to check for e.g. a correlation with read depth. Could not find mention of it in the manual.

(P.S. This is available in Artemis.)

jrobinso commented 2 years ago

For which genome assembly? Many have a GC% track under "load from server...".

bricoletc commented 2 years ago

I was talking about any loaded genome, including one's own non-server FASTA. Would it be unreasonable to compute GC% on the fly from the sequence if requested in the GUI?

jrobinso commented 2 years ago

I think it's best to leave that to other tools; I'm sure there is a command line tool to do this. Then open the resulting track in IGV.
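
As an illustration of the "compute it elsewhere, then load the track" route, here is a minimal sketch that turns a sequence into bedGraph lines (chromosome, 0-based start, end, value), a text track format IGV can open. The class name `GcBedGraph`, the method name, and the window size are assumptions made for this example; they do not come from IGV or any existing tool. Windows with no A/C/G/T calls (for example runs of N) are skipped rather than reported as 0%.

```java
import java.util.Locale;

/** Hypothetical helper, not part of IGV: writes windowed GC% as bedGraph text. */
final class GcBedGraph {

    /** Returns one bedGraph line per window that contains at least one A/C/G/T base. */
    static String toBedGraph(String chrom, String sequence, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        StringBuilder out = new StringBuilder();
        for (int from = 0; from < sequence.length(); from += windowSize) {
            int to = Math.min(from + windowSize, sequence.length());
            int gc = 0;
            int called = 0;
            for (int i = from; i < to; i++) {
                char base = Character.toUpperCase(sequence.charAt(i));
                if (base == 'G' || base == 'C') {
                    gc++;
                    called++;
                } else if (base == 'A' || base == 'T') {
                    called++;
                }
                // N and other ambiguity codes are ignored in both counts.
            }
            if (called == 0) {
                continue; // nothing to report for an all-N window
            }
            double percent = 100.0 * gc / called;
            out.append(String.format(Locale.ROOT, "%s\t%d\t%d\t%.1f\n", chrom, from, to, percent));
        }
        return out.toString();
    }
}
```

Writing the returned text to a file with a `.bedgraph` extension should be enough to load it as a numeric track.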

jrobinso commented 2 years ago

Leave the issue open for now. It's an interesting idea, but it isn't likely to get prioritized anytime soon.

bricoletc commented 2 years ago

Would you accept a PR implementing it? Good to know if you wouldn't, as in that case I wouldn't give it a go.

jrobinso commented 2 years ago

Yes, but let's have some preliminary discussion first. How do you envision this working, from the user's perspective? I think it should be off by default, and enabled by user action or perhaps user preference. Where would it appear, and what would it look like? One possibility is to make it part of the sequence track itself, in the same way the 3-frame translation currently works. It could be displayed as a heatmap (rather than a bar chart) to keep it compact. Or that too could be a user option.

Alternatively it could be a separate track, but that will be more complex I think as it will depend on the sequence track, which isn't always visible.

bricoletc commented 2 years ago

> I think it should be off by default, and enabled by user action or perhaps user preference.

Definitely agree

> One possibility is to make it part of the sequence track itself, in the same way the 3 frame translation currently works.

Also agree

> It could be displayed as a heatmap (rather than a bar chart)

Returning to your first comment, which small server-available genome has an existing GC track that I could load into IGV? Want to check what it looks like.

jrobinso commented 2 years ago

I don't know if any "small" genomes have this track, but human hg38 does. Screenshot below, but I would not use this representation (bar chart) for what you are doing; I don't think it would fit well with the sequence track. I suggest using a gray-scale (light -> dark) heatmap. A color heatmap will clash with the nucleotide colors.

[Image: Screen Shot 2022-06-16 at 8 38 44 AM]

EDIT: something like this for scale

[Image: Screen Shot 2022-06-16 at 1 51 14 PM]
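
The light -> dark gray scale suggested above could be sketched as a simple mapping from GC fraction to a shade. This is only an illustration: the class `GcGrayScale` and its method are hypothetical names, and the choice of white for 0% GC, black for 100% GC, and white for windows with no data is an assumption, not something decided in this thread.

```java
import java.awt.Color;

/** Hypothetical helper, not part of IGV: maps a GC fraction to a gray shade. */
final class GcGrayScale {

    /**
     * Maps a GC fraction in [0, 1] to a gray level: 0 gives white, 1 gives black.
     * NaN (no data for the window) is drawn as white; out-of-range values are clamped.
     */
    static Color toGray(float gcFraction) {
        if (Float.isNaN(gcFraction)) {
            return Color.WHITE;
        }
        float clamped = Math.max(0f, Math.min(1f, gcFraction));
        int level = Math.round(255f * (1f - clamped));
        return new Color(level, level, level);
    }
}
```

A narrower range (for example light gray to dark gray) might read better next to the nucleotide colors; that would only change the two end points of the mapping.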

jrobinso commented 2 years ago

I've spent a few minutes looking at the relevant classes; hopefully I can save you some time. In general I am thinking the 3-frame translation is a good model, as you are doing something similar.

The first consideration is where to compute GC%, and where to cache it. We don't want to compute it on every draw. I think we can compute it in the class "SequenceTrack.SeqCache". This is where AA translations are done. The extent of the cache is expanded +/- a screen width; the expansion happens around lines 234-250 of SequenceTrack. So you can assume there is some buffer, so recalculations are not done on every repaint. If more optimization of this is needed it can be done later.
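
To make the buffering idea concrete, here is a minimal, self-contained sketch of a cached interval that extends one view width to either side of the visible range, so that small pans stay inside the cache and do not trigger a recomputation. The class `CachedInterval` and its methods are hypothetical; this is not the actual SeqCache code.

```java
/** Hypothetical cached interval; names are illustrative and not taken from IGV. */
final class CachedInterval {
    final int start; // inclusive, 0-based
    final int end;   // exclusive

    CachedInterval(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Covers the view plus one view width on each side, clipped to the chromosome. */
    static CachedInterval around(int viewStart, int viewEnd, int chromosomeLength) {
        int width = viewEnd - viewStart;
        int start = Math.max(0, viewStart - width);
        int end = Math.min(chromosomeLength, viewEnd + width);
        return new CachedInterval(start, end);
    }

    /** True when the requested view lies entirely inside the cached range. */
    boolean contains(int viewStart, int viewEnd) {
        return viewStart >= start && viewEnd <= end;
    }
}
```

On repaint, the GC values would only be recomputed when `contains` returns false for the current view.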

So you might start by computing GC% in the SeqCache class, using refreshAminoAcids as a model. The GC% would be stored in SeqCache, perhaps as an object with start position, window size, and an array of floats for GC%.
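
A minimal sketch of such an object, assuming the cached sequence is available as a byte array of ASCII bases. The class name `GcContent`, its fields, and the `compute` method are hypothetical and only illustrate the "start position, window size, array of floats" shape described above. Values are stored as fractions in [0, 1], and a window with no A/C/G/T bases is stored as NaN.

```java
/** Hypothetical holder for windowed GC content; not part of IGV. */
final class GcContent {
    final int start;         // genomic position of the first window
    final int windowSize;    // bases per window
    final float[] fractions; // GC fraction per window, NaN if the window has no A/C/G/T

    GcContent(int start, int windowSize, float[] fractions) {
        this.start = start;
        this.windowSize = windowSize;
        this.fractions = fractions;
    }

    /** Computes GC fractions over consecutive, non-overlapping windows of the sequence. */
    static GcContent compute(byte[] sequence, int start, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        int nWindows = (sequence.length + windowSize - 1) / windowSize;
        float[] fractions = new float[nWindows];
        for (int w = 0; w < nWindows; w++) {
            int from = w * windowSize;
            int to = Math.min(from + windowSize, sequence.length);
            int gc = 0;
            int called = 0;
            for (int i = from; i < to; i++) {
                char base = Character.toUpperCase((char) sequence[i]);
                if (base == 'G' || base == 'C') {
                    gc++;
                    called++;
                } else if (base == 'A' || base == 'T') {
                    called++;
                }
                // N and other ambiguity codes count toward neither total.
            }
            fractions[w] = called == 0 ? Float.NaN : (float) gc / called;
        }
        return new GcContent(start, windowSize, fractions);
    }
}
```

The last window may be shorter than `windowSize`; its fraction is taken over the bases it actually contains.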

Having computed GC, you can add the draw method in the SequenceRenderer class, again using the 3-frame translation as a model.

The SequenceTrack "getHeight" method will need to be modified to account for the extra height of the GC% band. I think that's the only change needed for that track.

For the initial PR you can just assume this track is on all the time and not worry about the menu to enable / disable it. That can come later. If you want to add this, look at "getPopupMenu" in SequenceTrack. Also, initially a hard-coded window size could be used; we may or may not want to add an option to set this.

RE placement, I think the heatmap band would look best just above the sequence, and of the same height.
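
A rough sketch of what drawing such a band might look like, using only standard AWT calls. The class `GcBandPainter`, its parameters, and the pixel layout are assumptions for illustration; the real draw method would live in SequenceRenderer and take its coordinates from the rendering context. Each window becomes one filled rectangle, shaded from white (0% GC) to black (100% GC), and windows without data are left unpainted.

```java
import java.awt.Color;
import java.awt.Graphics2D;

/** Hypothetical painter for a gray-scale GC band; not part of IGV. */
final class GcBandPainter {

    /**
     * Paints one rectangle per window.
     *
     * @param fractions       GC fraction per window, NaN for windows without data
     * @param pixelsPerWindow on-screen width of one window, in pixels
     * @param y               top of the band, in pixels
     * @param height          thickness of the band, in pixels
     */
    static void paint(Graphics2D g, float[] fractions, double pixelsPerWindow, int y, int height) {
        for (int w = 0; w < fractions.length; w++) {
            if (Float.isNaN(fractions[w])) {
                continue; // leave the background visible where there is no data
            }
            int x0 = (int) Math.floor(w * pixelsPerWindow);
            int x1 = (int) Math.floor((w + 1) * pixelsPerWindow);
            float clamped = Math.max(0f, Math.min(1f, fractions[w]));
            int level = Math.round(255f * (1f - clamped));
            g.setColor(new Color(level, level, level));
            // Always paint at least one pixel so narrow windows remain visible.
            g.fillRect(x0, y, Math.max(1, x1 - x0), height);
        }
    }
}
```

Placing the band just above the sequence then amounts to passing a `y` equal to the sequence row's top minus the band height, with `height` set to the sequence row's height.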

jrobinso commented 2 years ago

One more thing: normally the sequence track doesn't display (or load data) until zoomed in quite far, ~1 base per pixel. For GC content you might want a much larger visibility window. This is currently set as a preference, obtained as follows: PreferencesManager.getPreferences().getAsInt(MAX_SEQUENCE_RESOLUTION). This is called from 3 places in SequenceTrack; search for usages of the constant. We'll need to do something different if GC% is enabled, but initially you can just replace those calls with hard-coded values. A large value here will impact performance. I don't know what you had in mind, but I wouldn't go larger than 1 MB.

bricoletc commented 2 years ago

Thanks @jrobinso, I'll give this a go and update here with any questions!

bricoletc commented 2 years ago

Hey @jrobinso, just an update: I'm completing my thesis at the moment, so I am very unlikely to give this a go in the next 3 months. Still happy to give it a go afterwards, but if you'd like to do it yourself beforehand, np!