Open bricoletc opened 2 years ago
For which genome assembly? Many have a GC% track under "load from server...".
I was talking about any potentially loaded genome, including one's own, non-server FASTA. Would it be unreasonable to compute GC% on the fly from the sequence if requested in the GUI?
I think it's best to leave that to other tools, I'm sure there is a command line tool to do this. Then open the resulting track in IGV.
Leave the issue open for now, it's an interesting idea, but isn't likely to get prioritized anytime soon.
Would you take in a PR implementing it? Good to know if you wouldn't, as then I wouldn't give it a go.
Yes, but let's have some preliminary discussion first. How do you envision this working, from the user's perspective? I think it should be off by default, and enabled by user action or perhaps user preference. Where would it appear, and what would it look like? One possibility is to make it part of the sequence track itself, in the same way the 3 frame translation currently works. It could be displayed as a heatmap (rather than a bar chart) to keep this compact. Or that too could be a user option.
Alternatively it could be a separate track, but that will be more complex I think as it will depend on the sequence track, which isn't always visible.
> I think it should be off by default, and enabled by user action or perhaps user preference.
Definitely agree
> One possibility is to make it part of the sequence track itself, in the same way the 3 frame translation currently works.
Also agree
> It could be displayed as a heatmap (rather than a bar chart)
Returning to your first comment, which small server-available genome has an existing GC track that I could load into IGV? Want to check what it looks like.
I don't know if any "small" genomes have this track, human hg38 does. Screenshot below, but I would not use this representation (bar chart) for what you are doing, I don't think it would fit well with the sequence track. I suggest using a grayscale, light->dark heatmap. A color heatmap will clash with the nucleotide colors.
EDIT: something like this for scale
I've spent a few minutes looking at the relevant classes, hopefully I can save you some time. In general I am thinking the 3-frame translation is a good model, as you are doing something similar.
First consideration is where to compute gc%, and where to cache it. We don't want to compute on every draw. I think we can compute in the class "SequenceTrack.SeqCache". This is where AA translations are done. The extent of the cache is expanded +/- a screen width, the expansion happens around lines 234-250 of SequenceTrack. So you can assume there is some buffer so recalculations are not done on every repaint. If more optimization of this is needed it can be done later.
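To illustrate the buffering idea described above, here is a minimal sketch of a cache that only recomputes when the view leaves a range padded by one screen width on each side. This is not the actual SeqCache code; the class `BufferedRangeCache` and its method `ensureCovers` are hypothetical names made up for the example.

```java
/** Hypothetical sketch of a +/- one screen width buffer; not IGV code. */
final class BufferedRangeCache {
    // Cached interval, half-open [cachedStart, cachedEnd); empty until first use.
    int cachedStart = 0;
    int cachedEnd = 0;

    /**
     * Makes sure the cache covers [viewStart, viewEnd).
     * Returns true if a recomputation was needed, false if the buffer sufficed.
     */
    boolean ensureCovers(int viewStart, int viewEnd) {
        boolean nonEmpty = cachedEnd > cachedStart;
        if (nonEmpty && viewStart >= cachedStart && viewEnd <= cachedEnd) {
            return false; // still inside the buffered range, nothing to do
        }
        int width = viewEnd - viewStart;
        // Expand by one view width on each side, clamped at the chromosome start.
        cachedStart = Math.max(0, viewStart - width);
        cachedEnd = viewEnd + width;
        // A real implementation would reload the sequence and recompute GC% here.
        return true;
    }

    public static void main(String[] args) {
        BufferedRangeCache cache = new BufferedRangeCache();
        System.out.println(cache.ensureCovers(1000, 2000)); // first call recomputes
        System.out.println(cache.ensureCovers(1500, 2500)); // small pan, buffer is enough
    }
}
```

With this scheme a small pan stays inside the buffer, so the GC% values would only be recalculated after moving more than about a screen width.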
So you might start by computing GC% in the SeqCache class, using refreshAminoAcids as a model. The gc% would be stored in SeqCache, perhaps as an object with start position, window size, and array of floats for gc%.
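As a rough sketch of what such an object could look like, the standalone class below holds a start position, a window size, and an array of floats, and fills it from a byte array of bases. The name `GcWindows`, the `compute` method, and the choice to ignore N and other non-ACGT bases are assumptions for illustration, not existing IGV code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical container for windowed GC fractions; not an IGV class. */
final class GcWindows {
    final int start;       // genomic position of the first window
    final int windowSize;  // number of bases per window
    final float[] gc;      // GC fraction per window, NaN when a window has no A/C/G/T

    GcWindows(int start, int windowSize, float[] gc) {
        this.start = start;
        this.windowSize = windowSize;
        this.gc = gc;
    }

    /** Computes GC fraction in consecutive, non-overlapping windows over seq. */
    static GcWindows compute(byte[] seq, int start, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        int nWindows = (seq.length + windowSize - 1) / windowSize; // last one may be short
        float[] gc = new float[nWindows];
        for (int w = 0; w < nWindows; w++) {
            int from = w * windowSize;
            int to = Math.min(seq.length, from + windowSize);
            int gcCount = 0;
            int called = 0;
            for (int i = from; i < to; i++) {
                char base = Character.toUpperCase((char) seq[i]);
                if (base == 'G' || base == 'C') {
                    gcCount++;
                    called++;
                } else if (base == 'A' || base == 'T') {
                    called++;
                }
                // Assumption: N and other ambiguity codes are skipped entirely.
            }
            gc[w] = called == 0 ? Float.NaN : (float) gcCount / called;
        }
        return new GcWindows(start, windowSize, gc);
    }

    public static void main(String[] args) {
        byte[] seq = "GGCCAATTNNNN".getBytes(StandardCharsets.US_ASCII);
        GcWindows result = GcWindows.compute(seq, 100, 4);
        for (float f : result.gc) {
            System.out.println(f);
        }
    }
}
```

Non-overlapping windows keep the array small; a sliding window would be smoother but is not needed for a first pass.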
Having computed GC, you can add the draw method in the SequenceRenderer class, again using the 3-frame translation as a model.
The SequenceTrack "getHeight" method will need to be modified to account for the extra height of the gc% band. I think that's the only change needed for that track.
For the initial PR you can just assume this track is on all the time and not worry about the menu to enable / disable it. That can come later. If you want to add this look at "getPopupMenu" in SequenceTrack. Also initially a hard-coded window size could be used, we may or may not want to add an option to set this.
RE placement, I think the heatmap band would look best just above the sequence, and of the same height.
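For the gray scale light->dark heatmap, one possible mapping from a GC fraction to a color is sketched below. The class `GcGrayScale`, the method `colorFor`, and the 240 and 40 gray levels are hypothetical choices for the example; the actual drawing would still belong in SequenceRenderer.

```java
import java.awt.Color;

/** Hypothetical helper mapping a GC fraction to a gray level; not IGV code. */
final class GcGrayScale {
    static final int LIGHTEST = 240; // gray level used for 0% GC (assumed value)
    static final int DARKEST = 40;   // gray level used for 100% GC (assumed value)

    /** Returns a light gray for low GC and a dark gray for high GC. */
    static Color colorFor(float gcFraction) {
        if (Float.isNaN(gcFraction)) {
            return Color.WHITE; // no callable bases in the window
        }
        float clamped = Math.max(0f, Math.min(1f, gcFraction));
        int level = Math.round(LIGHTEST - clamped * (LIGHTEST - DARKEST));
        return new Color(level, level, level);
    }

    public static void main(String[] args) {
        System.out.println(colorFor(0.0f));
        System.out.println(colorFor(0.5f));
        System.out.println(colorFor(1.0f));
    }
}
```

Staying off pure white and pure black keeps the band visible against the panel background while avoiding any clash with the nucleotide colors.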
One more thing, normally the sequence track doesn't display (or load data) until zoomed in quite far, ~ 1 base per pixel. For GC content you might want a much larger visibility window. This is currently set as a preference, obtained as follows: PreferencesManager.getPreferences().getAsInt(MAX_SEQUENCE_RESOLUTION). This is called from 3 places in SequenceTrack, search for usages of the constant. We'll need to do something different if GC% is enabled, but initially you can just replace those calls with hard-coded values. A large value here will impact performance, I don't know what you had in mind but I wouldn't go larger than 1 MB.
Thanks @jrobinso, I'll give this a go, and update here with any questions!
Hey @jrobinso, just an update: I'm completing my thesis at the moment, so I am very unlikely to give this a go in the next 3 months. Still happy to give it a go afterwards, but if you'd like to do it yourself beforehand, np!
Thanks for the great tool!
Would it be possible/easy enough to add a track with the loaded genome's GC content? Would be useful to check for e.g. a correlation with read depth. Could not find mention of it in the manual.
(P.S. this is available in Artemis)