pombase / website

PomBase website v2
MIT License

browser track for TSS atlas identified by CAGE (Olaf) #368

Closed: ValWood closed this issue 4 years ago

ValWood commented 7 years ago

Once we get JBrowse up and running, we should make this one of the first datasets to host, as we will need it to refine gene structures as proposed here: https://github.com/pombase/curation/issues/633

ValWood commented 7 years ago

@MalteThodberg Need to keep Malte and Olaf posted

kimrutherford commented 6 years ago

Olaf says some of their data is in BigWig format so maybe we could try this soon?

ValWood commented 6 years ago

OK, @MalteThodberg We can use this as a test case for a new dataset as we set up the procedure for submission.

Our beta JBrowse instance is here: https://www.pombase.org/jbrowse/index.html?loc=II%3A485563..499792&tracks=DNA&highlight=

val

MalteThodberg commented 6 years ago

Dear all,

Our preprint is out (the paper itself is still in revision): https://www.biorxiv.org/content/early/2018/03/13/281642

We have already deposited all the data on GEO Series GSE110976 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110976).

I'll be happy to assist in uploading the data!

kimrutherford commented 6 years ago

I've put the spreadsheet from Malte in Dropbox: pombase/PomBase_website/thodberg_cage_dataset/ThodbergWithEdits2.xlsx

Antonialock commented 6 years ago

should I copy it into the master doc?  @kimrutherford

Antonialock commented 6 years ago

@MalteThodberg, do we need to wait until the paper is out before displaying the data, or can we just go live (seeing as the data is in GEO in any case)?

kimrutherford commented 6 years ago

> should I copy it into the master doc?

Do you mean into the JBrowse configuration table? Yes, please. It will be a while before I can get the data files set up, though.

MalteThodberg commented 6 years ago

@Antonialock Yes, it's perfectly fine to make it public, as it is already public on GEO and bioRxiv

Antonialock commented 6 years ago

ok, makes sense, just thought I'd make sure :-) @MalteThodberg

kimrutherford commented 6 years ago

I've put the BED file and the bigWig files in the right place on the web server. I needed to change some of the chromosome IDs in the bigWig files to match what JBrowse expects.

The URLs are:

https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_EMM_exp_minus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_EMM_exp_plus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_EMM_nitro_minus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_EMM_nitro_plus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_TSSs.bed
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_exp_minus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_exp_plus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_H2O2_minus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_H2O2_plus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_heat_minus.bw
https://www.pombase.org/external_datasets/Thodberg_GSE110976/GSE110976_YES_heat_plus.bw
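Kim doesn't say which IDs were changed or how. One plausible route, sketched under assumptions: dump each bigWig to bedGraph with UCSC's `bigWigToBedGraph`, rewrite the chromosome column, then rebuild with `bedGraphToBigWig` against a chrom.sizes file that uses the new names (re-sorting first if the renaming changes the sort order, since that tool expects sorted input). The Python below covers only the renaming step; `CHROM_MAP` is a hypothetical mapping, not the one actually applied.

```python
"""Rename the chromosome column of bedGraph text (sketch; mapping is hypothetical)."""
from typing import Iterable, Iterator, Mapping

# Hypothetical mapping: the thread does not say which IDs were changed.
CHROM_MAP = {"chr1": "I", "chr2": "II", "chr3": "III"}


def rename_chroms(lines: Iterable[str], mapping: Mapping[str, str]) -> Iterator[str]:
    """Yield bedGraph lines with column 1 renamed via `mapping`.

    Blank, comment and track/browser header lines pass through unchanged,
    as do chromosome IDs that are missing from the mapping.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        if not sep:  # not tab-separated: leave untouched rather than guess
            yield line
            continue
        yield mapping.get(chrom, chrom) + sep + rest


if __name__ == "__main__":
    demo = ["track type=bedGraph\n", "chr1\t100\t101\t5.0\n", "chr2\t7\t8\t1.5\n"]
    print("".join(rename_chroms(demo, CHROM_MAP)), end="")
```

Unknown IDs are deliberately passed through, so a missing mapping entry shows up later as a mismatch in the browser instead of silently dropping data.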

Antonialock commented 6 years ago

Thodberg data added to metadata file.

@MalteThodberg would it make sense to relabel the track description of the bed file: 'Most frequently observed transcription start sites across 5 growth conditions' to clarify that it shows the commonest TSSs?

ValWood commented 6 years ago

That's very long!

Maybe just "consensus transcription start sites"

Antonialock commented 6 years ago

yeah agree 'Most frequently observed' is long, but I thought it isn't really a "consensus"? (or is it? semantics! "Commonest" sounds ridiculous)

ValWood commented 6 years ago

To me it is a consensus if it's the commonest TSS across all conditions? ....it isn't necessarily a consensus sequence, but it's a consensus position ;)

ValWood commented 6 years ago

@MalteThodberg ?

kimrutherford commented 6 years ago

> Thodberg data added to metadata file.

Thanks! I needed to add "forward strand" and "reverse strand" to the track labels to make them unique; otherwise only one of the strands was visible.

The new tracks are live in the main site. You might need to shift-reload if you've used JBrowse within the last hour (the track info is cached in the browser for an hour).

If you make any fixes to pombase_jbrowse_track_metadata.csv, let me know and I'll re-release the site. Re-releasing is quick if the JBrowse data is the only change: only 2-3 minutes.
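The label fix can be illustrated with a hypothetical generator. PomBase actually builds its tracks from `pombase_jbrowse_track_metadata.csv`, whose columns aren't shown in this thread, so this is not that pipeline. The sketch emits JBrowse 1 `trackList.json`-style entries (`label`, `key`, `type`, `storeClass`, `urlTemplate`), puts the strand into both the track ID and the displayed name, and fails loudly on duplicates, which is the kind of collision that would explain only one strand being visible.

```python
"""Build JBrowse 1 style bigWig track entries with unique, strand-aware labels (sketch)."""
import json
from collections import Counter

BASE_URL = "https://www.pombase.org/external_datasets/Thodberg_GSE110976/"
STRANDS = {"plus": "forward strand", "minus": "reverse strand"}
# (medium, condition) pairs as they appear in the bigWig file names listed above
SAMPLES = [("EMM", "exp"), ("EMM", "nitro"), ("YES", "exp"), ("YES", "H2O2"), ("YES", "heat")]


def bigwig_track(medium: str, condition: str, strand: str) -> dict:
    """One trackList-style entry; the strand goes into both the ID and the display name."""
    return {
        "label": f"thodberg_cage_{medium}_{condition}_{strand}",  # unique track ID
        "key": f"CAGE {medium} {condition} ({STRANDS[strand]})",   # name shown to users
        "type": "JBrowse/View/Track/Wiggle/XYPlot",
        "storeClass": "JBrowse/Store/SeqFeature/BigWig",
        "urlTemplate": f"{BASE_URL}GSE110976_{medium}_{condition}_{strand}.bw",
    }


def all_tracks() -> list:
    """All stranded tracks, refusing to return duplicate IDs or display names."""
    tracks = [bigwig_track(m, c, s) for m, c in SAMPLES for s in STRANDS]
    for field in ("label", "key"):
        dupes = [v for v, n in Counter(t[field] for t in tracks).items() if n > 1]
        if dupes:
            raise ValueError(f"duplicate {field}s: {dupes}")
    return tracks


if __name__ == "__main__":
    print(json.dumps({"tracks": all_tracks()}, indent=2))
```

Checking both `label` and `key` catches the case where IDs are unique but two tracks would still look identical in the track selector.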

ValWood commented 6 years ago

Oh it looks very nice.

I'm not clear about how the scale bar works. Does the scale bar scale according to everything in the view, or just the proximal element? It's really neat...

ValWood commented 6 years ago

@MalteThodberg We would like to feature this in the "Research Spotlight" on the front page (https://www.pombase.org/). Could you provide an image of the correct aspect ratio? It can be a composite; just something eye-catching from the paper.

ValWood commented 6 years ago

We will time the announcement and the front-page rotation to coincide with publication.

kimrutherford commented 6 years ago

> I'm not clear about how the scale bar works. Does the scale bar scale according to everything in a view?

Yep, it changes the scale based on what's currently in view.
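To make "it changes based on what's in view" concrete, here is a toy model of per-view autoscaling. It is not JBrowse's code, and the 100-pixel track height is arbitrary. It also shows the side effect Malte describes further down: a very tall CAGE peak in view flattens a weaker one.

```python
"""Illustrate view-dependent ("local") y-axis scaling for a coverage track (sketch)."""
from typing import List, Tuple

Interval = Tuple[int, int, float]  # (start, end, value), half-open as in bedGraph


def visible(intervals: List[Interval], view_start: int, view_end: int) -> List[Interval]:
    """Intervals overlapping the half-open window [view_start, view_end)."""
    return [iv for iv in intervals if iv[0] < view_end and iv[1] > view_start]


def bar_heights(intervals: List[Interval], view_start: int, view_end: int, track_px: int = 100):
    """Pixel height per visible interval, scaled to the tallest value in view."""
    shown = visible(intervals, view_start, view_end)
    top = max((v for _, _, v in shown), default=0.0)
    return [(s, e, round(track_px * v / top) if top > 0 else 0) for s, e, v in shown]


if __name__ == "__main__":
    coverage = [(100, 101, 5000.0), (300, 301, 20.0)]
    print(bar_heights(coverage, 0, 1000))    # tall peak in view: the small one is flattened
    print(bar_heights(coverage, 200, 1000))  # tall peak scrolled out: the small one fills the track
```

The same 20-count site is drawn at 0 pixels in the first view and at full height in the second, purely because of what else is on screen.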

ValWood commented 6 years ago

Something like:

[image]

e.g.:

[image]

but it's better not to have "small text" if possible.

Antonialock commented 6 years ago

> To me it is a consensus if it's the commonest TSS across all conditions? ....it isn't necessarily a consensus sequence, but it's a consensus position ;)

yeah fine, sounds good...stay tuned for track description updates :)

Antonialock commented 6 years ago

> Thanks! I needed to add "forward strand" and "reverse strand" to the track labels to make them unique; otherwise only one of the strands was visible.

Ah shoot, I had started that task but then went for group meeting, lunch etc.

> If you make any fixes to pombase_jbrowse_track_metadata.csv, let me know and I'll re-release the site. Re-releasing is quick if the JBrowse data is the only change: only 2-3 minutes.

I have done quite a few changes to the descriptions so it would probably be good to reload!

MalteThodberg commented 6 years ago

Coming in late to the discussion:

kimrutherford commented 6 years ago

> I can see the BED-track in the new browser, however I get "Error: jDataView length or (byteOffset+length) value is out of bounds" when trying to view BigWig tracks.

Thanks for checking. Could you let us know which web browser you're using? I tested in Firefox and Chrome before the announcement went out, but now it doesn't work for me in Chrome, which is strange. I'll try to fix that today.

kimrutherford commented 6 years ago

When I reloaded the page in Chrome, it worked fine. That makes it tricky to debug. Antonia noticed a similar problem a few days ago that also went away after a reload. I'll do some Google searching to see if others have had this problem.

MalteThodberg commented 6 years ago

You are right: I only get the error message using Safari, using Chrome it renders with no errors.

How do you determine the color of tracks? Since CAGE is stranded, it would be nice to match the colors of the consensus TSSs to their corresponding coverage tracks (i.e. forward strand = red and reverse strand = blue).

Is it possible to get the new browser to render the Consensus TSS peaks in the thickStart/thickEnd columns of the BED-file?
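One way to get strand-matched colours is to write them into the file, assuming the browser's BED renderer reads the itemRgb column (UCSC does when the track line says `itemRgb="On"`; whether this JBrowse setup does is not confirmed in the thread). The sketch below sets column 9 from the strand in column 6, using the red/blue scheme suggested above; `color_by_strand` is a hypothetical helper.

```python
"""Add a strand-based itemRgb colour (BED column 9) to BED lines (sketch)."""

# Forward = red, reverse = blue, matching the colours suggested for the coverage tracks.
STRAND_RGB = {"+": "255,0,0", "-": "0,0,255"}
DEFAULT_RGB = "0,0,0"  # for strand "." or anything unexpected


def color_by_strand(bed_line: str) -> str:
    """Return the BED line with itemRgb set according to its strand."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("need at least BED6: strand is column 6")
    # BED9 needs thickStart/thickEnd before itemRgb; default them to start/end.
    if len(fields) == 6:
        fields.append(fields[1])
    if len(fields) == 7:
        fields.append(fields[2])
    rgb = STRAND_RGB.get(fields[5], DEFAULT_RGB)
    if len(fields) == 8:
        fields.append(rgb)
    else:
        fields[8] = rgb  # overwrite any existing colour
    return "\t".join(fields) + "\n"
```

Existing thickStart/thickEnd values are kept as they are, so the TC peak information stays in the file.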

ValWood commented 6 years ago

Hi @MalteThodberg, I didn't reply about the image aspect ratio. It's for the images on the front page of https://www.pombase.org/: approx. 1:1.5 (?). @kimrutherford can say exactly, but it does not need to be precise; we can trim.

kimrutherford commented 6 years ago

Sorry, I forgot to reply too. The aspect ratio of our current images is 1:1.62. An image close to that would be best but as Val says, we can trim or we can pad the images with a little bit of white space if needed.

Thanks!
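The padding option is simple arithmetic. A sketch, assuming "1:1.62" means height:width (a landscape image whose width is 1.62 times its height); `padded_size` is a hypothetical helper that grows only one dimension so nothing is cropped.

```python
"""Compute white-space padding to reach a target aspect ratio (sketch)."""
import math
from fractions import Fraction

# Assumption: "1:1.62" means height:width, i.e. width = 1.62 x height.
TARGET = Fraction(162, 100)


def padded_size(width: int, height: int, ratio: Fraction = TARGET) -> tuple:
    """Padded (width, height) with width/height as close to `ratio` as whole pixels allow.

    Only one dimension grows, so the original image is never cropped.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if Fraction(width, height) < ratio:
        return math.ceil(height * ratio), height  # too narrow: pad left/right
    return width, math.ceil(width / ratio)         # too wide: pad top/bottom
```

Exact fractions avoid floating-point rounding pushing a dimension up by a pixel.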

MalteThodberg commented 6 years ago

Sorry for the delay, we have been busy with paper revisions! Here's a possible figure for a research spotlight:

PomBase_ResearchSpotlight.pdf

ValWood commented 6 years ago

Hi Malte,

That will be great.

I was also wondering: now that I've had a closer look at the data, I think Antonia was correct and the "combined" file isn't really a consensus. It's a combined view of all observed sites.

I think users would find it really useful to see the "consensus" sites as features on a track. It is difficult to see these on the individual tracks because the expression levels can be so different. In the combined track you cannot distinguish the real features of interest.

i.e. the major peaks here:

[image: peaks]

MalteThodberg commented 6 years ago

That's a very pretty browser picture! :-)

I'm not entirely sure what you mean: The BED-file summarises the basepair-resolution BigWig files into discrete Tag Clusters, corresponding to TSSs.

You are absolutely right that there's a huge difference in scale between the BigWig tracks. Because the CAGE tags are so accurate, they tend to pile up extremely high. That means that some Tag Clusters superficially look very lowly expressed if they are in the same view as a very highly expressed Tag Cluster. The BED file is made with an expression cutoff of at least one count per million in at least three samples (out of 15), so they should all be expressed in biologically meaningful amounts.
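The stated cutoff can be written down directly. A sketch, assuming CPM is computed against each sample's total tag count (the paper may normalise differently) and that per-sample tag-cluster counts are already available; the function names are hypothetical.

```python
"""Keep tag clusters with >= 1 CPM in >= 3 samples (sketch of the stated cutoff)."""
from typing import Dict, List


def cpm(counts: List[int], library_sizes: List[int]) -> List[float]:
    """Counts per million, assuming library size = total CAGE tags in each sample."""
    if len(counts) != len(library_sizes):
        raise ValueError("one count per sample expected")
    return [c * 1e6 / n for c, n in zip(counts, library_sizes)]


def passes_cutoff(counts: List[int], library_sizes: List[int],
                  min_cpm: float = 1.0, min_samples: int = 3) -> bool:
    """True if at least `min_samples` samples reach `min_cpm`."""
    return sum(v >= min_cpm for v in cpm(counts, library_sizes)) >= min_samples


def filter_clusters(tc_counts: Dict[str, List[int]], library_sizes: List[int]) -> List[str]:
    """Names of passing tag clusters; tc_counts maps cluster name -> per-sample counts."""
    return [name for name, counts in tc_counts.items() if passes_cutoff(counts, library_sizes)]
```

Requiring the threshold in several samples, rather than on the pooled total, keeps clusters driven by a single sample out of the track.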

Normally one would force the same scale across all BigWig tracks for better comparison, for example like we do in the paper:

[image]

I don't know if that's a possibility in the new browser?
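This may not need pre-processing of the files. To our understanding, JBrowse 1 wiggle tracks accept fixed `min_score`/`max_score` values in their configuration; check the documentation for the version in use. A sketch that applies one shared range to a set of track entries, where the per-track maxima are assumed to come from a tool such as UCSC's `bigWigInfo`:

```python
"""Give every coverage track the same fixed y-axis range (sketch)."""
from typing import Dict, List


def with_shared_scale(tracks: List[dict], track_max: Dict[str, float]) -> List[dict]:
    """Copies of `tracks` whose min_score/max_score span the largest per-track maximum.

    track_max maps each track's "label" to its maximum value (source assumed,
    e.g. bigWigInfo output; not part of the thread).
    """
    top = max(track_max[t["label"]] for t in tracks)
    # Build new dicts so the caller's track list is left unchanged.
    return [{**t, "min_score": 0, "max_score": top} for t in tracks]
```

A single global maximum makes tracks comparable, but with CAGE's very tall peaks it also flattens most tag clusters, which is the trade-off raised in the following comments.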

ValWood commented 6 years ago

> That's a very pretty browser picture! :-)

It is!

> I'm not entirely sure what you mean:

It might be useful to see the "consensus" rather than the spread (or in addition)

i.e. [image: cons_crop]

...because the CAGE is so sensitive, and many sites will be used hardly at all but are included in the cluster even if 99.99% are at a discrete site.

Note that here I mean for a single gene/locus, rather than normalizing levels between loci.

> You are absolutely right, that there's a huge difference in scale between the BigWig-tracks. Normally one would force the same scale across all BigWig tracks for better comparison, for example like we do in the paper: I don't know if that's a possibility in the new browser?

I think it is, but my understanding is that the files need to be pre-processed in some way? @Antonialock @kimrutherford is that correct?

MalteThodberg commented 6 years ago

An alternative to showing the full CAGE Tag Clusters (TCs) could be to only plot the TC peak, which is the single basepair with the overall highest usage within each Tag Cluster. This position is stored in the BED-file in the thick column. This is what we show in the paper (Bottom track in the plot from my previous post), as the thick line within TCs.
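A sketch of this first option, assuming the BED file follows the standard column order (thickStart and thickEnd in columns 7 and 8, 0-based half-open coordinates) and that the thick region marks the single peak base. `dominant_site` is a hypothetical helper showing the same idea from per-base counts, including the share of the cluster's signal at that base, which bears on the "99.99% at a discrete site" point above.

```python
"""Reduce tag clusters to their peak position (sketch)."""
from typing import Dict, Optional, Tuple


def peak_bed_line(bed_line: str) -> Optional[str]:
    """BED6 line spanning thickStart..thickEnd, or None if the thick region is empty."""
    f = bed_line.rstrip("\n").split("\t")
    if len(f) < 8:
        raise ValueError("expected at least 8 BED columns (thickStart, thickEnd)")
    thick_start, thick_end = int(f[6]), int(f[7])
    if thick_end <= thick_start:
        return None
    # chrom, peak start, peak end, name, score, strand
    return "\t".join([f[0], str(thick_start), str(thick_end), f[3], f[4], f[5]]) + "\n"


def dominant_site(counts_by_pos: Dict[int, float]) -> Tuple[int, float]:
    """Position with the highest summed signal in a cluster, and its share of the total."""
    if not counts_by_pos:
        raise ValueError("empty tag cluster")
    pos = max(counts_by_pos, key=counts_by_pos.get)  # first position wins on ties
    total = sum(counts_by_pos.values())
    return pos, (counts_by_pos[pos] / total if total else 0.0)
```

Writing the peaks to their own BED file gives a separate one-base-per-cluster track, leaving the full Tag Cluster track unchanged.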

Another option is to "trim" the tails of the TCs by removing bases at the ends of TCs with low overall usage. This makes the TCs in the browser appear smaller and more in line with the BigWig tracks. Adding such a track is possible, although it would deviate substantially from how we analyze/validate the CAGE data in the paper.
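Trimming can be sketched as keeping the span that holds the central part of a cluster's signal, here the 10th to 90th percentile. The thresholds are chosen for illustration, similar in spirit to the interquantile widths reported by CAGE tools such as CAGEr; this is not a procedure from the paper, which analyses untrimmed clusters.

```python
"""Trim low-usage tails from a tag cluster by keeping a central quantile range (sketch)."""
from typing import List, Tuple


def trim_tc(start: int, counts: List[float], q_low: float = 0.1, q_up: float = 0.9) -> Tuple[int, int]:
    """Half-open (new_start, new_end) covering the q_low..q_up quantiles of the signal.

    counts[i] is the summed CAGE signal at position start + i (0-based).
    """
    if not 0.0 <= q_low <= q_up <= 1.0:
        raise ValueError("need 0 <= q_low <= q_up <= 1")
    total = sum(counts)
    if total <= 0:
        raise ValueError("tag cluster has no signal")
    lo = hi = None
    cum = 0.0
    for i, c in enumerate(counts):
        cum += c
        if lo is None and cum >= q_low * total:
            lo = i  # first base where the cumulative signal reaches q_low
        if cum >= q_up * total:
            hi = i  # first base where it reaches q_up
            break
    if hi is None:  # guard against floating-point shortfall when q_up == 1
        hi = len(counts) - 1
    if lo is None:
        lo = hi
    return start + lo, start + hi + 1
```

For a cluster where almost all tags sit on one base, e.g. counts `[1, 1, 96, 1, 1]`, the trimmed span collapses to that single base.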

Something completely different: We are almost ready to resubmit the pre-print with referee comments. However, we are having trouble acquiring the previous CAGE data from the Li et al. (https://www.ncbi.nlm.nih.gov/pubmed/25747261) authors. I think I remember the old PomBase browser having a copy of the data - is it possible to find old browser datasets somewhere?

ValWood commented 6 years ago

Your first suggestion sounds good.

Yup, the CAGE data from Li et al. is on our radar: https://github.com/pombase/website/issues/736 and https://github.com/pombase/website/issues/735.

MalteThodberg commented 6 years ago

Thank you!

I can see there's a couple of tracks: LiTSS_TS1.bed LiTSS_TS2.bed LiTSS_TS6.bed new/LiTSS.bed new/LiTSS_TS6.bed

Do you happen to know which file was used in the old browser?

ValWood commented 6 years ago

I'm not sure, any idea anyone?

Antonialock commented 6 years ago

Nope

mah11 commented 6 years ago

I think the old browser may have had two tracks, one for each file in the "new" subdirectory. I've put more details in #736, since that ticket is dedicated to that dataset.

ValWood commented 4 years ago

This was done ages ago: https://www.pombase.org/reference/PMID:30566651. Closing. If anything is still to do, please open a new ticket.