pombase / website

PomBase website v2

Jeffares hermes transposon insertion data, nucleosome data and HMM model data #767

Closed Antonialock closed 4 years ago

Antonialock commented 6 years ago

We have some data from a manuscript that we’d like to have displayed on your beautiful new genome browser.

It is Hermes transposon insertion data, nucleosome data and HMM model data (one state for each position in the genome, derived from the insertion data). Also conservation measures from a new alignment of new Schizosaccharomyces species genome assemblies.

So four tracks in all. It is displayed here at the moment: http://bahlerweb.cs.ucl.ac.uk/bioda/ (best viewed with Firefox).

I have bigWig files at the moment, but would be happy to reformat if need be.

Antonialock commented 6 years ago

will assign to @kimrutherford Kim once track labelling sorted.

Antonialock commented 6 years ago

From Dan: "Also, I've realised that I would like to add another track (Hermes HMM-defined elements), which is a standard chr,start,end,name,score type so could be in bed or bigBed format."
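
For reference, the five-column layout Dan describes (chr, start, end, name, score) can be read with a few lines of Python. This is a minimal sketch, not code used by PomBase: the `BedFeature` class, the `parse_bed_line` helper and the example record are all made up for illustration. BED coordinates are 0-based and half-open, and the score is parsed as a float here to be permissive.

```python
from dataclasses import dataclass


@dataclass
class BedFeature:
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    name: str
    score: float


def parse_bed_line(line: str) -> BedFeature:
    """Parse one tab-separated BED line that has at least five columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, name, score = fields[:5]
    return BedFeature(chrom, int(start), int(end), name, float(score))


# Hypothetical record, not taken from the real data files.
example = parse_bed_line("I\t1000\t1100\tS1\t0.9\n")
```

Because the interval is half-open, the feature length is simply `example.end - example.start`.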

Note to self: could call this type of data transposon mutagenesis

Track data description upload format - Jeffares 2018(1).xlsx

Antonialock commented 6 years ago

@kimrutherford do you have a preference bigbed or bed? (I'm guessing bed since we have other datasets in that format?)

Antonialock commented 6 years ago

Dan didn't you also mention "and nucleosome density (From Maria)"?

djeffares commented 6 years ago

Yes I have the nucleosome density as bigbed. /Dan

Antonialock commented 6 years ago

ah I see, cheers. we can give it a shot but might be a bit trickier than the formats we have already tried our hands at - we are on a learning curve :-)

kimrutherford commented 6 years ago

@kimrutherford do you have a preference bigbed or bed? (I'm guessing bed since we have other datasets in that format?)

JBrowse supports BED and bigBed formats so I think either would be fine.

Antonialock commented 6 years ago

I can't see this data here yet: ftp://ftp.pombase.org/external_datasets/

does anything else need doing?

Would you mind if I have a look at the manuscript Dan?

kimrutherford commented 6 years ago

I can't see this data here yet: ftp://ftp.pombase.org/external_datasets/ does anything else need doing?

Sorry, I've lost track of where we're at. What are the file names?

Antonialock commented 6 years ago

Dan said he had bigwig files he could send and also pointed us to here: http://bahlerweb.cs.ucl.ac.uk/bioda/ (I never managed to view the data on the Bahler page in any of my browsers)

You said it looked straightforward: "I had a quick look and I think this would be easy to add. We can handle bigWig files and the files mostly use the chromosome IDs we need (I, II, III, etc)."

and you said you had a quick look and it looked easy :p

djeffares commented 6 years ago

I’m happy to send the bigwig files, or links to them.

Cheers, Dan

Antonialock commented 6 years ago

Either would be great Dan! Would you mind if I have a look at the manuscript?

Just one request: could you make sure that wherever a chromosome is specified, that this is done in the format "I" "II" or "III"?

So looks like 5 files in total

  1. Hermes transposon insertions
  2. Hermes HMM state
  3. Hermes HMM-defined elements
  4. Nucleosome positioning in WT
  5. Conservation (phyloP)

Thanks!

ValWood commented 6 years ago

Just one request: could you make sure that wherever a chromosome is specified, that this is done in the format "I" "II" or "III"?

We should make another announcement in a month or so specifying all of the things that providers should do to make this hosting easier. I'll start a list on the website tracker.

Antonialock commented 6 years ago

@djeffares - Kim just reminded me there's more to the genome than just the chromosomes. chromosome IDs: I, II, III, mating_type_region, mitochondrial and chr_II_telomeric_gap
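
A quick way for a data provider to check a text-format track (BED or bedGraph) before sending it is to compare the first column against the sequence IDs named in this thread. The sketch below is illustrative only; the function name `unexpected_chromosomes` is hypothetical, and it assumes tab-separated lines with the sequence ID in the first column.

```python
# Sequence IDs as named in this thread.
ALLOWED_CHROMOSOMES = {
    "I", "II", "III",
    "mating_type_region", "mitochondrial", "chr_II_telomeric_gap",
}


def unexpected_chromosomes(lines):
    """Return the set of first-column IDs that are not in ALLOWED_CHROMOSOMES."""
    found = set()
    for line in lines:
        # Skip blank lines, comments and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        found.add(line.split("\t", 1)[0].strip())
    return found - ALLOWED_CHROMOSOMES
```

An empty result means every record uses one of the expected IDs; anything returned needs renaming before the file is loaded.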

kimrutherford commented 6 years ago

I’m happy to send the bigwig files, or links to them.

Hi Dan. Links would be great. Either bigwig or wig format is OK.

djeffares commented 5 years ago

Hello @Antonialock and @kimrutherford,

I'd like to reopen this ticket, as the paper is now accepted and on early access at MBE: https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msz113/5488193

I now have 5 tracks to provide:

PhyloP conservation estimates from the four Schizosaccharomyces species. File: Conservation-SchizPom_phyloP.bigWig

Nucleosome density data from log phase cells File: Nucleosome-density-wtNucWave-reps-median.depth_wl_trimmed_PE2.bigWig

Hermes transposon insertion counts from log phase cells File: ermes-all-log.counts-incl-VT.2016-10-03.txt.bigWig

HMM states from log phase insertions, reflecting the importance of each position in the genome File: all-log-data.hmmstate.model5A.bigWig

HMM-defined elements (HDEs), representing functional units File: hermes-log.av.mapping.ratio0.9.states.stateblocks.100ntlength.bed

I have made a tar file available on google drive that contains all these files: https://drive.google.com/file/d/1bWqy8luMLY71thv-51JSldpUNzUHFnlu/view?usp=sharing

All these bigWig files should function, as they do on the Biodalliance browser that Danny Bitton set up: http://bahlerweb.cs.ucl.ac.uk/bioda/

If you navigate to position I:198,640..290,753 this will display (on Firefox, and perhaps Chrome).

best wishes Dan

Antonialock commented 5 years ago

Wonderful news, congratulations!

@kimrutherford do the files need tweaking?

Dan could you also provide track metadata? it is described in here:

Track data description upload format.xlsx

columns A-Q (some obviously not applicable, I can add the PMID once it has one...)

kimrutherford commented 5 years ago

Hi Dan. Thanks for the files. I've grabbed a copy.

@kimrutherford do the files need tweaking?

I need to change some of the chromosome IDs. I'll do that sometime next week.

djeffares commented 5 years ago

Hi @kimrutherford

Any chance you'd found time to look at this?

I'm asking because the paper is now out, and the tweet is getting some likes, so people may want to browse the data. https://twitter.com/danieljeffares/status/1141006963112890369

Happy to help reformat files, if need be.

cheers Dan

kimrutherford commented 5 years ago

Hi Dan.

Sorry, I haven't got to that. I'll have a look on Monday. I'll let you know if I have any questions.

kimrutherford commented 5 years ago

Any chance you'd found time to look at this?

Hi Dan.

I've had a look at the files. I needed to tweak the chromosome IDs in some cases (eg. change "MT" to "mitochondrial") to match what JBrowse expects but they are ready to go now.
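
The kind of ID tweak described here can be sketched for text formats as below. This is an assumption-laden illustration rather than the script that was actually used: only the "MT" to "mitochondrial" rename comes from this thread, and the `RENAMES` table and `rename_chromosome` helper are hypothetical. bigWig files are binary, so they would first need converting to a text format (for example with UCSC's `bigWigToBedGraph`) and back again afterwards.

```python
# Only the MT -> mitochondrial rename is mentioned in this thread;
# any further entries would have to be added by hand.
RENAMES = {"MT": "mitochondrial"}


def rename_chromosome(line, renames=RENAMES):
    """Rewrite the first tab-separated column of a BED/bedGraph line."""
    if not line.strip() or line.startswith("#"):
        return line  # leave blank lines and comments untouched
    chrom, sep, rest = line.partition("\t")
    return renames.get(chrom, chrom) + sep + rest
```

Lines whose ID is not in the table pass through unchanged, so the function is safe to apply to a whole file.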

Did you see Antonia's comment about the metadata?: https://github.com/pombase/website/issues/767#issuecomment-497456515 I think an extra column has been inserted in the examples in the track description spreadsheet Antonia's attached to that comment. Here's a fixed version: Track.data.description.upload.format.xlsx

Cheers!

djeffares commented 5 years ago

Hi @kimrutherford

Is this metadata file OK?

Jeffares-2019-track.data.description.upload.format xlsx.xlsx

kimrutherford commented 5 years ago

Thanks Dan. That looks good. I'll try to get the tracks into the browser today.

This label might be a bit too long: Conservation level estimated using phyloP method, from Cactus alignment of S. pombe, S. japonicus, S. octosporus and S. cryophilus genomes (Grech 2019)

Can we shorten it?

djeffares commented 5 years ago

Thanks Kim,

How about:

Conservation, estimated using phyloP from alignment of four Schizosaccharomyces genomes (Grech 2019)

Best wishes, Dan

kimrutherford commented 5 years ago

How about: Conservation, estimated using phyloP from alignment of four Schizosaccharomyces genomes (Grech 2019)

That's great.

I've had to change the commas to semicolons because JBrowse doesn't support commas in track labels. I'd forgotten that. I'm happy to change any labels if they look too naff with the semicolons.
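
As a small illustration of that substitution (a sketch only; the helper name `jbrowse_safe_label` is made up and this is not the PomBase loading code):

```python
def jbrowse_safe_label(label: str) -> str:
    """Replace commas with semicolons so the label can be used as a track label."""
    return label.replace(",", ";")


label = jbrowse_safe_label(
    "Conservation, estimated using phyloP from alignment of four "
    "Schizosaccharomyces genomes (Grech 2019)"
)
```

Labels without commas, such as Val's reworded version further down, come through unchanged.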

The new tracks are visible now: new tracks

Just to check, the "Nucleosome density from exponentially growing wild-type" row has a different PubMed ID from the other 4. Was that on purpose?

ValWood commented 5 years ago

Conservation estimated from (the) alignment of four Schizosaccharomyces genomes using phyloP

and doesn't require a comma or colon

ValWood commented 5 years ago

We can announce this (I'm not sure if Antonia is announcing browser hosting in batches though)

We could also add a "research spotlight" to the front page if you have a suitable image, Dan.

kimrutherford commented 5 years ago

Here's what it looks like in the region around cdc2, at Spotlight image resolution:

grech-spotlight-image-1

djeffares commented 5 years ago

Hi Kim and Val,

This image looks great to me. Clearly fewer transposon insertions in the gene (and even fewer in the antisense ncRNA), different HMM states in the UTRs, and higher conservation (phyloP) in the coding exons.

I’d be happy to us Ethiopian as a research spotlight. Thanks!

Best wishes, Dan

ValWood commented 5 years ago

Is

Research spotlight: Grech et al., 2019 The fitness Landscape of the fission yeast genome. Published in Mol Biol Evol. PMID: 31077324

OK?

Ethiopian? @djeffares

kimrutherford commented 5 years ago

Research spotlight: Grech et al., 2019

I've added that to the configuration. It won't be visible until tomorrow morning so there's still time to tweak things.

I haven't included the usual "Publication record in PomBase ..." link because there isn't a publication page for PMID:31077324. Is that expected?

There's room for a longer text if you like. Now I've added the config I realise that a link to JBrowse makes sense. I'll add that.

It will look like this when it appears on the website (plus a JBrowse link once I add that):

grech-spotlight-image-2

kimrutherford commented 5 years ago

Now I've added the config I realise that a link to JBrowse makes sense. I'll add that.

That's done. Let me know if you'd like any wording changes.

grech-spotlight-image-3

djeffares commented 5 years ago

Looks great, thanks Kim and Val.

djeffares commented 5 years ago

Looking at the browser now, I see that the HMM track description is unhelpful.

Could this be altered to the text below?

HMM fitness model (more important regions have lower scores).

Thanks again for displaying and highlighting this.

PS: 'Ethiopian' was an autocorrect typo, amongst many in that message. 🤔

kimrutherford commented 5 years ago

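No problem. There are two descriptions that mention HMM. Which is the one to change?:

  1. HMM state generated from transposon insertion data (Grech 2019)
  2. HMM-derived elements (HDEs); genome windows with runs of one HMM state (Grech 2019)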

Antonialock commented 5 years ago

Excellent!

I have a few questions/comments on the track descriptions.

To clarify what "they are about":

There are 4 tracks:

  1. Hermes transposon insertion sites from multiple insertion libraries: In this track each line represents an insertion site (and height reflects how many times it was observed in the cells in the libraries)
  2. HMM state generated from transposon insertion data: Here the transposon insertion sites are smoothened to states S1-S5. The height of the scale bar corresponds to the state (0=S1, 2=S2, 3=S3...)
  3. Conservation estimated from alignment of four Schizosaccharomyces genomes using phyloP: This track shows the conservation of S. pombe fitness consequence states S1-S5 in S. octosporus, S. japonicus, and S. cryophilus. I'm really not sure how to interpret the track itself. Could you explain?
  4. HMM-derived elements (HDEs); genome windows with runs of one HMM state: This track shows continuous runs of states S1-S3
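
To make "continuous runs of one state" concrete, a per-position state track can be collapsed into blocks with a run-length pass. The sketch below is a generic illustration and is not claimed to be the method used in the paper: the function name `state_runs` and the `min_length` cut-off are hypothetical (the HDE file name mentions a 100 nt length, but how that threshold was applied is not stated in this thread).

```python
from itertools import groupby


def state_runs(states, min_length=1):
    """Collapse per-position states into (start, end, state) runs.

    Coordinates are 0-based and half-open, as in BED. Runs shorter than
    min_length are dropped.
    """
    runs = []
    position = 0
    for state, group in groupby(states):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, position + length, state))
        position += length
    return runs


# Toy input: nine positions with states 1, 2 and 3.
example = state_runs([1, 1, 1, 2, 2, 3, 3, 3, 3], min_length=3)
# -> [(0, 3, 1), (5, 9, 3)]
```

Each returned tuple maps directly onto the start, end and name columns of a BED record once a chromosome ID is added.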

this is what they look like in the browser:

Screenshot 2019-07-03 at 17 45 02

Secondly, I'm not keen on having 3 new "data types" (transposon insertions, HMM state blocks, transposon HMM state) - can we fit these in under "transposon insertions" and tweak the track descriptions?

New track descriptions:
Track 1: Transposon insertion sites (Grech 2019)
Track 2: Transposon insertion sites smoothened to states S1-S5 (Grech 2019)
Track 3: (needs elaborating so that it is obvious what it is showing) Conservation of S. pombe fitness consequence states S1-S5 in S. octosporus, S. japonicus, and S. cryophilus (Grech 2019)
Track 4: HDE units, continuous runs of states S1-S3 (Grech 2019)

I suggest specification of "assay type" (not great terminology for modelling.. perhaps "method" would be better?) as follows:

Track 1: Hermes
Track 2: HMM
Track 3: PhyloP
Track 4: HMM

I also suggest to move the "sample ID" into "study ID" column ? (it looks like a study ID not a sample ID?)

Is it really applicable to both tracks 1 and 4 - should it only be added to track 1?

cheers! @djeffares

Antonialock commented 5 years ago

also @kimrutherford can we show the labels for the "HMM-derived elements (HDEs); genome windows with runs of one HMM state" track by default?

ValWood commented 5 years ago

Secondly, I'm not keen on having 3 new "data types" (transposon insertions, HMM state blocks, transposon HMM state) - can we fit these in under "transposon insertions" and tweak the track descriptions?

I agree- we need the "types" to be broader groupings, and to limit the number. This specificity should be in the description.

kimrutherford commented 5 years ago

also @kimrutherford can we show the labels for the "HMM-derived elements (HDEs); genome windows with runs of one HMM state" track by default?

The labels are on but they only show when you zoom in:

grech-tracks-1

Antonialock commented 5 years ago

Ah.. I did see those but because of the coordinates I didn’t see (what I thought was the) interesting bit - whether the feature is S1, S2, S3 - didn't see the forest for the trees - the coordinates are available when you click on a feature (see screenshot below), perhaps it is better to keep the label simple?

Screenshot 2019-07-03 at 22 26 45

kimrutherford commented 5 years ago

It will look like this when it appears on the website (plus a JBrowse link once I add that):

It's on the main site now and is one of the Spotlights that will be shown on the front page: https://www.pombase.org/archive/spotlight

djeffares commented 5 years ago

Hi Kim,

Please adjust the first one:

HMM state generated from transposon insertion data (Grech 2019)

Best wishes, Dan

On 3 Jul 2019, at 07:43, Kim Rutherford wrote:

No problem. There are two descriptions that mention HMM. Which is the one to change?:

  1. HMM state generated from transposon insertion data (Grech 2019)
  2. HMM-derived elements (HDEs); genome windows with runs of one HMM state (Grech 2019)

kimrutherford commented 5 years ago

Could this be altered to the text below? HMM fitness model (more important regions have lower scores).

Hi Dan.

I've made that change. It will be on pombase.org on Saturday morning.

Antonia has a different suggestion for that track and some of the others: https://github.com/pombase/website/issues/767#issuecomment-508185063

Let us know what you think.

Antonialock commented 5 years ago

Hi @djeffares Did you see my questions above?

djeffares commented 5 years ago

Hi @Antonialock,

Sorry, I missed these questions. Answers below:

To clarify what "they are about":

  1. Hermes transposon insertion sites from multiple insertion libraries: In this track each line represents an insertion site (and height reflects how many times it was observed in the cells in the libraries) DJ: Yes, correct.

  2. HMM state generated from transposon insertion data: Here the transposon insertion sites are smoothened to states S1-S5. The height of the scale bar corresponds to the state (0=S1, 2=S2, 3=S3...) DJ: Yes, correct. But why is it that 0=S1? Surely S1 (state 1) should have height=1?

  3. Conservation estimated from alignment of four Schizosaccharomyces genomes using phyloP: This track shows the conservation of S. pombe fitness consequence states S1-S5 in S. octosporus, S. japonicus, and S. cryophilus. I'm really not sure how to interpret the track itself. Could you explain? DJ: No, this track is not generated from the Hermes transposon insertions. It is the conservation of each site over the phylogeny of the four Schizosaccharomyces species (S. pombe, S. octosporus, S. japonicus, and S. cryophilus). Higher values mean more conservation (the scale is a negative logged P-value). The values were generated from a genome alignment, using the phyloP algorithm.

  4. HMM-derived elements (HDEs); genome windows with runs of one HMM state: This track shows continuous runs of states S1-S3 DJ: Yes.
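
To illustrate what a "negative logged P-value" scale means for the conservation track: a smaller P-value gives a larger score. The snippet below is only a sketch of that transformation, assuming a base-10 logarithm (the thread does not state the base); `neg_log10` is a made-up helper name and this is not the phyloP implementation.

```python
import math


def neg_log10(p_value: float) -> float:
    """Convert a P-value in (0, 1] to a -log10 score (base 10 is an assumption)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


# A P-value of 0.001 corresponds to a score of about 3; a P-value of 1 to a score of 0.
strong = neg_log10(0.001)
none = neg_log10(1.0)
```

On this scale each unit increase in the track height corresponds to a tenfold smaller P-value.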

Secondly, I'm not keen on having 3 new "data types" (transposon insertions, HMM state blocks, transposon HMM state) - can we fit these in under "transposon insertions" and tweak the track descriptions? DJ: Yes, this is fine.

New track descriptions:
Track 1: Transposon insertion sites (Grech 2019)
Track 2: Transposon insertion sites smoothened to states S1-S5 (Grech 2019)
Track 3: (needs elaborating so that it is obvious what it is showing) Conservation of S. pombe fitness consequence states S1-S5 in S. octosporus, S. japonicus, and S. cryophilus (Grech 2019)
Track 4: HDE units, continuous runs of states S1-S3 (Grech 2019)
DJ: Yes, but I think "smoothed" is simpler & in more common use.

I suggest specification of "assay type" (not great terminology for modelling.. perhaps "method" would be better?) as follows:

Track 1: Hermes
Track 2: HMM
Track 3: PhyloP
Track 4: HMM

DJ: What about Transposon rather than Hermes, which is more generic.

I also suggest to move the "sample ID" into "study ID" column ? (it looks like a study ID not a sample ID?) DJ: Yes, fine.

Is it really applicable to both tracks 1 and 4 - should it only be added to track 1? DJ: Yes, fine.

cheers Dan

Antonialock commented 5 years ago

Thanks! I updated the descriptions:

I thought it was smoothened not smoothed - I blame the Drosophila researchers! :-)

"Hermes" is perfect in the assay description - in this field we want a detailed method type. In comparison "data type" is a higher level grouping term for different datasets (e.g. transcripts, chromatin binding sites...).

Let me know if you want anything tweaked, otherwise I'll announce tomorrow

ValWood commented 4 years ago

The fitness landscape data is hosted and announced so I am closing this ticket.