pombase / website

PomBase website v2
MIT License

priority browser datasets to host #1441

Open ValWood opened 4 years ago

ValWood commented 4 years ago

Datasets for gene feature boundaries, or ncRNA

| Data type / details | Method | PMID | First/last author | Data source | Status | Comment |
|---|---|---|---|---|---|---|
| intron / spliceosome profiling | RNA-seq | 29727662 | Chen/Moore | ? | BDTH | @Antonialock "need to ask" |
| transcripts / curated from multiple merged datasets | curated | 20118936 | Lantermann/Korber | GSE16040 | BDTH | HIGH PRIORITY dataset for gene structures. https://github.com/pombase/website/issues/62 Data in spreadsheet attached to Jira PB-2623. Kim may have converted it to GFF already? Strain: h90 WT |
| transcriptome / strand-specific transcriptome | HybMap | 18641648 | Dutrow/Cairns | GSE11619 | BDTH | used for curated Lantermann |
| transcripts / new ncRNA / revised UTR | -- | 26883383 | Eser/Gagneur | | BDH | ANNOUNCED https://github.com/pombase/website/issues/50 Issues: 1. No IDs for new ncRNA 2. Redundancy? FUTURE: CONSIDER SEPARATING FWD AND REVERSE STRAND |
| transcripts / RNA-seq | - | 32152323 | Tsuyuzaki/Sato | | BDH | announced 2020-09-17 [pombase #8073]; mailed Masa 2/9/2020 |
| TSS | - | 25747261 | Li/Shao | - | BDH | ANNOUNCED #736 |

BDTH = browser datasets to host; BDH = browser dataset hosted

Other datasets

| Data type / details | Method | PMID/DATA | First/last author | Status | Comment |
|---|---|---|---|---|---|
| Chromatin modification / H4 Ac & H3K36 Me | ChIP-seq | 22121899 | Sinah/Ekwall | ? | |
| Chromatin modification / histone modifications H3K4me3 +? | ChIP-seq | 24100010 | DeGennaro/Winston | ? | possibly RNA-seq too? This dataset is used to evaluate Li DeepCAGE |
| translation / meiosis and sporulation | ribosome profiling | 24929437 | Duncan/Mata | ? | requested by 2 in user survey Q cycloheximide |
| translation / meiosis | proteomics | 32594847 | Huraiova/Gregan | ? | |
| DSBs | ? | 23395004 | Fowler/Smith | ? | #1198 Figures 3 and 6 [pombase #6827]; tracks present as of 2021-07-16 |
| DSBs? Rec12 protein binding | ? | 25024163/GSE49977 | Fowler/Smith | ? | #1198 rec12 binding is DSB marker? [pombase #6827] |
| DNA binding motifs | pattern matching | n/a | PomBase | - | #802 |
| Transposon insertion | - | PMID: 32101745 | Lee/Levine | ANNOUNCED | |

mah11 commented 4 years ago

@mah11 in progress?

did you look in the browser?

ValWood commented 2 years ago

Add to list

Long-read transcriptome (Jef Boeke lab) would complement other transcriptome data (https://pubmed.ncbi.nlm.nih.gov/27856494/)

ValWood commented 2 years ago

Also, https://pubmed.ncbi.nlm.nih.gov/35618415/ suggested by @juanmatacambridge

ValWood commented 1 year ago

Intron data (mainly meiosis-specific): "Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast", https://genome.cshlp.org/content/25/6/884.short, Table S12

ValWood commented 3 weeks ago

I'm reviewing the action items from the 2022 SAB. This is the only outstanding task. Maybe this can be prioritised (the 3 datasets requested after Oct 2022, not all of them). @kimrutherford could you show @PCarme how to do this at some point?

kimrutherford commented 2 weeks ago

> Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast https://genome.cshlp.org/content/25/6/884.short Table S12

I think it will be straightforward to turn that table into a browser track. It has the critical things: chromosome ID, strand, start and end positions and a feature ID.
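A minimal sketch of that conversion, assuming the table has been exported as tab-separated text. The column names (`chromosome`, `start`, `end`, `strand`, `id`) are hypothetical, and so is the assumption that the table uses 1-based inclusive coordinates; both would need checking against the real Table S12. BED uses 0-based, half-open coordinates, so only the start is shifted.

```python
import csv
import io


def table_to_bed(tsv_text, one_based=True):
    """Convert a tab-separated table into sorted BED6 lines.

    Column names are illustrative assumptions, not the real table headers.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        start = int(row["start"])
        end = int(row["end"])
        if one_based:
            # 1-based inclusive -> 0-based half-open: shift the start only
            start -= 1
        records.append((row["chromosome"], start, end, row["id"], row["strand"]))
    # Sort by chromosome name, then by position
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    # BED6: chrom, start, end, name, score (unused here, set to 0), strand
    return [f"{c}\t{s}\t{e}\t{n}\t0\t{st}" for c, s, e, n, st in records]


# Made-up rows, for illustration only
example = (
    "chromosome\tstart\tend\tstrand\tid\n"
    "II\t500\t560\t-\tintron_2\n"
    "I\t101\t150\t+\tintron_1\n"
)

for line in table_to_bed(example):
    print(line)
```

The sorted output can then be compressed and indexed for the browser in whatever way the existing tracks are.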

kimrutherford commented 1 day ago

> I think it will be straightforward to turn that table into a browser track.

The processed data file in BED format is here: https://www.pombase.org/external_datasets/Bitton_2015_PMID_25883323_introns/PMID_25883323_Supplemental_Tables-S12.sorted.bed.gz
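For anyone preparing a similar file, here is a rough sanity check that a gzipped BED file is sorted by chromosome and start position before loading it as a track. This is only a sketch: the function name `check_sorted_bed` is made up, the demo data is invented, and sorting chromosomes as plain strings is an assumption that may not match what the indexing tool expects.

```python
import gzip
import io


def check_sorted_bed(handle):
    """Return (n_features, is_sorted) for a BED text stream."""
    previous = None
    count = 0
    is_sorted = True
    for line in handle:
        # Skip blank lines and optional BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))
        if previous is not None and key < previous:
            is_sorted = False
        previous = key
        count += 1
    return count, is_sorted


# In-memory stand-in for a real .bed.gz file (invented features)
demo = gzip.compress(
    b"I\t100\t150\tintron_1\t0\t+\n"
    b"I\t300\t350\tintron_2\t0\t-\n"
    b"II\t20\t80\tintron_3\t0\t+\n"
)

with gzip.open(io.BytesIO(demo), "rt") as handle:
    print(check_sorted_bed(handle))
```

For a file on disk, `gzip.open(path, "rt")` can be passed in the same way.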

kimrutherford commented 1 day ago

I've had a look at the other two datasets.

> Long-read transcriptome (Jef Boeke lab) would complement other transcriptome data (https://pubmed.ncbi.nlm.nih.gov/27856494/)

For this paper I think it makes sense to show the assembled transcripts as a track, but I can only find the raw reads (in GEO). We can show raw reads but it's probably not what users would want. Also we'd need to align the reads to the genome to make a BAM file.

> Also, https://pubmed.ncbi.nlm.nih.gov/35618415/ suggested by @juanmatacambridge

This one looks easier. They have various GTF files on figshare and a BAM file. I approve!

One of the files is "Spom_novel_transcripts.gtf" which sounds immediately useful: https://figshare.com/articles/dataset/Native_RNA_sequencing_in_fission_yeast_reveals_frequent_alternative_splicing_isoforms/19368146?file=34398125

We'd need to read the paper to decide which of the other files would make good browser tracks.
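As a first look at what each GTF file contains, something like the sketch below could summarise the feature types and transcript IDs. It assumes standard 9-column GTF with `key "value";` attributes. The example lines and IDs are invented and are not taken from the figshare files.

```python
import re
from collections import Counter

# GTF attributes look like: gene_id "X"; transcript_id "Y";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def summarise_gtf(lines):
    """Count feature types and collect transcript IDs from GTF lines."""
    feature_counts = Counter()
    transcript_ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GTF record
        feature_counts[fields[2]] += 1
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attributes:
            transcript_ids.add(attributes["transcript_id"])
    return feature_counts, transcript_ids


# Invented records, for illustration only
example_gtf = [
    '# example header\n',
    'I\texample\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'I\texample\texon\t100\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'I\texample\texon\t600\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'II\texample\ttranscript\t50\t300\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]

counts, ids = summarise_gtf(example_gtf)
print(dict(counts), sorted(ids))
```

Running this over each file would show how many transcripts and exons it holds, which may help decide which ones are worth turning into tracks.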

kimrutherford commented 15 hours ago

> Widespread exon skipping triggers degradation by nuclear RNA surveillance in fission yeast https://genome.cshlp.org/content/25/6/884.short Table S12

That track is now on the site:

We might need to work on how it looks. We can configure colours, labels and sizes of features:

image

ValWood commented 6 hours ago

One small thing. I think we should have a new data type "intron-related". It would group this dataset and the branch point data.

The label "genome sequence and features" was intended to be reserved for the actual genome sequence and annotated features (maybe we could improve that label?).

Screenshot 2024-11-27 at 08 36 21