European ash transcriptome set

statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues

GNU General Public License v3.0


The genome paper was accompanied by a re-analysis of 240 RNASeq samples for the purpose of associative transcriptomics, looking at ash dieback response. It will take a bit of work to sort out the RNASeq samples. The paper also reports on a few samples from different tissues, which should probably be kept separate.

Original transcriptome paper, Harper et al 2016, where reads are mapped to a de novo assembly: https://www.nature.com/articles/srep19335

Genome paper, Sollars et al 2017, where reads are remapped to the gene models: https://www.nature.com/articles/nature20786

Raw reads can be found via this EMBL-EBI ENA project: https://www.ebi.ac.uk/ena/data/view/PRJEB4958

Note that the genome paper has RPKM files available as supplementary material, meaning we do not need to reprocess all this data. The original paper also has an Excel file describing the biosamples, particularly their disease score, which is the most important metric to include.

I can't think of any reason we want the 2016 data, just the 2017 data, right?

Expression data

Supplemental download from the journal

It is already in matrix format. Samples are named Ash1... Ash 200.

Format:

Biosamples

The EMBL project page linked above has links to the Biosamples in its XML.

<XREF_LINK>
    <DB>ENA-SAMPLE</DB>
    <ID>ERS370607,ERS1138331,ERS1205907-ERS1205943,ERS1887564-ERS1887583</ID>
</XREF_LINK>

It is not clear to me how, or even if, these accessions would link to the columns in the expression data (Ash 1, Ash 2, etc).
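To see how many sample accessions that ID field actually covers, here is a rough Python sketch that expands the comma-separated list and the ranges into individual accessions. The function name `expand_accessions` is made up for illustration, and the sketch assumes each range shares one letter prefix and simply counts up by one, which I have not checked against ENA.

```python
import re


def expand_accessions(id_field):
    """Expand a comma-separated ID field, including ranges, into single accessions.

    Assumption: a range such as ERS1205907-ERS1205943 covers every
    consecutive number between its two ends, inclusive.
    """
    accessions = []
    for token in id_field.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" not in token:
            accessions.append(token)
            continue
        start, end = token.split("-", 1)
        m_start = re.fullmatch(r"([A-Z]+)(\d+)", start)
        m_end = re.fullmatch(r"([A-Z]+)(\d+)", end)
        if not m_start or not m_end or m_start.group(1) != m_end.group(1):
            raise ValueError(f"Cannot expand range: {token!r}")
        prefix = m_start.group(1)
        width = len(m_start.group(2))  # keep any zero padding
        first, last = int(m_start.group(2)), int(m_end.group(2))
        for number in range(first, last + 1):
            accessions.append(f"{prefix}{number:0{width}d}")
    return accessions


# The ID string copied from the XREF_LINK element above.
ID_FIELD = "ERS370607,ERS1138331,ERS1205907-ERS1205943,ERS1887564-ERS1887583"
accessions = expand_accessions(ID_FIELD)
print(len(accessions))  # 59
```

Under that assumption the field expands to 59 accessions, far fewer than the number of Ash columns in the expression matrix, so a one-to-one link between the two looks unlikely from this field alone.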

That said, the disease scores etc are in supplemental dataset 1 of the 2016 paper, which is here.

Sequence ID - Ash 1, Ash 2, etc
Group ? should be in paper
Tree ID ? should be in paper
GIS - location
Number of ramets sampled - collect, sure, why not
Corrected Damage Score 2014 - the damage score, most important
Q1 - ?
Q2 - ?
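Since the matrix columns and the Sequence ID values are spelled slightly differently (Ash1 vs Ash 1), a small check like the one below could confirm they line up before loading anything. This is only a sketch: `normalize_sample_name` and `match_samples` are hypothetical helpers, and it assumes every name is "Ash" followed by a number, with only spacing and case varying.

```python
import re


def normalize_sample_name(name):
    """Map variants such as 'Ash1', 'Ash 1' or 'ash 1' to the single key 'Ash1'."""
    match = re.fullmatch(r"\s*ash\s*(\d+)\s*", name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unexpected sample name: {name!r}")
    return f"Ash{int(match.group(1))}"


def match_samples(matrix_columns, metadata_ids):
    """Pair expression matrix columns with metadata Sequence IDs.

    Returns (pairs, only_in_matrix, only_in_metadata), using the original
    spellings so mismatches are easy to trace back to the source files.
    """
    matrix_keys = {normalize_sample_name(c): c for c in matrix_columns}
    meta_keys = {normalize_sample_name(s): s for s in metadata_ids}
    shared = sorted(set(matrix_keys) & set(meta_keys), key=lambda k: int(k[3:]))
    pairs = [(matrix_keys[k], meta_keys[k]) for k in shared]
    only_in_matrix = [v for k, v in matrix_keys.items() if k not in meta_keys]
    only_in_metadata = [v for k, v in meta_keys.items() if k not in matrix_keys]
    return pairs, only_in_matrix, only_in_metadata


# Toy input, not real data from either paper.
pairs, only_in_matrix, only_in_metadata = match_samples(
    ["Ash1", "Ash 2", "ash3"],
    ["Ash 1", "Ash 2", "Ash 4"],
)
print(pairs)
print(only_in_matrix, only_in_metadata)
```

Anything that ends up in the two leftover lists would need sorting out by hand before the biosamples are loaded.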

I think that maybe the bulk loader would be the way to go for these biosamples.
