statonlab / hardwoods_site

Hardwoods Genomics bugs, data loading, and general issues
GNU General Public License v3.0

Populus trichocarpa expression data #49

Open mestato opened 6 years ago

mestato commented 6 years ago

Populus trichocarpa has expression data available through JGI Phytozome. I can't find it in the browser or gene pages, but the expression values are available in the phytomine interface (https://phytozome.jgi.doe.gov/phytomine/report.do?id=50127750&trail=|50127750)

It may be possible to pull the normalized expression values in bulk through this interface. For metadata, and/or if remapping/quantification is needed, the raw data is in NCBI. It seems that each sample is its own BioProject: PRJNA372410-PRJNA372412, PRJNA402533-PRJNA402568
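If the raw data does need to be fetched, the accession ranges above have to be expanded into individual BioProject IDs first. A minimal sketch, assuming the ranges are contiguous and that every accession inside them belongs to this dataset (not verified against NCBI); `expand_bioproject_ranges` is a hypothetical helper name:

```python
import re

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_bioproject_ranges(spec):
    """Expand 'PRJNA1-PRJNA3,PRJNA7' style text into individual accessions."""
    accessions = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        end = end or start  # a single accession is a range of one
        m_start = ACCESSION.fullmatch(start.strip())
        m_end = ACCESSION.fullmatch(end.strip())
        if not m_start or not m_end or m_start.group(1) != m_end.group(1):
            raise ValueError(f"unrecognized accession range: {part!r}")
        prefix = m_start.group(1)
        first, last = int(m_start.group(2)), int(m_end.group(2))
        accessions.extend(f"{prefix}{n}" for n in range(first, last + 1))
    return accessions


# Assumption: both ranges are inclusive and have no gaps.
bioprojects = expand_bioproject_ranges(
    "PRJNA372410-PRJNA372412,PRJNA402533-PRJNA402568"
)
print(len(bioprojects), bioprojects[0], bioprojects[-1])
```

Each ID in the resulting list should still be checked by hand before download, since a neighbouring accession could belong to an unrelated project.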

bradfordcondon commented 6 years ago

https://phytozome.jgi.doe.gov/phytomine/template.do?name=Gene_Expression

Selecting P. trichocarpa, this gives me 992,040 rows.

Organized by: gene, abundance, experiment name, experiment group.

I think that experiment name will map to biomaterials, and experiment group will map to analysis, for the expression module.

Potri.001G000100    0.0 BESC423.ZL 7 female early   GeneAtlas Tissue Sample
Potri.001G000100    0.0 BESC443.ZG 43 female receptive  GeneAtlas Tissue Sample
Potri.001G000100    0.0 BESC842.ZI 22 female late   GeneAtlas Tissue Sample

I can rebuild this data quite easily into matrix format for loading.
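A rough sketch of that rebuild, in case it helps whoever picks this up. It assumes the PhytoMine export is tab-separated with the four columns in the order listed above (gene, abundance, experiment name, experiment group); the function names are made up for illustration, and it sticks to the standard library.

```python
import csv
import io
from collections import defaultdict


def long_to_matrix(rows):
    """Pivot (gene, abundance, experiment name, experiment group) rows
    into a gene -> {experiment name: abundance} mapping."""
    matrix = defaultdict(dict)
    samples = []  # experiment names in first-seen order
    seen = set()
    for gene, abundance, experiment_name, _group in rows:
        if experiment_name not in seen:
            seen.add(experiment_name)
            samples.append(experiment_name)
        # A repeated (gene, sample) pair silently overwrites the earlier value.
        matrix[gene][experiment_name] = float(abundance)
    return samples, dict(matrix)


def write_matrix(samples, matrix, handle, missing="NA"):
    """Write one row per gene and one column per sample, tab-separated."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in sorted(matrix):
        writer.writerow([gene] + [matrix[gene].get(s, missing) for s in samples])


# The three example rows from this comment, assuming tab delimiters.
example = (
    "Potri.001G000100\t0.0\tBESC423.ZL 7 female early\tGeneAtlas Tissue Sample\n"
    "Potri.001G000100\t0.0\tBESC443.ZG 43 female receptive\tGeneAtlas Tissue Sample\n"
    "Potri.001G000100\t0.0\tBESC842.ZI 22 female late\tGeneAtlas Tissue Sample\n"
)
samples, matrix = long_to_matrix(csv.reader(io.StringIO(example), delimiter="\t"))
out = io.StringIO()
write_matrix(samples, matrix, out)
print(out.getvalue())
```

For the real file, swap the `io.StringIO` input for an open file handle. This ignores the experiment group column, so if more than one group is present the rows would need to be split per group before pivoting.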

Maybe @jwest60 would be interested? This could be a good excuse to practice python.

Data is available at /staton/projects/populus_trichocarpa_expression

bradfordcondon commented 6 years ago

Getting the biomaterials info:

The expression list has this entry for the data in the table

[screenshot, 2018-02-01]

The individual pages for the tissues are not helpful, e.g. Experiment Name: BESC423.ZL 7 female early, Experiment Group: GeneAtlas Tissue Sample

Looking for these tissue names, I can find them referenced in some publications or in some static content, e.g.:

https://static-content.springer.com/esm/art%3A10.1186%2Fs12864-016-3026-2/MediaObjects/12864_2016_3026_MOESM17_ESM.xlsx

Note also that some tissue names are much less informative, e.g. stem-urea.

There are few enough samples that we could manually create a biomaterial matching each, and add whatever properties we can infer from the name (tissue type, treatment...)
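To get a head start on the manual pass, something like the following could draft one record per experiment name. It is only a sketch: it assumes names shaped like `BESC423.ZL 7 female early` split into a sample ID, a number (meaning unknown to me), and a free-text description, and it leaves names that do not fit (such as `stem-urea`) untouched for manual curation. The field names are placeholders, not the expression module's schema.

```python
import re

# Assumed shape: "<sample id> <number> <free-text description>".
NAME_PATTERN = re.compile(
    r"^(?P<sample_id>\S+)\s+(?P<number>\d+)\s+(?P<description>.+)$"
)


def draft_biomaterial(experiment_name):
    """Return a draft record for one experiment name.

    'parsed' is False when the name does not fit the assumed shape,
    in which case the whole name is kept as the description.
    """
    name = experiment_name.strip()
    match = NAME_PATTERN.match(name)
    if match is None:
        return {
            "name": name,
            "sample_id": None,
            "number": None,
            "description": name,
            "parsed": False,
        }
    return {
        "name": name,
        "sample_id": match.group("sample_id"),
        "number": int(match.group("number")),
        "description": match.group("description"),
        "parsed": True,
    }


for example_name in ["BESC423.ZL 7 female early", "stem-urea"]:
    print(draft_biomaterial(example_name))
```

Tissue type and treatment would still have to be filled in by hand from the description, since the names do not encode them consistently.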