openworm / owmeta

Unified, simple data access python library for data & facts about C. elegans anatomy
MIT License

Incorporate data from gene transcription atlas #324

Open slarson opened 7 years ago

slarson commented 7 years ago

http://cole-trapnell-lab.github.io/projects/worm-cell-atlas/

There are many ways this could help. In particular, it should improve the accuracy of several of the data sources, including neuropeptides, neuro receptors, and ion channels.

Direct link to the getting started page: http://atlas.gs.washington.edu/worm-rna/docs/

Tasks

Data types

mwatts15 commented 7 years ago

Downloaded the raw data.

The version of monocle referenced in the link is old -- I'm not sure how that'll affect usage with later versions, but I've downloaded the latest using instructions here: http://cole-trapnell-lab.github.io/monocle-release/docs/#installing-monocle

I made a cursory read over the docs. As a first cut, we could query this dataset for channel or neuron data. We would likely generate a sub-class for Neuron and for Channel to indicate that the data is available.

In doing this work, Contexts should be kept in mind, since this data should sit in its own context.

mwatts15 commented 7 years ago

Couldn't install the latest 'monocle' due to a failure to build VGAM.

mwatts15 commented 6 years ago

I installed everything on an AWS EC2 instance and had a look at some of the raw data, although you can get most of this from their 'vignette' as well.

The types of neurons included in the data set:

> neuron.types
 [1] "AFD"              "ASEL"             "ASER"             "ASG"             
 [5] "ASI/ASJ"          "ASK"              "AWA"              "AWB/AWC"         
 [9] "BAG"              "CAN"              "Cholinergic (11)" "Cholinergic (15)"
[13] "Cholinergic (23)" "Cholinergic (24)" "Cholinergic (26)" "Cholinergic (29)"
[17] "Cholinergic (3)"  "Cholinergic (35)" "Cholinergic (36)" "Cluster 10"      
[21] "Cluster 13"       "Cluster 16"       "Cluster 17"       "Cluster 21"      
[25] "Cluster 25"       "Cluster 27"       "Cluster 40"       "Cluster 5"       
[29] "Dopaminergic"     "DVA"              "flp-1(+)"         "GABAergic"       
[33] "Pharyngeal (33)"  "Pharyngeal (37)"  "PVC/PVD"          "RIA"             
[37] "RIC"              "SDQ/ALN/PLN"      "Touch receptor"   "URX/AQR/PQR"  

Most of these need to be mapped to the standard names we typically use, like RIAL. I'm not sure yet what the indexes mean in the "Cholinergic (...)" labels.
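
As a starting point for that mapping, here is a minimal sketch that only sorts the atlas labels into kinds and splits the slash-joined ones. Everything in it is an assumption for illustration: the function names, the "kind" strings, and the rule that an identified class is a short all-uppercase token. The `members` lookup is hypothetical and would have to be filled in from our own neuron list; the meaning of the numeric indexes is left open.

```python
import re

# Labels such as "Cholinergic (11)" or "Pharyngeal (33)": a group plus an index.
_INDEXED = re.compile(r"^(?P<group>.+?) \((?P<index>\d+)\)$")
# Labels such as "Cluster 10": clusters without an annotation.
_CLUSTER = re.compile(r"^Cluster (?P<index>\d+)$")
# Assumption: identified classes look like "AFD", "ASEL", "RIA" (uppercase, short).
_NEURON_NAME = re.compile(r"^[A-Z][A-Z0-9]{1,4}$")


def parse_label(label):
    """Classify one atlas label and split merged classes such as "ASI/ASJ"."""
    label = label.strip()
    match = _INDEXED.match(label)
    if match:
        return {"kind": "indexed_group",
                "names": [match.group("group")],
                "index": int(match.group("index"))}
    match = _CLUSTER.match(label)
    if match:
        return {"kind": "unannotated_cluster",
                "names": [],
                "index": int(match.group("index"))}
    parts = label.split("/")
    if all(_NEURON_NAME.match(part) for part in parts):
        return {"kind": "neuron_classes", "names": parts, "index": None}
    # Everything else ("GABAergic", "Touch receptor", "flp-1(+)") stays as text.
    return {"kind": "descriptive", "names": [label], "index": None}


def expand_names(names, members):
    """Replace each class name with its member neurons when the lookup has them."""
    expanded = []
    for name in names:
        expanded.extend(members.get(name, [name]))
    return expanded


# Hypothetical lookup; only one entry is shown, the rest would come from our data.
members = {"RIA": ["RIAL", "RIAR"]}
print(expand_names(parse_label("RIA")["names"], members))
```

Labels that come out as "descriptive", "indexed_group", or "unannotated_cluster" would still need a manual decision about which neurons, if any, they correspond to.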

mwatts15 commented 6 years ago

This may be useful in deciding how to implement the translator: https://stackoverflow.com/questions/5630441/how-do-rpy2-pyrserve-and-pyper-compare

It may also be an option to use R in batch mode to just run a script that produces the data (or maybe littler).
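
A rough sketch of the batch-mode option, using only the Python standard library and the `Rscript` front end that ships with R. The assumptions here are mine: a hypothetical R script that takes the output path as its first trailing argument and writes a CSV with a header row there. The function names and file names are placeholders, not anything that exists in the repo.

```python
import csv
import subprocess


def build_rscript_command(script_path, output_csv):
    """Build the argument list for a non-interactive R run.

    --vanilla keeps the run independent of the user's saved workspace and
    profile files; the output path is passed through to the R script.
    """
    return ["Rscript", "--vanilla", script_path, output_csv]


def load_rows(csv_path):
    """Read the exported CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def export_and_load(script_path, output_csv):
    """Run the R script once, fail loudly if R fails, then load its output."""
    subprocess.run(build_rscript_command(script_path, output_csv), check=True)
    return load_rows(output_csv)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command it would run.
    print(build_rscript_command("export_atlas.R", "atlas_export.csv"))
```

The appeal of this route is that the translator needs no R bindings at import time; the trade-off is that R and the required packages still have to be installed wherever the export runs.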