nemoarchive / analytics

Repository for the NeMO Analytics project.
MIT License

Demo datasets to load #23

Closed RLC-DCPPC closed 5 years ago

RLC-DCPPC commented 5 years ago

Kriegstein 10x

RLC-DCPPC commented 5 years ago

bed and wig file to apaala (alex) - epigenome track for epiviz

brianherb commented 5 years ago

Test scRNA dataset from Kriegstein 10X samples:

Location of data on grid: /local/projects-t3/NEMO/incoming/brain/biccc/kriegstein500k/transcriptome/scell/processed/counts/for_gEAR/GW18_PFC.h5ad

Metadata: GW18_PFC_metadata.xlsx

casalex commented 5 years ago

bed and wig file to apaala (alex) - epigenome track for epiviz

@apaala, remind me if these can be regular .bed files or if they should be bigBed?

apaala commented 5 years ago

@casalex I think they need to be bigbeds, right @jkanche ?

casalex commented 5 years ago

bigbeds can be found here: /local/projects-t3/idea/acasella/NeMO/data/roadmap_epi/narrowPeaks_bigbed
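For anyone unsure whether a given track file is a plain-text .bed or a binary bigBed, here is a minimal sketch that checks the four-byte signature at the start of the file. The function name `sniff_track_format` is made up for illustration. The magic numbers are the ones given in the UCSC bigBed/bigWig format description. This only inspects the header and does not validate the rest of the file.

```python
import struct

# Magic numbers from the UCSC bigBed/bigWig format description.
BIGBED_MAGIC = 0x8789F2EB
BIGWIG_MAGIC = 0x888FFC26


def sniff_track_format(path):
    """Return 'bigBed', 'bigWig', or 'other' based on the first four bytes.

    Hypothetical helper: it looks at the file signature only.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if len(head) < 4:
        return "other"
    # The signature may be stored in either byte order, so check both.
    values = {struct.unpack("<I", head)[0], struct.unpack(">I", head)[0]}
    if BIGBED_MAGIC in values:
        return "bigBed"
    if BIGWIG_MAGIC in values:
        return "bigWig"
    return "other"
```

A plain-text .bed file starts with a chromosome name or a track line, so it would come back as 'other'.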

apaala commented 5 years ago

@casalex do you know what the modality of the data is? Is it methylation?

casalex commented 5 years ago

These particular ones are ChIP-seq, I believe. If you need more detailed info, it can be found in the metadata.tsv file in the same directory. I would look at the "Assay" and the "Experiment target" fields in particular. Let me know if that works!
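A quick way to answer the modality question from that file is to tally the two columns mentioned above. This is a rough sketch that assumes metadata.tsv is tab-separated with a header row containing "Assay" and "Experiment target". The function name `summarize_assays` is hypothetical.

```python
import csv
from collections import Counter


def summarize_assays(metadata_path):
    """Count (Assay, Experiment target) pairs in a tab-separated metadata file.

    Assumes a header row; a missing column is counted as an empty string.
    """
    counts = Counter()
    with open(metadata_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = (row.get("Assay", ""), row.get("Experiment target", ""))
            counts[key] += 1
    return counts
```

Calling `summarize_assays("metadata.tsv").most_common()` lists the assay/target combinations, most frequent first.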

apaala commented 5 years ago

@jkanche Here is the path to big beds http://data.nemoarchive.org/other/grant/epigenome_roadmap/epigenome_roadmap/chipseq/bulk/processed/align/

casalex commented 5 years ago

@apaala I subsetted the metadata. The updated file is /local/projects-t3/idea/acasella/NeMO/data/roadmap_epi/narrowPeaks_bigbed/metadata.tsv

brianherb commented 5 years ago

Hi Josh-

I have prepped 3 new Kriegstein 10X datasets for ingest (Anup mentioned there was a push for next Monday 3/18). I created a .h5ad file and a .json metadata file for each.

Directory: /local/projects-t3/NEMO/incoming/brain/biccc/kriegstein500k/transcriptome/scell/processed/counts/for_gEAR

Files: CS22_PFC.h5ad CS22_PFC_meta.json

GW19_PFC_all.h5ad GW19_PFC_all_meta.json

GW22_PFC.h5ad GW22_PFC_meta.json

I also created a new metadata file, GW18_PFC.json, for the existing GW18 dataset already on the NeMO Analytics site. This should fix the annotation release issue.

carlocolantuoni commented 5 years ago

@RLC-DCPPC @brianherb @jorvis @casalex @apaala I have put 2 new datasets into incoming. Between these and those Brian has put up, I think we have all the developmental datasets we need for the upcoming demo:

/local/encrypted/NEMO/incoming/brain/development/NSCI/Cortecon/

/local/encrypted/NEMO/incoming/brain/development/Broad/scESCdifBifurcCelSeq2k/

jorvis commented 5 years ago

And what should the profile be named on the front page with all of these in them?

carlocolantuoni commented 5 years ago

Neocortical Development


jorvis commented 5 years ago

@brianherb I don't see an h5ad file for CS22_PFC in that directory.
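A missing file like this can be caught before ingest with a small pairing check. The sketch below assumes the naming pattern used in the file list above (`NAME.h5ad` next to `NAME_meta.json`). The function name `missing_h5ad` and its `meta_suffix` parameter are made up for illustration.

```python
from pathlib import Path


def missing_h5ad(directory, meta_suffix="_meta.json"):
    """List dataset names that have a metadata JSON but no matching .h5ad file.

    Assumes each dataset is a pair: NAME.h5ad and NAME + meta_suffix.
    """
    directory = Path(directory)
    missing = []
    for meta in sorted(directory.glob(f"*{meta_suffix}")):
        # Strip the metadata suffix to recover the dataset name.
        name = meta.name[: -len(meta_suffix)]
        if not (directory / f"{name}.h5ad").is_file():
            missing.append(name)
    return missing
```

An empty list means every metadata file found has its data file alongside it. A metadata file that does not follow the suffix pattern is not picked up by this check.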

jorvis commented 5 years ago

@carlocolantuoni I checked the first of these (Cortecon), and the data matrix isn't indexed on gene symbol OR Ensembl ID. The row identifiers are just numeric values.

seth-ament commented 5 years ago

Hi Josh,

The rows in the data matrix should be in the same order as the rows of the metadata file that Carlo provided in the same directory. Please go ahead and make any necessary conversions in file format and get this loaded today.

Thanks!

Seth



brianherb commented 5 years ago

@jorvis CS22_PFC.h5ad exists now.

carlocolantuoni commented 5 years ago

Those numbers are EntrezGene IDs


carlocolantuoni commented 5 years ago

Gene symbols are in the ROWmeta file
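Since the matrix rows and the row metadata are said to be in the same order, the gene symbols can be attached by position. The sketch below assumes both have already been read into lists. The column name `gene_symbol` and the function name `label_rows` are hypothetical, because the thread does not give the actual column names of the ROWmeta file.

```python
def label_rows(matrix_rows, row_meta, symbol_key="gene_symbol"):
    """Pair each matrix row with the gene symbol at the same position.

    matrix_rows: list of per-gene value lists, in file order.
    row_meta:    list of dicts read from the row metadata, in file order.
    symbol_key:  hypothetical column name holding the gene symbol.
    """
    if len(matrix_rows) != len(row_meta):
        # Positional matching is only safe when the row counts agree.
        raise ValueError(
            f"row count mismatch: {len(matrix_rows)} matrix rows "
            f"vs {len(row_meta)} metadata rows"
        )
    return [(meta[symbol_key], values)
            for meta, values in zip(row_meta, matrix_rows)]
```

The length check is deliberate: matching by position silently mislabels every row after the first gap, so failing early is safer than loading a shifted matrix.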


jorvis commented 5 years ago

@brianherb thanks, all three of those are now loaded.

jorvis commented 5 years ago

Closing this as I believe all primary datasets have been loaded. Loading of pre-computed tSNE is to be tracked in #32. For any other dataset needs, please create individual tickets. These multi-load tickets are too hard to track progress in.