nemoarchive / analytics

Repository for the NeMO Analytics project.
MIT License

ingest mini-atlas data (MOp) #56

Closed: seth-ament closed this issue 5 years ago

seth-ament commented 5 years ago

These are the datasets. Counts and initial clustering for all except MERFISH are in Dropbox.

seth-ament commented 5 years ago

@brianherb can you begin organizing these data for upload? The datasets are listed in order of maturity and priority, so we can start at the top of the list and work down.

brianherb commented 5 years ago

Certainly. Question: should I set the contact info to me to start, or to the PI? I think I remember that the PI would need an account before we could make them the contact.

seth-ament commented 5 years ago

It's definitely easiest if you are the contact to start. If someone from the mini-atlas asks, we can make them the owner of a dataset later. Thanks!

brianherb commented 5 years ago

@jorvis - I was able to create an .h5ad file from the old .h5 file. Could you test whether the following files load into NeMO Analytics?

Data file: /local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/umi_counts_10XcellsMOp.h5ad

And you can pick your favorite format for the metadata file:
/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/MOp_SC_10X_Zeng_meta.json
/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/metadata_MOp_SC_10Xseq_Zeng.xlsx

Thanks! If this works, I'll try updating the rest.

jorvis commented 5 years ago

I have loaded this dataset on NeMO (production) and, accidentally, on gEAR (production). It is private on both and owned by Brian. INSERT SQL for my own notes:

INSERT INTO dataset (id, owner_id, title, organism_id, is_public, ldesc, date_added, dtype,
                     share_id, load_status, has_h5ad, contact_name, contact_institute, contact_email,
                     annotation_release)
VALUES ('a550bc33-3d13-4ee7-8039-e9134da2664b', 479, 'Primary Motor Cortex from Mouse  - Single Cell 10X RNAseq ', 1,
         0, 'Cells from Mouse Primary Motor Cortex were isolated from multiple mice and RNA from individual cells were sequenced using 10X RNAseq.',
         NOW(), 'single-cell-rnaseq', '8d73898b-c680-4f74-a430-380203e3ffec', 'completed', 1, 'Brian Herb',
         'University of Maryland School of Medicine', 'bherb@som.umaryland.edu', 84);

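For comparison, here is a minimal sketch of the same record written as a parameterized insert. It assumes Python's built-in sqlite3 module as a stand-in for the production database (NOW() above is not SQLite syntax) and a hypothetical, cut-down `dataset` table with only some of the columns above; the real NeMO/gEAR schema is not shown in this thread. Placeholders avoid hand-quoting titles and descriptions, and the read-back confirms the row is private (is_public = 0).

```python
import sqlite3


def insert_dataset(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one dataset row using named placeholders (hypothetical reduced schema)."""
    conn.execute(
        """
        INSERT INTO dataset (id, owner_id, title, organism_id, is_public,
                             dtype, load_status, has_h5ad, date_added)
        VALUES (:id, :owner_id, :title, :organism_id, :is_public,
                :dtype, :load_status, :has_h5ad, CURRENT_TIMESTAMP)
        """,
        record,
    )


conn = sqlite3.connect(":memory:")
# Hypothetical minimal table; the production table has more columns.
conn.execute(
    """
    CREATE TABLE dataset (
        id TEXT PRIMARY KEY, owner_id INTEGER, title TEXT, organism_id INTEGER,
        is_public INTEGER, dtype TEXT, load_status TEXT, has_h5ad INTEGER,
        date_added TEXT
    )
    """
)
insert_dataset(conn, {
    "id": "a550bc33-3d13-4ee7-8039-e9134da2664b",
    "owner_id": 479,
    # Same title as the notes above, minus the stray double and trailing spaces.
    "title": "Primary Motor Cortex from Mouse - Single Cell 10X RNAseq",
    "organism_id": 1,
    "is_public": 0,  # 0 = private
    "dtype": "single-cell-rnaseq",
    "load_status": "completed",
    "has_h5ad": 1,
})
row = conn.execute(
    "SELECT title, is_public FROM dataset WHERE id = ?",
    ("a550bc33-3d13-4ee7-8039-e9134da2664b",),
).fetchone()
print(row)
```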
brianherb commented 5 years ago

Zeng 10X single nuclei RNAseq data is now processed. Files are here:

Data file: /local/projects-t3/idea/bherb/NeMO_work/10X_nuclei_MOp/umi_counts_10XnucleiMOp_tsne.h5ad

And you can pick your favorite format for the metadata file:
/local/projects-t3/idea/bherb/NeMO_work/10X_nuclei_MOp/MOp_SN_10X_Zeng_meta.json
/local/projects-t3/idea/bherb/NeMO_work/10X_nuclei_MOp/metadata_MOp_SN_10Xseq_Zeng.xlsx

Note that the .h5ad file now has the metadata and t-SNE coordinates in the .obs slot of the adata object.

I also generated a new .h5ad file for the Zeng 10X single-cell data that now includes metadata and t-SNE coordinates in the .obs slot of the adata object. That file is here:

/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/umi_counts_10XcellsMOp_tsne.h5ad

The old sample metadata files can still be used:
/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/MOp_SC_10X_Zeng_meta.json
/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/metadata_MOp_SC_10Xseq_Zeng.xlsx
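Storing t-SNE coordinates alongside the sample metadata amounts to a join on cell ID. In AnnData, .obs is a pandas DataFrame indexed by cell name, so the coordinates end up as extra columns there. The sketch below shows the same join in plain Python with no AnnData dependency; the function name, the column names (cell_id, tsne_1, tsne_2) and the toy rows are hypothetical, not taken from these files. Cells without coordinates are kept but reported.

```python
import csv
import io


def merge_tsne_into_obs(obs_rows, tsne_csv_text, id_col="cell_id"):
    """Attach t-SNE coordinates to per-cell metadata rows, matched by cell ID.

    obs_rows: list of dicts, one per cell (a stand-in for the .obs table).
    tsne_csv_text: CSV text with hypothetical columns cell_id, tsne_1, tsne_2.
    Returns the merged rows and the IDs of cells that had no coordinates.
    """
    coords = {}
    for rec in csv.DictReader(io.StringIO(tsne_csv_text)):
        coords[rec[id_col]] = (float(rec["tsne_1"]), float(rec["tsne_2"]))

    merged, missing = [], []
    for row in obs_rows:
        xy = coords.get(row[id_col])
        if xy is None:
            missing.append(row[id_col])  # keep the cell, but flag it
            xy = (None, None)
        merged.append({**row, "tsne_1": xy[0], "tsne_2": xy[1]})
    return merged, missing


# Toy input: two cells, but coordinates for only one of them.
obs = [
    {"cell_id": "AAAC-1", "cluster": "c1"},
    {"cell_id": "AAAG-1", "cluster": "c2"},
]
tsne = "cell_id,tsne_1,tsne_2\nAAAC-1,1.5,-2.0\n"
merged, missing = merge_tsne_into_obs(obs, tsne)
print(merged[0]["tsne_1"], missing)  # 1.5 ['AAAG-1']
```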

brianherb commented 5 years ago

In the Dropbox folder, the tsne.df.csv file (t-SNE coordinates) for the single-cell SMART-seq (Allen) data is empty. Can we ask the Allen to re-upload this file?
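A quick check like the following, a minimal sketch using only the Python standard library, can flag an empty or header-only coordinate file before it reaches the loading step. The helper name count_data_rows is hypothetical; it treats a missing file, a zero-byte file, and a file with only a header row as having no usable data.

```python
import csv
import tempfile
from pathlib import Path


def count_data_rows(path):
    """Count non-blank data rows after the header row (hypothetical helper).

    Returns 0 for a missing file, a zero-byte file, or a header-only file.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return 0
    with p.open(newline="") as fh:
        reader = csv.reader(fh)
        if next(reader, None) is None:  # no header row at all
            return 0
        # Count rows that contain at least one non-whitespace cell.
        return sum(1 for row in reader if any(cell.strip() for cell in row))


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "tsne.df.csv": "",  # zero bytes
        "header_only.csv": "cell_id,tsne_1,tsne_2\n",
        "good.csv": "cell_id,tsne_1,tsne_2\ncell_a,1.5,-2.0\n",
    }
    for name, text in files.items():
        (Path(tmp) / name).write_text(text)
    results = {name: count_data_rows(Path(tmp) / name) for name in files}

print(results)  # {'tsne.df.csv': 0, 'header_only.csv': 0, 'good.csv': 1}
```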

brianherb commented 5 years ago

Zeng SMARTer single nuclei RNAseq data is now processed. Files are here:

Data file: /local/projects-t3/idea/bherb/NeMO_work/SMARTer_nuclei_MOp/umi_counts_SN_SMARTer_MOp_tsne.h5ad

Metadata file: /local/projects-t3/idea/bherb/NeMO_work/SMARTer_nuclei_MOp/MOp_SN_SMARTer_Zeng_meta.json

brianherb commented 5 years ago

Zeng SMARTer single cell RNAseq data is now processed. Files are here:

Data file: /local/projects-t3/idea/bherb/NeMO_work/SMARTer_cells_MOp/umi_counts_SC_SMARTer_MOp_tsne.h5ad

Metadata file: /local/projects-t3/idea/bherb/NeMO_work/SMARTer_cells_MOp/MOp_SC_SMARTer_Zeng_meta.json

brianherb commented 5 years ago

Lein datasets are prepared:

10X nuclei: /local/projects-t3/idea/bherb/NeMO_work/Lein_10X_nuclei_MOp/umi_counts_Lein_10X_nuclei_MOp_tsne.h5ad

/local/projects-t3/idea/bherb/NeMO_work/Lein_10X_nuclei_MOp/Lein_10X_nuclei_MOp_meta.json

SMARTer nuclei: /local/projects-t3/idea/bherb/NeMO_work/Lein_SMARTer_nuclei_MOp/umi_counts_Lein_SMARTer_nuclei_MOp_tsne.h5ad

/local/projects-t3/idea/bherb/NeMO_work/Lein_SMARTer_nuclei_MOp/MOp_SC_SMARTer_Zeng_meta.json

brianherb commented 5 years ago

I also created reduced versions of the 10X datasets in case we want to use them. Downsampled 10X data (10K randomly selected cells):

/local/projects-t3/idea/bherb/NeMO_work/Lein_10X_nuclei_MOp/umi_counts_Lein_10X_nuclei_MOp_tsne_Downsample10K.h5ad

/local/projects-t3/idea/bherb/NeMO_work/10X_cells_MOp/umi_counts_10XcellsMOp_tsne_Downsample10K.h5ad

/local/projects-t3/idea/bherb/NeMO_work/10X_nuclei_MOp/umi_counts_10XnucleiMOp_tsne_Downsample10K.h5ad

For all of these files, the accompanying .json metadata file can be used.
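The thread does not say which tool did the downsampling. As a hedged sketch of the idea, uniform random sampling of 10K cell indices without replacement looks like this in plain Python; the function name, the fixed seed, and the 70,000-cell example size are assumptions for illustration only. With AnnData, the resulting indices could then select the subset via adata[idx].copy().

```python
import random


def downsample_indices(n_cells, target=10_000, seed=0):
    """Choose `target` distinct cell indices uniformly at random (no replacement).

    Hypothetical helper. Returns sorted indices so the subset keeps the original
    cell order; if the dataset has `target` cells or fewer, all are kept.
    """
    if n_cells <= target:
        return list(range(n_cells))
    rng = random.Random(seed)  # dedicated, seeded RNG -> reproducible subset
    return sorted(rng.sample(range(n_cells), target))


idx = downsample_indices(70_000)  # 70,000 is an illustrative size only
print(len(idx), len(set(idx)) == len(idx))  # 10000 True
```

Sorting keeps cells in their original order, and the fixed seed means the same subset can be regenerated if a downsampled file ever needs to be rebuilt.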

jorvis commented 5 years ago

All of these are now loaded except the reduced/downsampled ones posted just before this comment; for those I'd at least need different titles. A few h5ad files are still transferring, but they should all be complete within the hour. Again, they are loaded in private mode and owned by the NeMO Curator.

brianherb commented 5 years ago

Newly prepped datasets:

Lein SMARTer nuclei
All data: /local/projects-t3/idea/bherb/NeMO_work/Lein_SMARTer_nuclei_MOp/umi_counts_Lein_SMARTer_nuclei_MOp_tsne.h5ad
Downsampled: /local/projects-t3/idea/bherb/NeMO_work/Lein_SMARTer_nuclei_MOp/umi_counts_Lein_SMARTer_nuclei_MOp_tsne_Downsample10K.h5ad
Annotation: /local/projects-t3/idea/bherb/NeMO_work/Lein_SMARTer_nuclei_MOp/Lein_SMARTer_nuclei_MOp_meta.json

Kriegstein MOp
All data: /local/projects-t3/idea/bherb/NeMO_work/Kriegstein_10X_cells_MOp/umi_counts_Kriegstein_10X_cells_MOp_raw.h5ad
Annotation: /local/projects-t3/idea/bherb/NeMO_work/Kriegstein_10X_cells_MOp/Kriegstein_10X_cells_MOp_meta.json

Ecker / Callaway ATACseq
All data: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snATAC-Seq/EckerCallaway_snATAC_Seq_MOp_tSNE.h5ad
Annotation: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snATAC-Seq/EckerCallaway_snATAC_Seq_MOp_meta.json

brianherb commented 5 years ago

The last dataset that can currently be processed is done, so I'm going to close this ticket. If the Marmoset and MERFISH data become available in the future, we can create a new ticket for them.

This methylation data is summarized either by gene or by 100 kb (100,000 bp) bins. For now I processed the gene-centric dataset, since I figure it would work best with NeMO Analytics, but note that the t-SNE clustering is based on the 100 kb bins. The data is also broken down by CG and CH methylation: CG is methylation of a cytosine that is followed by a guanine, whereas CH is methylation of a cytosine that is NOT followed by a guanine (i.e., followed by A, C, or T). I prepared the CG and CH datasets separately.
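The CG/CH distinction and the 100 kb binning can be made concrete with a small sketch. It assumes a plus-strand DNA string with 0-based positions (minus-strand cytosines are ignored for simplicity) and per-site records of (chromosome, position, methylated reads, total reads); these record layouts and function names are hypothetical. The bin-level value pools read counts as methylated / total, which is a common convention but an assumption here, not necessarily how these files were produced.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # 100 kb bins


def cytosine_context(seq, i):
    """Classify the cytosine at 0-based position i (plus strand) as 'CG' or 'CH'.

    CG: C followed by G. CH: C followed by A, C, or T.
    Returns None if seq[i] is not C or the next base is missing or ambiguous.
    """
    seq = seq.upper()
    if i < 0 or i + 1 >= len(seq) or seq[i] != "C":
        return None
    nxt = seq[i + 1]
    if nxt == "G":
        return "CG"
    if nxt in "ACT":
        return "CH"
    return None  # e.g. 'N'


def bin_methylation(sites):
    """Pool (chrom, pos, methylated_reads, total_reads) sites into 100 kb bins.

    Returns {(chrom, bin_index): methylated / total} for bins with coverage.
    """
    mc = defaultdict(int)
    cov = defaultdict(int)
    for chrom, pos, methylated, total in sites:
        key = (chrom, pos // BIN_SIZE)
        mc[key] += methylated
        cov[key] += total
    return {k: mc[k] / cov[k] for k in cov if cov[k] > 0}


seq = "ACGTCATCN"
contexts = [(i, cytosine_context(seq, i)) for i, b in enumerate(seq) if b == "C"]
print(contexts)  # [(1, 'CG'), (4, 'CH'), (7, None)]

sites = [("chr1", 10, 3, 4), ("chr1", 99_999, 1, 4), ("chr1", 100_000, 0, 5)]
bins = bin_methylation(sites)
print(bins)  # {('chr1', 0): 0.5, ('chr1', 1): 0.0}
```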

Data files:

Ecker / Callaway methylseq - CG methylation by Gene
All data: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snmC-Seq/EckerCallaway_snmCG_Seq_MOp_tSNE.h5ad
Annotation: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snmC-Seq/EckerCallaway_snmCG_Seq_MOp_meta.json

Ecker / Callaway methylseq - CH methylation by Gene
All data: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snmC-Seq/EckerCallaway_snmCH_Seq_MOp_tSNE.h5ad
Annotation: /local/projects-t3/idea/bherb/NeMO_work/EckerCallaway_snmC-Seq/EckerCallaway_snmCH_Seq_MOp_meta.json