Closed RLC-DCPPC closed 5 years ago
bed and wig file to apaala (alex) - epigenome track for epiviz
Test scRNA dataset from Kriegstein 10X samples:
Location of data on grid: /local/projects-t3/NEMO/incoming/brain/biccc/kriegstein500k/transcriptome/scell/processed/counts/for_gEAR/GW18_PFC.h5ad
Metadata: GW18_PFC_metadata.xlsx
@apaala, remind me if these can be regular .bed files or if they should be bigBed?
@casalex I think they need to be bigbeds, right @jkanche ?
bigbeds can be found here: /local/projects-t3/idea/acasella/NeMO/data/roadmap_epi/narrowPeaks_bigbed
@casalex do you know what the modality of the data is? Is it methylation?
These particular ones are ChIP-seq, I believe. If you need more detailed info, it can be found in the metadata.tsv file in the same directory. I would look at the "Assay" and "Experiment target" fields in particular. Let me know if that works!
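For anyone checking modality across the whole directory, a minimal sketch of tallying the "Assay" and "Experiment target" fields is below. Only those two column names come from this thread. The sample rows, the "File accession" column, and the function name are made up for illustration, and the real metadata.tsv may be laid out differently.

```python
import csv
import io
from collections import Counter


def summarize_assays(tsv_text):
    """Count (Assay, Experiment target) pairs in a tab-separated metadata table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter((row["Assay"], row["Experiment target"]) for row in reader)


# Hypothetical rows for illustration only; not taken from the real metadata.tsv.
sample = (
    "File accession\tAssay\tExperiment target\n"
    "FILE_A\tChIP-seq\tH3K4me3\n"
    "FILE_B\tChIP-seq\tH3K27ac\n"
    "FILE_C\tChIP-seq\tH3K4me3\n"
)

counts = summarize_assays(sample)
for (assay, target), n in sorted(counts.items()):
    print(assay, target, n)
```

To run it against the real file, the text could be read with `open(path, newline="").read()` and passed to `summarize_assays`.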
@jkanche Here is the path to big beds http://data.nemoarchive.org/other/grant/epigenome_roadmap/epigenome_roadmap/chipseq/bulk/processed/align/
@apaala I subsetted the metadata--updated file is /local/projects-t3/idea/acasella/NeMO/data/roadmap_epi/narrowPeaks_bigbed/metadata.tsv
Hi Josh-
I have prepped 3 new Kriegstein 10X datasets for ingest (Anup mentioned there was a push for next Monday 3/18). I created a .h5ad file and a .json metadata file for each.
Directory: /local/projects-t3/NEMO/incoming/brain/biccc/kriegstein500k/transcriptome/scell/processed/counts/for_gEAR
Files: CS22_PFC.h5ad CS22_PFC_meta.json
GW19_PFC_all.h5ad GW19_PFC_all_meta.json
GW22_PFC.h5ad GW22_PFC_meta.json
and I created a new metadata file for the existing GW18 dataset already in NeMO analytics site - GW18_PFC.json - this should fix the annotation release issue.
@RLC-DCPPC @brianherb @jorvis @casalex @apaala I have put 2 new data sets into incoming. Between these and those Brian has put up, I think we have all the developmental data sets we need for the upcoming demo.
/local/encrypted/NEMO/incoming/brain/development/NSCI/Cortecon/
/local/encrypted/NEMO/incoming/brain/development/Broad/scESCdifBifurcCelSeq2k/
And what should the profile be named on the front page with all of these in them?
Neocortical Development
-- Carlo
@brianherb I don't see an h5ad file for CS22_PFC in that directory.
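A quick pre-ingest check along these lines could flag a missing file before loading. This is only a sketch: it assumes the `<name>.h5ad` / `<name>_meta.json` naming used for the three new datasets in this thread (the GW18_PFC.json file does not follow it), and the function name is hypothetical.

```python
def missing_partners(filenames, meta_suffix="_meta.json"):
    """Return {dataset: missing_file} for datasets lacking an .h5ad or a metadata file."""
    h5ad = {n[: -len(".h5ad")] for n in filenames if n.endswith(".h5ad")}
    meta = {n[: -len(meta_suffix)] for n in filenames if n.endswith(meta_suffix)}
    missing = {}
    for name in sorted(h5ad | meta):
        if name not in h5ad:
            missing[name] = name + ".h5ad"
        elif name not in meta:
            missing[name] = name + meta_suffix
    return missing


# Directory listing as described in this comment: CS22_PFC has metadata but no .h5ad.
listing = [
    "CS22_PFC_meta.json",
    "GW19_PFC_all.h5ad",
    "GW19_PFC_all_meta.json",
    "GW22_PFC.h5ad",
    "GW22_PFC_meta.json",
]

missing = missing_partners(listing)
print(missing)
```

On the grid, `listing` could come from `os.listdir()` on the for_gEAR directory instead of a hard-coded list.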
@carlocolantuoni checking into the first of these (Cortecon) and the data matrix isn't indexed on gene symbol OR ensembl ID. They're just numeric values.
Hi Josh,
The rows in the data matrix should be in the same order as the rows metadata file that Carlo provided in the same directory. Please go ahead and make any necessary conversions in file format and get this loaded today.
Thanks!
Seth
From: Joshua Orvis notifications@github.com Sent: Monday, March 18, 2019 9:18 AM To: nemoarchive/analytics Cc: Subscribed Subject: Re: [nemoarchive/analytics] Demo datasets to load (#23)
@jorvis CS22_PFC.h5ad exists now.
Those numbers are EntrezGene IDs
Gene symbols are in the ROWmeta file
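Putting the replies together (matrix rows keyed by EntrezGene ID, row order matching the ROWmeta file, symbols in ROWmeta), the conversion could be sketched as a positional join. The column names `EntrezGeneID` and `GeneSymbol`, the function name, and the expression values are assumptions for illustration; the actual ROWmeta layout may differ, and the ID cross-check only applies if the file also carries the IDs.

```python
import csv
import io


def label_rows(matrix_rows, rowmeta_tsv, symbol_col="GeneSymbol", id_col=None):
    """Pair each matrix row with the gene symbol at the same row position
    in the row-metadata table. Column names here are hypothetical."""
    meta = list(csv.DictReader(io.StringIO(rowmeta_tsv), delimiter="\t"))
    if len(meta) != len(matrix_rows):
        raise ValueError("matrix and row metadata have different row counts")
    labeled = []
    for (row_id, values), info in zip(matrix_rows, meta):
        # Optional cross-check, only possible if the metadata also lists the ID.
        if id_col is not None and info[id_col] != str(row_id):
            raise ValueError(f"row order mismatch at ID {row_id}")
        labeled.append((info[symbol_col], values))
    return labeled


# Toy data: real Entrez IDs for TP53 and BRCA1, made-up expression values.
matrix_rows = [(7157, [5.0, 3.2]), (672, [0.0, 1.1])]
rowmeta_tsv = "EntrezGeneID\tGeneSymbol\n7157\tTP53\n672\tBRCA1\n"

labeled = label_rows(matrix_rows, rowmeta_tsv, id_col="EntrezGeneID")
print([symbol for symbol, _ in labeled])
```

The length check and the optional ID check are there because a purely positional join silently mislabels every gene if the two files ever fall out of order.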
@brianherb thanks, all three of those are now loaded.
Closing this as I believe all primary datasets have been loaded. Loading of pre-computed tSNE is to be tracked in #32. For any other dataset needs, please create individual tickets. These multi-load tickets are too hard to track progress in.
Kriegstein 10x