create public saved analyses for datasets in sc workbench and label each cluster by its cell type assignment.

nemoarchive / analytics

Repository for the NeMO Analytics project.

MIT License


create public saved analyses for datasets in sc workbench and label each cluster by its cell type assignment. #49

Open seth-ament opened 5 years ago

seth-ament commented 5 years ago

From Ronna: "For each one of the single cell datasets that we have in the workbench - we need a saved analysis that is as close as possible to the published data, that includes the cell type assignments. This way when users go to the saved analyses - they can immediately, for example, choose to compare the microglia to the astrocytes, or the EN to the nEN etc."

seth-ament commented 5 years ago

@brianherb can you take this on? It should be easy for all the 10X datasets, since you already did those analyses with scanpy. Hopefully also for the Yao et al. in vitro sc dataset.

Getting a scanpy analysis to closely match the published cell types from the Kriegstein 3.4k dataset may be more difficult, since they did iterative clustering with rather different methods.

Carlo, were you the one that came up with the "majorType" assignments that we've been using, or is that from the metadata Aparna gave us?

seth-ament commented 5 years ago

@jorvis is it possible to upload cluster assignments generated outside the workbench as saved analyses? I'm specifically wondering with regard to the Nowakowski 2017 Science dataset (ARK3.4k sc). The Kriegstein lab did some fairly sophisticated / non-standard stuff to arrive at the published cell types in their paper. We used those published cell type assignments in other parts of the nemoanalytics portal. I'm concerned that doing our own quick analysis in the workbench would create confusion and potentially annoy our collaborators.
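To make the request concrete, here is a minimal sketch of what attaching externally generated cell type labels to a dataset's cells could look like. It is not the workbench's actual upload mechanism: the CSV layout, the column names `cell_id` and `cell_type`, the cell identifiers, and both helper functions are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical export of published labels: one row per cell.
# The column names and cell identifiers are illustrative assumptions.
PUBLISHED_LABELS_CSV = """cell_id,cell_type
cell_001,microglia
cell_002,astrocyte
cell_003,EN
"""


def load_labels(handle):
    """Read a cell_id -> cell_type mapping from a CSV file handle."""
    return {row["cell_id"]: row["cell_type"] for row in csv.DictReader(handle)}


def assign_labels(cell_ids, labels, missing="unassigned"):
    """Return one label per cell, in dataset order, marking unmatched cells."""
    return [labels.get(cell_id, missing) for cell_id in cell_ids]


labels = load_labels(io.StringIO(PUBLISHED_LABELS_CSV))

# Cells as they appear in the (hypothetical) dataset; cell_004 has no published label.
dataset_cells = ["cell_003", "cell_001", "cell_004"]
assigned = assign_labels(dataset_cells, labels)
print(assigned)
```

Marking unmatched cells explicitly, rather than dropping them, keeps the label vector aligned with the dataset so it could be stored alongside the expression data.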

carlocolantuoni commented 5 years ago

Majortype came from Aparna, I think (I got it from you). Majortype2, 3 and 4 are different simplifications of that. Majortype4 is the most useful and the one we should use for cell type mapping.
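As a rough illustration of what a "simplification" of a label set means here, the sketch below collapses fine-grained labels into coarser groups with a lookup table. The fine and coarse label names are hypothetical placeholders, not the actual Majortype values, and `simplify_labels` is an illustrative helper.

```python
from collections import Counter

# Hypothetical fine-to-coarse mapping; these are NOT the real Majortype values.
FINE_TO_COARSE = {
    "EN-early": "EN",
    "EN-late": "EN",
    "nEN-early": "nEN",
    "nEN-late": "nEN",
}


def simplify_labels(fine_labels, mapping):
    """Map each fine label to its coarse group, keeping unmapped labels as-is."""
    return [mapping.get(label, label) for label in fine_labels]


fine = ["EN-early", "EN-late", "nEN-early", "microglia", "EN-early"]
coarse = simplify_labels(fine, FINE_TO_COARSE)

# Count cells per coarse group to check that the simplification looks sensible.
counts = Counter(coarse)
print(counts)
```

Labels without an entry in the table pass through unchanged, so a partial mapping never silently discards cells.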



jorvis commented 5 years ago

I believe it's possible, yes, but the question becomes how practical it is to do before your presentation, given the other wish-list items. We're arriving at the point where we need to pick our Most Wanted and focus on those.

brianherb commented 5 years ago

As it stands right now, there are functions that I use that are not present in the single cell workbench, so I cannot recreate the tSNE plots I created - but let's see if we can add one key function:

@jorvis - In the single cell workbench, could you add the ability to run the function sc.pp.scale() after "Identify highly-variable genes" but before "Principal Component Analysis"? Here is a chunk of my code for this section:

sc.pp.normalize_per_cell(adata,counts_per_cell_after=1000)

filter_result = sc.pp.filter_genes_dispersion(adata.X, min_mean=0.01, max_mean=3, min_disp=0.5, log=True)

adata = adata[:, filter_result.gene_subset] ## keep only the genes flagged as highly variable

sc.pp.filter_genes(adata, min_counts=1) ## this line ensures that there are no genes with zero counts

sc.pp.scale(adata) ## this is the new function to be added

sc.tl.pca(adata, n_comps=30) ## should we add in the ability to set n_comps?

sc.pp.neighbors(adata)

sc.tl.louvain(adata,resolution=1)
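To illustrate why the scaling step matters before PCA, here is a minimal pure-Python sketch of per-gene standardization (zero mean, unit variance), which is the kind of transformation sc.pp.scale() applies. It is not scanpy's implementation: the helper `scale_genes`, the toy matrix, the use of the sample standard deviation, and the handling of constant genes are assumptions for illustration.

```python
from statistics import mean, stdev


def scale_genes(matrix):
    """Z-score each gene (column) of a cells x genes matrix given as a list of rows.

    Genes with zero variance are set to 0.0 to avoid dividing by zero
    (an assumption of this sketch, not necessarily scanpy's behavior).
    """
    scaled_columns = []
    for values in zip(*matrix):  # iterate over genes (columns)
        mu = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        if sd == 0:
            scaled_columns.append([0.0] * len(values))
        else:
            scaled_columns.append([(v - mu) / sd for v in values])
    # Transpose back to cells x genes.
    return [list(row) for row in zip(*scaled_columns)]


# Toy data: gene 2 has a much larger range than gene 1; gene 3 is constant.
expression = [
    [1.0, 100.0, 5.0],
    [2.0, 300.0, 5.0],
    [3.0, 200.0, 5.0],
]
scaled = scale_genes(expression)
for row in scaled:
    print(row)
```

After scaling, the first two genes span the same range, so a high-magnitude gene no longer dominates the principal components simply because of its units.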