shaistamadad / GPLVM_Shaista


Generate plots of embeddings of the pancreas and iPSC datasets with the standard PCA+UMAP preprocessing pipeline from scanpy (Which metadata covariates explain most of the variation found in these datasets? Does the embedding capture the pseudotime ordering estimated with RNA velocity analysis?) #7

Open shaistamadad opened 2 years ago

shaistamadad commented 2 years ago

Which metadata covariates explain most of the variation found in these datasets? How do I go about analysing this?

Does the embedding capture the pseudotime ordering estimated with RNA velocity analysis? For the pancreas dataset, yes; for the bone marrow dataset, no.
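The yes/no answer above comes from looking at the plots. A more quantitative check is to ask whether cells that are close in the embedding also have similar pseudotime. The following is a minimal NumPy sketch, assuming you pass the UMAP coordinates (e.g. `adata.obsm["X_umap"]`) and a velocity pseudotime vector (e.g. `adata.obs["velocity_pseudotime"]` from scvelo) as plain arrays. The function name and the choice of `k` are hypothetical, and this neighbourhood correlation is just one possible consistency measure, not an established metric.

```python
import numpy as np


def neighbour_pseudotime_correlation(embedding, pseudotime, k=10):
    """Pearson correlation between each cell's pseudotime and the mean
    pseudotime of its k nearest neighbours in the embedding.

    embedding: (n_cells, n_dims) array, e.g. UMAP coordinates.
    pseudotime: (n_cells,) array, e.g. velocity pseudotime.
    Values near 1: nearby cells have similar pseudotime (ordering preserved).
    Values near 0 or negative: the embedding does not reflect the ordering.
    """
    emb = np.asarray(embedding, dtype=float)
    pt = np.asarray(pseudotime, dtype=float)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2ab (O(n^2) memory,
    # fine for a few thousand cells).
    sq = (emb ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(dist2, np.inf)  # a cell is not its own neighbour
    nn = np.argsort(dist2, axis=1)[:, :k]  # indices of the k nearest cells
    neighbour_mean = pt[nn].mean(axis=1)
    return float(np.corrcoef(pt, neighbour_mean)[0, 1])
```

Computing this for the pancreas and bone marrow embeddings side by side would turn the visual impression into a number that can be compared across datasets or preprocessing choices.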

emdann commented 2 years ago

Which metadata covariates explain most of the variation found in these datasets? How do I go about analysing this?

When you color cells on the embedding by the different metadata columns in adata.obs (e.g. the cell cycle phase), do you see clusters made up only of cells from the same phase or with the same label? These small datasets don't have many additional covariates/metadata to use, but this is the standard sanity check for detecting batch effects or other technical effects in the data.

Another way to check for technical effects of unknown origin (e.g. cells processed in 2 batches with no record of which is which in adata.obs) is to pick a few marker genes for the cell types you expect in your dataset and check that their expression is localized to one or a few clusters in the embedding. For example, in the pancreas dataset, if you find 2 distinct clusters that express the alpha-cell markers, that might indicate that some technical factor is introducing variation in the gene expression profiles.
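Both checks can also be expressed numerically. The sketch below is a minimal NumPy version, assuming you pass PCA coordinates (e.g. `adata.obsm["X_pca"]` after `sc.tl.pca`), a categorical column from `adata.obs`, cluster labels, and one marker gene's expression as plain arrays. The function names and the `min_frac` threshold are hypothetical, and treating "expressed" as `expr > 0` is an assumption. The per-PC R² is a one-way ANOVA decomposition (between-group sum of squares over total sum of squares), which is one common way to quantify how much variation a covariate explains.

```python
import numpy as np


def variance_explained_by_covariate(pcs, labels):
    """Fraction of each PC's variance explained by a categorical covariate.

    pcs: (n_cells, n_pcs) array of PCA coordinates.
    labels: length-n_cells array of categories (e.g. cell cycle phase).
    Returns an (n_pcs,) array of R^2 values in [0, 1].
    """
    pcs = np.asarray(pcs, dtype=float)
    labels = np.asarray(labels)
    centered = pcs - pcs.mean(axis=0)  # grand mean becomes 0
    total_ss = (centered ** 2).sum(axis=0)
    between_ss = np.zeros(pcs.shape[1])
    for lab in np.unique(labels):
        mask = labels == lab
        group_mean = centered[mask].mean(axis=0)  # group mean minus grand mean
        between_ss += mask.sum() * group_mean ** 2
    return between_ss / total_ss


def clusters_expressing_marker(expr, clusters, min_frac=0.5):
    """Clusters in which at least min_frac of cells express the marker (expr > 0)."""
    expr = np.asarray(expr)
    clusters = np.asarray(clusters)
    hits = []
    for cl in np.unique(clusters):
        frac = (expr[clusters == cl] > 0).mean()
        if frac >= min_frac:
            hits.append(cl)
    return hits
```

Ranking covariates by the R² they explain on the top PCs gives a rough answer to which covariates drive most of the variation. A marker gene that passes `min_frac` in two clusters that are far apart in the embedding corresponds to the alpha-cell scenario described above and is worth a closer look.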