LoaiNaom closed this issue 1 month ago.
1 - Eventually we'll publish lists in https://github.com/tanaylab/Gmara - at least, that's the plan. Right now, I'm afraid we don't have pre-packaged lateral gene lists.
2 - This selects the subset of genes that actually carry a meaningful signal for computing the metacells. For one thing, it never selects any lateral genes. It also ensures the selected genes aren't "too uniform". Note that this is done separately on each pile (when doing divide-and-conquer), so the list of selected genes differs between piles (e.g., genes that distinguish T cells from all other cell types might not be selected in a pile consisting only of T cells; instead, the selected genes would be those that distinguish between different T cells).
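A toy sketch of why the selected genes can differ from pile to pile. It assumes a simple stand-in for "not too uniform" (variance-to-mean ratio above a threshold); this is not the actual metacells selection rule, and `select_genes`, `min_relative_variance` and the data are hypothetical:

```python
from statistics import mean, pvariance

def select_genes(pile, lateral, min_relative_variance=0.5):
    """Return the non-lateral genes of `pile` that are not 'too uniform'.

    `pile` maps a gene name to its per-cell expression values.
    'Too uniform' is approximated (hypothetically) as a low
    variance-to-mean ratio across the cells of this pile.
    """
    selected = []
    for gene, values in pile.items():
        if gene in lateral:
            continue  # lateral genes are never selected
        m = mean(values)
        if m > 0 and pvariance(values) / m >= min_relative_variance:
            selected.append(gene)
    return sorted(selected)

lateral = {"MKI67"}  # e.g. a cell-cycle gene marked as lateral

# Pile with two T cells followed by two B cells.
mixed_pile = {
    "CD3E": [10, 10, 0, 0],   # T vs B
    "MS4A1": [0, 0, 10, 10],  # B vs T
    "GZMB": [8, 0, 0, 0],     # varies between T cells
    "MKI67": [5, 0, 5, 0],    # lateral
}
# Pile with four T cells only.
t_pile = {
    "CD3E": [10, 10, 10, 10],  # uniform inside this pile
    "MS4A1": [0, 0, 0, 0],     # not expressed at all
    "GZMB": [8, 0, 8, 0],      # still varies between T cells
    "MKI67": [5, 0, 5, 0],
}

print(select_genes(mixed_pile, lateral))  # ['CD3E', 'GZMB', 'MS4A1']
print(select_genes(t_pile, lateral))      # ['GZMB']
```

CD3E separates T cells from B cells in the mixed pile but is uniform inside the T-cell-only pile, so there only GZMB survives; the lateral MKI67 is skipped in both.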
3 - Marker genes are those that are "significantly different" between the metacells. These are the genes we focus on when analyzing the biological behavior of the different cell types, so of course we have many visualization tools for showing them.
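To make "significantly different between the metacells" concrete, here is a toy sketch assuming a simple fold-factor criterion: a gene counts as a marker if its highest per-metacell fraction is at least `min_fold` (log2 units) above its lowest. The function `find_markers`, its parameters, and the data are hypothetical; the library's actual criterion may differ.

```python
from math import log2

def find_markers(fractions, min_fold=3.0, regularization=1e-5):
    """Return genes whose max/min per-metacell fraction differ by >= min_fold (log2).

    `fractions` maps a gene name to its expression fraction in each metacell.
    `regularization` avoids log2(0) for metacells where the gene is absent.
    """
    markers = []
    for gene, values in fractions.items():
        high = log2(max(values) + regularization)
        low = log2(min(values) + regularization)
        if high - low >= min_fold:
            markers.append(gene)
    return sorted(markers)

fractions = {
    "CD3E": [1e-3, 1e-3, 1e-5, 0.0],          # high in two metacells, ~absent elsewhere
    "ACTB": [2.0e-2, 2.2e-2, 1.9e-2, 2.1e-2],  # roughly constant everywhere
}
print(find_markers(fractions))  # ['CD3E']
```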
4 - The answer is, alas, similar to 1. That said, most people would want to exclude mitochondrial genes (as suggested in the vignette). Housekeeping genes are trickier: typically we just mark them as lateral, as they are sometimes correlated with other gene programs.
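A minimal sketch of the two treatments described above: mitochondrial genes are excluded (removed from the data), while housekeeping genes are only marked lateral (kept, but never selected). `HOUSEKEEPING_EXAMPLES` and `classify_genes` are hypothetical illustrations, not a curated list or a metacells function:

```python
# Hypothetical example list; NOT a curated housekeeping gene set.
HOUSEKEEPING_EXAMPLES = {"ACTB", "GAPDH"}

def classify_genes(gene_names):
    """Split gene names into (excluded, lateral, kept) lists."""
    excluded, lateral, kept = [], [], []
    for name in gene_names:
        if name.startswith("MT-"):  # human mitochondrial naming convention
            excluded.append(name)   # removed from the data entirely
        elif name in HOUSEKEEPING_EXAMPLES:
            lateral.append(name)    # kept in the data, never selected for metacells
        else:
            kept.append(name)
    return excluded, lateral, kept

print(classify_genes(["MT-CO1", "ACTB", "CD3E", "MTOR", "GAPDH"]))
# (['MT-CO1'], ['ACTB', 'GAPDH'], ['CD3E', 'MTOR'])
```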
@orenbenkiki Thank you!
And why is it that I explicitly set `excluded_gene_patterns = ['^HSP', '^MT']` for the `exclude_genes` function, yet it ignored this completely and did not filter out those genes, while it did filter out other genes? Perhaps `excluded_gene_patterns` and `excluded_gene_names` are just suggestions for the function and do not force it to remove them? I ask because it did filter out other genes that I didn't tell it to remove.
Try `^HSP.*` and `^MT.*` to match the whole gene name? Also, I think that for mitochondrial genes the proper pattern would be `^MT-.*`, because there are some `MTsomething` genes (no `-`) which aren't mitochondrial.
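The difference between these patterns can be checked with Python's `re` module. This sketch assumes, as the suggestion above implies, that a pattern must match the entire gene name, so it uses `re.fullmatch`; the gene list is illustrative and `matching` is a hypothetical helper:

```python
import re

genes = ["MT-CO1", "MT-ND1", "MTOR", "MTHFR", "HSPA1A", "HSP90AA1", "ACTB"]

def matching(pattern, names):
    """Names for which `pattern` matches the *entire* name (re.fullmatch)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]

for pattern in ["^MT", "^MT.*", "^MT-.*", "^HSP", "^HSP.*"]:
    print(f"{pattern!r:10} -> {matching(pattern, genes)}")
# '^MT'      -> []
# '^MT.*'    -> ['MT-CO1', 'MT-ND1', 'MTOR', 'MTHFR']
# '^MT-.*'   -> ['MT-CO1', 'MT-ND1']
# '^HSP'     -> []
# '^HSP.*'   -> ['HSPA1A', 'HSP90AA1']
```

Under full-name matching, `^MT` and `^HSP` match nothing at all, which is consistent with the behavior described in the question; `^MT.*` over-matches MTOR and MTHFR, while `^MT-.*` keeps only the mitochondrial genes.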
Very helpful! So a metacells pipeline would look something like this? I'm asking because not all functions are 100% clear to me, so I'm just making sure. Some of these steps were taken from this vignette:
exclude_genes --> exclude_cells --> extract_clean_data --> mark_lateral_genes --> extract_selected_data --> divide_and_conquer_pipeline --> collect_metacells
Does that seem right?
`extract_selected_data` is called for you internally by `divide_and_conquer_pipeline`. Otherwise, yes.
@orenbenkiki When running `divide_and_conquer_pipeline` with the default parameter values, I get the following output, and I would very much appreciate some guidance on the matter:
set adata.var[rare_gene]: 180 true (0.8069%) out of 22308 bools
set adata.var[rare_gene_module]: 22128 outliers (99.19%) and 180 grouped (0.8069%) out of 22308 int32 elements with 14 groups with mean size 12.86
set adata.obs[cells_rare_gene_module]: 2692436 outliers (99.63%) and 9996 grouped (0.3699%) out of 2702432 int32 elements with 14 groups with mean size 714
set adata.obs[rare_cell]: 9996 true (0.3699%) out of 2702432 bools
Could you please clarify what the rare genes and rare cells are? As you can see, I have very few of them, and most of the data are being marked as outliers for some reason. What is the meaning of these outliers, and how does all this affect the metacell calculations?
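The numbers in the log are internally consistent: 180 of the 22308 genes fall into 14 rare-gene modules (180 / 14 ≈ 12.86 genes per module), and the 9996 cells assigned to those modules (9996 / 14 = 714 per module) are exactly the 9996 cells flagged as `rare_cell`. Assuming each module array stores a non-negative module index per gene or cell and a negative value for "not in any module" (which the log reports as "outliers"), the summary is plain bookkeeping. A toy sketch, where `summarize_modules` is hypothetical and not part of metacells:

```python
def summarize_modules(modules):
    """Summarize per-element group indices, where a negative value means 'no group'.

    Assumption: negative entries are what the log lines call 'outliers'.
    """
    total = len(modules)
    outliers = sum(1 for m in modules if m < 0)
    grouped = total - outliers
    groups = len({m for m in modules if m >= 0})
    return {
        "outliers": outliers,
        "outliers_pct": 100.0 * outliers / total,
        "grouped": grouped,
        "groups": groups,
        "mean_size": grouped / groups if groups else 0.0,
    }

# Toy example: 10 genes, 3 of which belong to two modules (0 and 1).
example = [-1, 0, -1, -1, 1, 0, -1, -1, -1, -1]
print(summarize_modules(example))
# {'outliers': 7, 'outliers_pct': 70.0, 'grouped': 3, 'groups': 2, 'mean_size': 1.5}
```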
Thanks for this awesome tool! I have several questions, and I would very much appreciate your answers.
1) This question is about lateral genes. If I understood correctly, those are the genes that we don't want used in the metacell calculations, but we still want to keep them in the adata object for other uses later on. However, I was wondering whether there is a way to decide which genes should be marked as lateral, or whether you have examples of popular genes that are usually marked as lateral?
2) I didn't understand the purpose of `metacells.pipeline.select.extract_selected_data`. The data has already been filtered according to multiple thresholds, so why would I need this?
3) Regarding `find_metacells_marker_genes`, why would I need this other than for plotting a KNN graph or a UMAP with `compute_knn_by_markers` and `compute_umap_by_markers`? Is this strictly for visualization purposes?
4) Which genes would you recommend removing with `metacells.pipeline.exclude.exclude_genes`? For example: sex-linked genes, housekeeping genes, mitochondrial genes, others?