LoaiNaom closed this issue 1 month ago
Rare gene modules capture rare and weak gene programs that the algorithm would otherwise miss. What you see is that our rare-behavior detection pre-processing phase detected 14 such programs and created a metacell for each one. Naturally, being rare, these programs capture only a small part of the data, so from that point of view, most of the data is an outlier relative to them.
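To make the "most of the data is an outlier" point concrete, here is a toy tally. It is only a sketch: the per-cell labels, the `-1` encoding for "not in any rare module", and the helper `summarize_rare_modules` are all hypothetical and are not the package's actual output format.

```python
from collections import Counter

# Hypothetical per-cell labels: the index of the rare gene module a cell was
# assigned to, or -1 if it belongs to none (assumed encoding, toy data).
rare_module_of_cell = [-1, -1, 0, -1, 1, -1, -1, 0, -1, -1]


def summarize_rare_modules(labels):
    """Count cells per rare module and the fraction left outside all modules."""
    counts = Counter(labels)
    outside = counts.pop(-1, 0)  # cells that no rare module captured
    return dict(counts), outside / len(labels)


cells_per_module, outside_fraction = summarize_rare_modules(rare_module_of_cell)
print(cells_per_module)
print(f"{outside_fraction:.0%} of cells are outside every rare module")
```

With rare programs, a high "outside" fraction like this is expected, since those cells simply go on to the regular metacell computation.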
@orenbenkiki Thank you, really helpful! Lastly: any tips for better cell separation in a very big dataset? The dataset includes multiple cell types, immune and non-immune, and I am trying to separate them, so far unsuccessfully. Perhaps you could recommend changing some parameters, or something similar, for better separation.
Metacells lives and dies by the lateral genes list. The main tool we use for this is looking at the markers heatmap in MCView and searching for genes that "shouldn't be there" - that is, strong marker genes (which the algorithm therefore used to collect the metacells) that reflect biology irrelevant to the question at hand (e.g., cell cycle, stress, hypoxia, etc.). Adding these to the lateral genes list and recomputing the metacells should help. You may have to repeat this process a few times.
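The bookkeeping side of this loop can be sketched in plain Python. The gene names, the ribosomal pattern, and the helper `select_lateral_genes` below are illustrative assumptions. How the resulting list is handed to the metacells package depends on the version you use and is not shown here.

```python
import re


def select_lateral_genes(gene_names, exact_names, patterns):
    """Return the genes matching an exact name or a full-match regex pattern."""
    compiled = [re.compile(p) for p in patterns]
    exact = set(exact_names)
    return [
        g for g in gene_names
        if g in exact or any(c.fullmatch(g) for c in compiled)
    ]


# Toy gene list (illustrative only).
all_genes = ["CD3E", "MKI67", "TOP2A", "HSPA1A", "RPL13", "RPS6", "EPCAM", "FOS"]

# Round 1: start with cell-cycle genes and a ribosomal gene pattern.
lateral_names = ["MKI67", "TOP2A"]
lateral_patterns = [r"RP[LS]\d+.*"]

# Round 2: after inspecting the markers heatmap, add stress genes that
# "shouldn't be there", then recompute the metacells (not shown).
lateral_names += ["HSPA1A", "FOS"]

lateral = select_lateral_genes(all_genes, lateral_names, lateral_patterns)
print(lateral)
```

Keeping the names and patterns in one place like this makes it easy to see what changed between rounds.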
A separate issue is strong batch effects, due to different technologies, protocols, or similar reasons. You can detect this by viewing the % of the cells from each batch in the metacells - ideally it should be pretty uniform within each "cell type". If you see metacells that come from only one batch, even though they "should" be the same as others, then use differential expression to try and figure out which genes are to blame, mark them as lateral if possible, or use some pre-processing of the batches to fix the issue. These issues sometimes get messy, since you have to decide whether the batch differences are real or technical...
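The per-batch check can be sketched without any library. This is a minimal illustration on toy data: the per-cell metacell and batch labels, the helper names, and the 0.9 cutoff are all assumptions, not values taken from the package.

```python
from collections import Counter, defaultdict


def batch_fractions(metacell_of_cell, batch_of_cell):
    """Map each metacell to the fraction of its cells coming from each batch."""
    counts = defaultdict(Counter)
    for metacell, batch in zip(metacell_of_cell, batch_of_cell):
        counts[metacell][batch] += 1
    return {
        m: {b: n / sum(c.values()) for b, n in c.items()}
        for m, c in counts.items()
    }


def single_batch_metacells(fractions, threshold=0.9):
    """List metacells where one batch contributes at least `threshold` of cells.

    The 0.9 default is an arbitrary illustrative cutoff.
    """
    return sorted(m for m, f in fractions.items() if max(f.values()) >= threshold)


# Toy per-cell assignments (illustrative only).
metacell_of_cell = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
batch_of_cell = ["A", "B", "A", "B", "A", "A", "A", "A", "B", "A"]

fractions = batch_fractions(metacell_of_cell, batch_of_cell)
suspects = single_batch_metacells(fractions)
print(fractions)
print(suspects)
```

Metacells flagged this way are only candidates for a closer look. Whether the difference is technical or real biology still has to be judged, for example with differential expression against similar metacells.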
Either way, getting good metacells is an iterative process - we always have to go through this cycle a few times (and also remove doublet cells and/or other "junk" cells in the process) to get a high-quality result.
@orenbenkiki Yes, I tried using MCView, but I faced some issues with it in Python, and the usage of this tool is not very clear. I'll use the heatmaps, as you did in the tutorial. Thanks a lot for your help!
When running divide_and_conquer_pipeline using the default parameter values, I get this information, and I would very much appreciate some guidance on the matter: could you please clarify what the rare genes and rare cells are? As you can see, I have very few of them, and most of the data is being marked as outliers for some reason. What is the meaning of these outliers, and how does all this affect the metacell calculations?