Closed Peter-bbh closed 2 years ago
min_features reflects the minimum number of replicates (or cells) expressing the gene. In this case there are only 6 replicates in the toy dataset so you are just removing all genes.
Thanks for the answer, but I'm still confused.
I thought min_feature represented the minimum total transcript counts per gene. The help section reads: "min_features the minimum number of counts for a gene to retain it. Defaults to 0". Is it possible to filter by the minimum total transcript counts per gene?
But even if min_feature is the minimum number of cells expression the gene, then most genes are expressed in 6 replica, see example below. Why do min_feature=5 work, but not min_feature=6?
"> to_pseudobulk(hagai_toy)[[1]][1:6,] [1] "bone marrow derived mononuclear phagocytes" mouse1:lps4 mouse2:lps4 mouse3:lps4 mouse1:unst mouse2:unst mouse3:unst 0610009B22Rik 14 26 27 36 41 42 0610009L18Rik 4 7 4 4 6 7 0610010F05Rik 9 25 19 17 20 17 0610010K14Rik 4 17 15 5 10 17 0610012G03Rik 62 132 105 126 237 232 0610030E20Rik 5 22 29 9 22 29
Just to follow-up.
I managed to followed the code and now think I understand what min_feature does: Count total number of cells with at least one count for the gene and then retain only those genes with more than min_feature.
However, I believe then the help section is missleading. At least for me, a minimum limit is inclusive, that is when setting min_feature=6 it would include those with 6 and not just 7 and above? In fact this interpretation is also what has been used for min_reps using ">=" versus ">" for min_feature in to_pseudo_bulk
Also, "counts" is ambigous in RNA seq, so this should be specified.
May I suggest: 1) to change the help text to ""min_features the minimum number of expressing cells for a gene to retain it. Defaults to 0" 2) change the code to include min_feature, that is change "keep_genes = rowSums(mat_mm > 0) > min_features" in to_pseudo_bulk to "keep_genes = rowSums(mat_mm > 0) >= min_features"?
Anyway, thanks for a very useful packages (and article) :)
Thanks. I've pushed an update.
Great, thanks again :)
Sorry, I'm still confusing about that.
I try to set a DE workflow with Libra::run_de and which could filiter some genes that barely expressed(like the genes I marked):
Those genes comes from Libra::run_de and flitered by logFC and pval, but only have like 120 out of 30,000 cell expressd. So I thought those genes might not be meaningful biologically and could affect the further analysis.
And I think the min_features parameter like the min.pct in Seurat::FindMarkers, but based on you guys' disscussion, which seems should be set to less than the total replicates when using pseudobulk methods.
Does that means like, when I use single-cell methods like MAST, min_features could be set to a big number not related to the replicates, like 1,000? Also, could there be a fuction which could help fliter gene by expression per cells under pseudobulk methods?
Thanks!
We will look to add something like this in an updated version. Although you can just subset the matrix yourself prior to run_de.
Hi,
As I understand it, the min_feature option can be used to avoid testing genes with none or few counts, thereby potentially increase the power of FDR.
However, there seems to be a problem with handling missing Cell types due to filtering.
Example: