strasserle closed this issue 1 year ago
The text was meant as a warning; the cell types are not excluded. This was clarified with an updated print message in commit 6397f03726ef1668351ae68d5281ca4e0d90c477 (lines 463-466):
The genes selected for those cell types potentially don't generalize well.
Find the genes for each of those cell types in self.genes_of_primary_trees after running self.select_probeset().
Describe the bug
On the small example adata, several cell types have fewer than min_test_n cells in the test set, but combinatorial markers are still shown for them in the masked dotplot.
To Reproduce
Steps to reproduce the behavior:
The following celltypes' test set sizes for forest training are below min_test_n (=20):
celltype_4 : 5
celltype_5 : 9
celltype_8 : 1
UserWarning: Zero cells of celltype celltype_8 in train or test set. No tree is calculated for celltype celltype_8. Zero cells of celltype celltype_8 in train or test set. Celltype celltype_8 is not included as reference celltype.
Package versions: scanpy==1.8.1 anndata==0.7.8 umap==0.5.2 numpy==1.21.4 scipy==1.7.2 pandas==1.3.4 scikit-learn==1.0.1 statsmodels==0.13.1 python-igraph==0.9.8 pynndescent==0.5.5 spapros==0.1.0
Expected behavior
Cell types with too few cells (celltypes 4, 5, and 8) are expected to be excluded from random forest training, so no combinatorial markers should be found for them.
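To make the expectation concrete, here is a minimal sketch of the kind of filtering I had in mind. It is not the spapros implementation: `split_by_test_size` is a hypothetical helper, and the 25 cells for `celltype_1` are made up for illustration. Only the counts for celltypes 4, 5, and 8 and `min_test_n=20` come from the output above.

```python
from collections import Counter


def split_by_test_size(test_labels, min_test_n=20):
    """Hypothetical helper: separate cell types by test-set size.

    Returns (kept, flagged), where `kept` lists cell types with at least
    `min_test_n` test cells and `flagged` maps the remaining cell types
    to their test-set cell counts.
    """
    counts = Counter(test_labels)
    flagged = {ct: n for ct, n in sorted(counts.items()) if n < min_test_n}
    kept = sorted(ct for ct, n in counts.items() if n >= min_test_n)
    return kept, flagged


# Toy test-set labels; the celltype_1 count is an assumption for illustration.
test_labels = (
    ["celltype_1"] * 25
    + ["celltype_4"] * 5
    + ["celltype_5"] * 9
    + ["celltype_8"] * 1
)

kept, flagged = split_by_test_size(test_labels, min_test_n=20)
print("kept:", kept)
print("flagged:", flagged)
```

Under this reading, only the cell types in `kept` would get trees, while the ones in `flagged` would be skipped rather than just reported.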
System:
Additional context
Note that step 3 points out celltypes 4, 5, and 8, while in step 4 only celltype 8 is mentioned in the warning. The bug can only be seen for celltypes 4 and 5!
See spapros_tutorial_basic_selection: https://github.com/theislab/spapros/blob/529930d4df6e8c5fc5f680f493812d3cbaa9bf88/tutorials/spapros_tutorial_basic_selection.ipynb