xuranw / MuSiC

Multi-subject Single Cell Deconvolution
https://github.com/xuranw/MuSiC
GNU General Public License v3.0

bulk_construct issues with >100 clusters #89

Open bsierieb1 opened 3 years ago

bsierieb1 commented 3 years ago

Hi,

We realized that bulk_construct() starts misbehaving if your single-cell data contains more than 100 clusters. bulk_construct() uses summary() to calculate the number of cells per cluster, and summary() on a factor has a default argument maxsum = 100. Because of this, cell counts are reported only for the most represented clusters in each sample (the output is capped at 100 entries), and the remaining clusters are lumped together as "(Other)". When applied to the entire data set, this somehow leads to wrong counts for some clusters, perhaps because of the way ddply() combines the per-sample results.
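The truncation itself is easy to reproduce in base R, independent of MuSiC. Below is a minimal sketch on made-up data; the cluster names (`c1` to `c150`) and cell counts are illustrative assumptions, not taken from any real data set:

```r
# Toy factor with 150 clusters; cluster "c<i>" has i cells (made-up data).
lv <- paste0("c", 1:150)
clusters <- factor(rep(lv, times = 1:150), levels = lv)

# Default call: summary() on a factor uses maxsum = 100, so the rarest
# clusters are collapsed into a single "(Other)" entry.
default_counts <- summary(clusters)

# Raising maxsum to the number of levels keeps one entry per cluster.
full_counts <- summary(clusters, maxsum = nlevels(clusters))

print(length(default_counts))                 # fewer entries than clusters
print("(Other)" %in% names(default_counts))   # collapsed bucket is present
print(length(full_counts))                    # one entry per cluster
```

Note that when truncation kicks in, summary() also reorders the entries by decreasing count instead of keeping the factor's level order. Each sample can therefore return a different set and order of names, which may be what leads to the mismatched counts once the per-sample results are combined.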

Here is a temporary workaround that seems to work for us: extract the code of bulk_construct(), add maxsum = <the number of clusters in your data or some arbitrary large number> to the summary(x[, clusters]) call, and save the result as a custom function.
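As a rough sketch of what the patched counting step could look like: `count_cells()` below is a hypothetical helper, not part of MuSiC, and it uses base R `split()`/`lapply()` on made-up metadata in place of the ddply() call inside bulk_construct(). Only the `summary(x[, clusters], maxsum = ...)` call mirrors the change described above.

```r
# Hypothetical helper (not part of MuSiC): count cells per cluster for one
# sample's metadata, without the default 100-entry cap.
count_cells <- function(x, clusters) {
  f <- as.factor(x[, clusters])
  summary(f, maxsum = nlevels(f))
}

# Made-up metadata: 2 samples, 120 clusters, 2 cells per cluster per sample.
lv <- paste0("c", 1:120)
meta <- data.frame(
  sample  = rep(c("s1", "s2"), each = 240),
  cluster = factor(rep(lv, times = 4), levels = lv)
)

# Count per sample, then stack into a samples x clusters matrix.
per_sample <- lapply(split(meta, meta$sample), count_cells, clusters = "cluster")
counts <- do.call(rbind, per_sample)

print(dim(counts))
```

Setting maxsum to `nlevels(f)` rather than an arbitrary large number keeps the intent explicit: every cluster gets its own column, in the factor's level order, for every sample.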