bulk_construct issues with >100 clusters

Hi,

We noticed that bulk_construct() misbehaves when the single-cell data contains more than 100 clusters. bulk_construct() uses summary() to count the number of cells per cluster, and summary() on a factor has a default argument maxsum = 100. Because of this, cell counts are reported only for the most represented clusters in each sample (roughly the top 100), and all remaining clusters are lumped together as "(Other)". When applied across the entire data set, this leads to wrong counts for some clusters, perhaps because of the way ddply() combines the per-sample results. One possible explanation is that the truncated summaries are ordered by frequency, so the entries no longer line up with the same clusters from one sample to the next.
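To illustrate the truncation in isolation, here is a minimal base R sketch on made-up data (150 toy clusters, not real single-cell data). It only shows how summary() behaves on a factor with more levels than maxsum; it does not call bulk_construct() itself.

```r
# Toy factor with 150 clusters of unequal size (cluster i has i cells).
clusters <- factor(rep(paste0("c", 1:150), times = 1:150))

# Default call: for factors, maxsum defaults to 100, so the output is truncated.
default_counts <- summary(clusters)

# Raising maxsum to the number of levels keeps one entry per cluster.
full_counts <- summary(clusters, maxsum = nlevels(clusters))

length(default_counts)                 # fewer entries than there are clusters
"(Other)" %in% names(default_counts)   # the remainder is lumped together
length(full_counts)                    # one entry per cluster
```

Note that the truncated result is sorted by frequency rather than by level order, so two samples with different cluster compositions can return their entries in different orders.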

Here is a temporary workaround that seems to work for us: extract the code of bulk_construct(), add maxsum=<the number of clusters in your data or some arbitrarily large number> to the summary(x[, clusters]) call, and save the result as a custom function.
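As a rough sketch of the idea behind the workaround, the per-sample counting step could look like the following. This is not MuSiC's actual code: count_cells_per_cluster() is a hypothetical helper written in base R, the column names sampleID and cellType are made up, and it assumes there are no missing cluster labels.

```r
# Hypothetical helper (not part of MuSiC): count cells per cluster in each
# sample, with maxsum raised so that summary() never truncates.
count_cells_per_cluster <- function(pheno, clusters, samples) {
  cl <- factor(pheno[[clusters]])
  # One more than the number of levels, so no "(Other)" bucket is created.
  max_levels <- nlevels(cl) + 1
  per_sample <- lapply(split(cl, pheno[[samples]]), function(x) {
    summary(x, maxsum = max_levels)
  })
  # Every summary now has the same clusters in the same (level) order,
  # so the rows can be stacked safely.
  do.call(rbind, per_sample)
}

# Toy data: 120 clusters, 2 samples, 300 cells per sample.
pheno <- data.frame(
  sampleID = rep(c("s1", "s2"), each = 300),
  cellType = paste0("c", rep(1:120, times = 5))
)

counts <- count_cells_per_cluster(pheno, "cellType", "sampleID")
dim(counts)
```

With maxsum large enough, each per-sample summary is returned in level order, which is why the rows can be combined by position without mixing up clusters.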

xuranw / MuSiC

bulk_construct issues with >100 clusters #89