We realized that bulk_construct() starts misbehaving if your sc data contains >100 clusters. bulk_construct() uses summary() to calculate the number of cells per cluster, and summary() has a default argument maxsum = 100. Because of this, the number of cells is reported only for the most represented clusters in each sample, up to the maxsum limit, and the rest are lumped together as "(Other)". When applied to the entire data set, this somehow leads to wrong counts for some clusters, perhaps because of the way ddply() works.
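To illustrate the summary() behaviour described above, here is a minimal sketch on a toy factor. The data and the variable names (cluster_labels, s_default) are made up for illustration and do not come from bulk_construct() itself.

```r
# Toy data: 5000 cells spread over 150 cluster labels (hypothetical).
set.seed(1)
cluster_labels <- factor(
  sample(paste0("c", 1:150), 5000, replace = TRUE),
  levels = paste0("c", 1:150)
)

# summary() on a factor uses maxsum = 100 by default.
s_default <- summary(cluster_labels)

nlevels(cluster_labels)           # number of clusters in the data
length(s_default)                 # fewer entries than clusters
"(Other)" %in% names(s_default)   # the remaining clusters are lumped together
head(names(s_default))            # truncated output is ordered by count, not by level
```

Note that the truncated result is sorted by frequency, so which clusters get a named entry, and in what order, can differ from sample to sample. That may be part of why the combined counts end up wrong, though we have not verified this.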
Here is a temporary workaround that seems to work for us:
Extract the code of bulk_construct(), add maxsum = <the number of clusters in your data or some arbitrary large number> to the summary(x[, clusters]) call, and save the result as a custom function.
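A minimal sketch of the changed call, assuming the cluster column is (or can be converted to) a factor. The helper count_cells() and the toy data frame meta are hypothetical names used only for illustration. They are not part of the package, and the rest of bulk_construct() is not reproduced here.

```r
# Hypothetical helper mirroring the summary(x[, clusters]) call,
# with maxsum raised so that no cluster is folded into "(Other)".
count_cells <- function(x, clusters) {
  f <- as.factor(x[, clusters])
  # + 1 leaves room for a possible NA entry, which summary() reserves a slot for
  summary(f, maxsum = nlevels(f) + 1L)
}

# Toy metadata with 150 clusters (made-up data).
set.seed(1)
meta <- data.frame(
  cluster = factor(sample(paste0("c", 1:150), 5000, replace = TRUE),
                   levels = paste0("c", 1:150))
)

counts <- count_cells(meta, "cluster")
length(counts)                                    # one entry per cluster
identical(names(counts), levels(meta$cluster))    # original level order is kept
```

With maxsum at least as large as the number of clusters, every cluster keeps its own entry and the entries stay in level order, so per-sample results line up when they are combined.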