I ran the following call:
pseudo_test <- simulate_bulk(dataset, scenario = "mirror_db", scaling_factor = "NONE", ncells = 3000)
I was surprised to see that the simulated proportions in pseudo_test$cell_fractions were dominated by cell types that are relatively sparse in my dataset.
For comparison, these are the true cell type proportions in my single-cell data:
B cells             0.048519394
DC                  0.010287780
Endothelial cells   0.069651050
Fibroblasts         0.382594189
ILC                 0.009036563
Macrophages         0.026136522
Mast cells          0.006117058
Monocytes           0.008758515
NK cells            0.005978034
Plasma cells        0.011956068
T cells             0.420964827
The sampling itself looks correct, but the values and their labels are in different orders: the values follow alphabetical cell type order, whereas the labels do not. I think the problem is line 346 of https://github.com/omnideconv/SimBu/blob/main/R/simulator.R:
names(simulation_vector) <- unique(SummarizedExperiment::colData(data)[["cell_type"]])
Here unique() returns the cell types in order of first appearance in the data, not alphabetically, so the names end up attached to the wrong values. I would recommend switching to:
names(simulation_vector) <- names(mirror_values)
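The mismatch can be reproduced with a small base R sketch. It assumes the mirrored fractions come from something like table(), which sorts its entries by label. I have not confirmed that this is how SimBu computes mirror_values. The toy cell_type vector and the names mislabeled and relabeled are made up for illustration.

```r
# Toy labels: "T cells" appears first in the data but sorts last alphabetically.
cell_type <- c("T cells", "T cells", "T cells",
               "B cells", "Fibroblasts", "Fibroblasts")

# ASSUMPTION: the fractions are computed via table(), which orders
# its entries by sorted label (B cells, Fibroblasts, T cells).
mirror_values <- prop.table(table(cell_type))
simulation_vector <- as.numeric(mirror_values)

# Naming by order of first appearance attaches the wrong labels.
mislabeled <- simulation_vector
names(mislabeled) <- unique(cell_type)

# Taking the names from the table keeps labels and values aligned.
relabeled <- simulation_vector
names(relabeled) <- names(mirror_values)

print(mislabeled)
print(relabeled)
```

In this toy case, T cells make up half of the data, yet the mislabeled vector assigns them the B cell fraction, which is the same kind of swap I see in the simulated output.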