omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Mirror_db cell types are mislabeled #39

Closed arielah closed 1 year ago

arielah commented 1 year ago

I ran pseudo_test <- simulate_bulk(dataset, scenario = "mirror_db", scaling_factor = "NONE", ncells = 3000) and was surprised to see that the simulated proportions were dominated by cell types that are relatively sparse in my dataset.

A sample of pseudo_test$cell_fractions:

                     T cells Endothelial cells Fibroblasts   B cells Plasma cells Macrophages          DC   Monocytes         ILC    NK cells Mast cells
mirror_db_sample1 0.03303303       0.014014014  0.06906907 0.4004004  0.017017017  0.02502503 0.023023023 0.000000000 0.003003003 0.001001001  0.4144144
mirror_db_sample2 0.06403013       0.012241055  0.06120527 0.3775895  0.028248588  0.01600753 0.008474576 0.002824859 0.013182674 0.016949153  0.3992467
mirror_db_sample3 0.04919679       0.002008032  0.06827309 0.3855422  0.004016064  0.02610442 0.015060241 0.007028112 0.016064257 0.002008032  0.4246988

Versus the true cell type proportions in my single-cell data:

          B cells                DC Endothelial cells       Fibroblasts               ILC       Macrophages        Mast cells         Monocytes          NK cells 
      0.048519394       0.010287780       0.069651050       0.382594189       0.009036563       0.026136522       0.006117058       0.008758515       0.005978034 
     Plasma cells           T cells 
      0.011956068       0.420964827 
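The mismatch can be checked numerically. The following is a minimal base R sketch, not part of SimBu: it copies the fractions of mirror_db_sample1 and the true proportions from the output above (rounded to four decimals) and compares them once by label and once by position. The variable names simulated, truth, by_name and by_position are made up for this illustration.

```r
# Fractions of mirror_db_sample1, with the labels SimBu attached to them.
simulated <- c("T cells" = 0.0330, "Endothelial cells" = 0.0140,
               "Fibroblasts" = 0.0691, "B cells" = 0.4004,
               "Plasma cells" = 0.0170, "Macrophages" = 0.0250,
               "DC" = 0.0230, "Monocytes" = 0.0000, "ILC" = 0.0030,
               "NK cells" = 0.0010, "Mast cells" = 0.4144)

# True proportions in the single-cell data (alphabetical order).
truth <- c("B cells" = 0.0485, "DC" = 0.0103, "Endothelial cells" = 0.0697,
           "Fibroblasts" = 0.3826, "ILC" = 0.0090, "Macrophages" = 0.0261,
           "Mast cells" = 0.0061, "Monocytes" = 0.0088, "NK cells" = 0.0060,
           "Plasma cells" = 0.0120, "T cells" = 0.4210)

# Compare values that share the same label.
by_name <- abs(simulated[names(truth)] - truth)

# Compare values that share the same position, ignoring the labels.
by_position <- abs(unname(simulated) - unname(truth))

print(max(by_name))      # large: the labels do not match the values
print(max(by_position))  # small: the values follow the alphabetical order
```

A large gap by label together with a small gap by position is what one would expect if the values are correct but the names were attached in a different order.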

It looks like the sampling is correct, but the fractions are ordered alphabetically by cell type, whereas the labels are not. I think the problem is in https://github.com/omnideconv/SimBu/blob/main/R/simulator.R, line 346: names(simulation_vector) <- unique(SummarizedExperiment::colData(data)[["cell_type"]])

Here unique() returns the cell types in order of first appearance in the cell_type column, not alphabetically, so the names no longer line up with the values. I would recommend switching to names(simulation_vector) <- names(mirror_values).
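The effect of the two labelling strategies can be reproduced in base R on a toy annotation vector. This is a sketch under one assumption that the issue does not confirm: that mirror_values is computed with something like table(), which sorts its names alphabetically. The vector cell_type and the names mislabeled and relabeled are made up for the illustration.

```r
# Toy cell type annotation; order of first appearance is
# "T cells", "Fibroblasts", "B cells".
cell_type <- c("T cells", "Fibroblasts", "T cells",
               "B cells", "T cells", "Fibroblasts")

# Assumption: proportions come from table(), whose names are sorted
# alphabetically ("B cells", "Fibroblasts", "T cells").
mirror_values <- prop.table(table(cell_type))

# The values to be labelled, still in alphabetical order.
simulation_vector <- as.numeric(mirror_values)

# Labelling by order of first appearance (the reported behaviour):
# the "T cells" label ends up on the "B cells" fraction.
mislabeled <- simulation_vector
names(mislabeled) <- unique(cell_type)

# Labelling with the names carried by mirror_values (the suggested change).
relabeled <- simulation_vector
names(relabeled) <- names(mirror_values)

print(mislabeled)
print(relabeled)
```

Taking the names from the same object that supplies the values keeps labels and values aligned regardless of how the cells happen to be ordered in the dataset.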

alex-d13 commented 1 year ago

Hi @arielah,

Thank you for bringing this up! I will implement a fix for this issue asap :)

Best, Alex