omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

Is SummarizedExperiment::assays(simulation$bulk)[["bulk_counts"]] a raw simulated count matrix? #44

Closed ZheFrench closed 1 year ago

ZheFrench commented 1 year ago

Should the output of SummarizedExperiment::assays(simulation$bulk)[["bulk_counts"]] be a simulated raw count matrix? It should contain integer values, no?

simulation <- SimBu::simulate_bulk(
  data = ds,
  scenario = "random",
  scaling_factor = "NONE",
  ncells = 1000,
  nsamples = 10,
  BPPARAM = BiocParallel::MulticoreParam(workers = 4), # this will use 4 threads to run the simulation
  run_parallel = TRUE
)

Also, the values seem to be extremely low, near zero every time. I was expecting something more "bulky": round, high values. Could you comment? Thanks!

bulk.matrices <- SummarizedExperiment::assays(simulation$bulk)[["bulk_counts"]]
head(bulk.matrices)

remove_bias_in_counts=TRUE

A1BG     0.129751241 0.080558721 0.126529641 0.117063909 0.123953282
A1BG-AS1 0.003668578 0.007399223 0.005733908 0.004077106 0.002481825
A2M      0.311774289 0.883527573 0.621420094 0.555581746 0.385397228
A2M-AS1  0.003470766 0.001906584 0.001297469 0.001736828 0.002527282
A4GALT   0.007955643 0.020479028 0.013016810 0.017944614 0.010095628
AAAS     0.031415796 0.021634002 0.038293855 0.022995409 0.029937658

Update: remove_bias_in_counts=FALSE

A1BG      617  652  584  646  566  691  537  592  580  464
A1BG-AS1   41   50   31   27   34   26   30   34   34   42
A2M      5680 4825 6197 3783 4036 2504 4935 4767 4580 4977
A2M-AS1     8   10    4   17   11   10    6    8    2    5
A4GALT    134  135  197  106  175  110  142  151  150  142
AAAS      194  181  204  185  168  225  198  209  181  144

alex-d13 commented 1 year ago

Hi,

You are right, the counts are supposed to be integers. They are not because of a default parameter setting in SimBu: remove_bias_in_counts=TRUE. This setting removes an initial mRNA bias in the provided count matrices. If you do not want this and would rather work with integer counts directly, just set it to FALSE.
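
The effect described above can be illustrated with a minimal base-R sketch. It assumes, as a simplification and not as SimBu's actual implementation, that the bias removal rescales each cell by its total read number before cells are summed into a pseudo-bulk sample. The matrix `sc_counts`, the helper `is_whole`, and all values are hypothetical.

```r
# Hypothetical toy single-cell count matrix: 3 genes x 4 cells (integers).
sc_counts <- matrix(
  c(10L, 0L, 5L,
     2L, 8L, 0L,
     7L, 1L, 2L,
     0L, 4L, 6L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Without rescaling, a pseudo-bulk sample is a sum of integer counts.
bulk_raw <- rowSums(sc_counts)

# Assumed simplification of the bias removal: divide each cell by its
# total read number, then sum. The result is no longer integer-valued.
scaled <- sweep(sc_counts, 2, colSums(sc_counts), "/")
bulk_scaled <- rowSums(scaled)

# Helper: TRUE if every value is a whole number.
is_whole <- function(x) all(x == round(x))

is_whole(bulk_raw)     # whole numbers
is_whole(bulk_scaled)  # fractional values
```

In an actual SimBu run, the equivalent switch is passing remove_bias_in_counts = FALSE to simulate_bulk(), as described in this comment.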

One could also think about setting this parameter to FALSE by default. I will think about it when working on the last few bug fixes.

I hope this answered your question.

Best, Alex

ZheFrench commented 1 year ago

Yeah, that's better :) I will give it a try. It's a kind of normalization by library size, per cell type. But you also have a norm_counts parameter that can be set; does it apply the same behavior? That would be normalizing twice if both are set to TRUE, no? I'm just curious. Thanks. Really nice idea, this software, by the way.

FFinotello commented 1 year ago

Good point! @alex-d13 we could even add a round parameter which, if TRUE, applies the round() function to the counts. Of course, we should test that the final sequencing depths are not too different from the expected ones, and that the NB properties we tested in the paper are still maintained.
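
The sequencing-depth check mentioned here could look roughly like the base-R sketch below. It is only an illustration on a hypothetical matrix, not a proposed SimBu implementation. The names `bulk`, `rounded`, and `rel_change` and all values are assumptions, and the NB properties are not tested here.

```r
# Hypothetical non-integer pseudo-bulk matrix: 4 genes x 3 samples.
bulk <- matrix(
  c(617.4, 41.2, 5680.7,  8.3,
    652.6, 50.1, 4825.4, 10.9,
    584.2, 31.8, 6197.3,  4.4),
  nrow = 4,
  dimnames = list(c("A1BG", "A1BG-AS1", "A2M", "A2M-AS1"),
                  paste0("sample", 1:3))
)

# Round to whole numbers and store them as true integers.
rounded <- round(bulk)
storage.mode(rounded) <- "integer"

# Per-sample sequencing depth before and after rounding.
depth_before <- colSums(bulk)
depth_after  <- colSums(rounded)

# Relative change in depth caused by rounding, per sample.
rel_change <- abs(depth_after - depth_before) / depth_before
print(rel_change)
```

Each entry moves by at most 0.5 when rounded, so the absolute change in depth per sample is bounded by half the number of genes. For matrices dominated by near-zero values, however, rounding can collapse most entries to zero, which is why the check matters.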