omnideconv / SimBu

Simulate pseudo-bulk RNAseq samples from scRNAseq expression data
http://omnideconv.org/SimBu/
GNU General Public License v3.0

General questions #45

Closed ZheFrench closed 10 months ago

ZheFrench commented 1 year ago

You set 1000 cells as the default. I was wondering why, and what the range of cells would be in a normal RNA-seq experiment? What would 1000 cells be equivalent to in terms of read count? Do you have an idea? Do you use one or the other in the simulation? What would be a default read count depth? Between 4 and 6 million reads, as in real bulk, or fewer?

I simulate different scenarios with the number of cells equal to 1000. What should I do if I want to increase the global depth? Is increasing the number of cells OK?

I'm struggling to understand whether the scenarios are in fact using a specific setup for the scaling factors. For example, the scenario "even" for two cell types:

scaling_factor = "custom",custom_scaling_vector = c("A"=0.5,B=0.5)

And is it in fact possible, when you have 10 cell types, to add only two scaling factors? (Or is only one possible, as shown in the doc: "10 fold more than the rest"?)

In the case of 10 cell types with the scenario "mirror_db", what do I do if I want to specifically increase the expression (rna_count) of 2 cell types? Do I set all cell types to 1, and the two cell types of interest to 5, for example?

I would like to keep the proportions from the single cells and be able to increase the expression values of one or two cell types.

Something that could be cool is to merge simulations from different single-cell samples in order to plot the different cell proportions. If you try to merge simulations from different single-cell samples, you might get the error "different row counts implied by arguments" due to the different number of genes simulated. (UPDATE: Erratum, I think it's something else.)

I got this, for example:

Error in DataFrame(..., check.names = FALSE) : 
  different row counts implied by arguments
Calls: <Anonymous> ... standardGeneric -> eval -> eval -> eval -> cbind -> DataFrame

Thanks. Again, it's a great tool. :)

alex-d13 commented 1 year ago

Hi again :)

We did not have an exact reason for selecting 1000 cells as the default value. It is really hard to give a default value for all RNA-seq experiments; it depends on the tissue, the sequencing method, and the type of cells you have. The same actually goes for read count depth.

Also, since SimBu uses experimental scRNA-seq data, which can have completely different sequencing depths between experiments and samples, the user has to take this into account while simulating. SimBu is a tool to create artificial bulk samples; we do not give any quality metrics on the single-cell datasets.

So if you want to alter the sequencing depth in your simulations you have two options:

Regarding the cell-type-specific scaling factors, you basically answered the question yourself: just set the scaling values to 1 for all cell types you do not want to change and to x for all other cell types.
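
A minimal sketch of building such a vector in base R. The cell type labels (`type_01` to `type_10`) are hypothetical placeholders, and the factor 5 is the example value from the question. The parameter names `scaling_factor` and `custom_scaling_vector` are taken from the snippet earlier in this thread. Whether cell types left out of the vector fall back to 1 is not confirmed here, so every cell type is listed explicitly.

```r
# Hypothetical cell type labels; replace them with the labels used in your annotation.
cell_types <- sprintf("type_%02d", 1:10)

# Start with a neutral factor of 1 for every cell type.
custom_scaling_vector <- setNames(rep(1, length(cell_types)), cell_types)

# Raise the two cell types of interest to 5 (example value from the question).
boosted <- c("type_03", "type_07")
custom_scaling_vector[boosted] <- 5

print(custom_scaling_vector)

# The vector would then be passed together with the scenario, e.g.
#   scenario = "mirror_db", scaling_factor = "custom",
#   custom_scaling_vector = custom_scaling_vector
```

Building the vector from the full list of cell types first, and only then overriding the entries of interest, avoids typing ten names by hand and makes it harder to forget a cell type.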

Regarding the merging of simulations of different scRNA-seq datasets: You could either integrate single-cell datasets prior to the simulations or you have to manually merge them. SimBu does not have this option, as I think this is really experiment-specific (different number of genes, simulation parameters, ...).
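
One possible way to do the manual merge, sketched in base R under the assumption that the simulated expression values have already been extracted as plain gene-by-sample matrices with gene names as row names. The matrices `sim_a` and `sim_b` below are toy data, not SimBu output, and restricting to shared genes is only one of several reasonable choices.

```r
# Toy count matrices standing in for two simulations (genes x samples).
sim_a <- matrix(1:12, nrow = 4,
                dimnames = list(c("G1", "G2", "G3", "G4"), c("a1", "a2", "a3")))
sim_b <- matrix(1:6, nrow = 3,
                dimnames = list(c("G2", "G3", "G5"), c("b1", "b2")))

# Keep only the genes present in both matrices, in the same order.
shared_genes <- intersect(rownames(sim_a), rownames(sim_b))

# Column-bind the two matrices after aligning their rows.
merged <- cbind(sim_a[shared_genes, , drop = FALSE],
                sim_b[shared_genes, , drop = FALSE])

print(dim(merged))
```

Aligning the rows before calling `cbind` matters because column-binding objects with different numbers of rows is the kind of operation that fails with a row count mismatch.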

I hope I could answer your questions :) Alex