kdzimm / hierarchicell

An R package for simulating cell-type specific and hierarchical single-cell expression data

Questions about the simulator #1

Closed SirKuikka closed 3 years ago

SirKuikka commented 3 years ago

Hi,

I have several questions related to your simulator.

  1. Should the input be raw count values or normalized count values?
  2. After obtaining the simulated count data from simulate_hierarchicell, is the output data normalized or raw count values?
  3. How should one normalize the output data if they are not normalized?
  4. Can the simulator be used with droplet-based data such as Chromium?
  5. Perhaps the most important question: how do I know which genes are differentially expressed in the simulated data? I understand that the ultimate goal of the simulator is to estimate power, but is there a way to find out which genes are differentially expressed in the simulated data? Or are they all differentially expressed between the two conditions?

Thank you for taking the time to answer my questions.

kdzimm commented 3 years ago

@SirKuikka

  1. The input should be raw counts or TPM values.
  2. The output data are raw count values (or "TPM" values, converted to integers, if TPM values were input).
  3. To normalize the output, I would suggest DESeq2's normalization or CLR. There are other single-cell-specific normalizations you may prefer.
  4. It can be used with droplet-based data, but I have had occasional problems with the program when working with other kinds of data in the past. I think the problems are related to the distribution of missing values, but that is not the only cause. I am still hoping to develop this further and make the simulation more broadly applicable.
  5. All of the genes you simulate will have the FC (fold change) you specify in your simulation (see the "foldchange" option in the function). If you want to build a data frame covering multiple levels of FC, you can first simulate (say 1000) genes under the null, then iteratively increase the FC from 1 to 1.05, 1.1, 1.15, 1.2, and so on, appending each batch of genes to your already simulated set. Note that specifying a FC of 1 does not mean every gene will have a FC of exactly 1; rather, the central tendency of those genes will be a FC of 1.
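For readers who want to track which simulated genes are truly differentially expressed, the appending approach from answer 5 might look roughly like the sketch below. It uses only base R. `simulate_block()` is a hypothetical stand-in for a simulator call, not hierarchicell's actual API; the negative binomial draws, the group sizes, and the `truth` table layout are all assumptions made for illustration. In practice you would replace the stand-in with your own simulation call, passing each value as the "foldchange" option.

```r
# Hypothetical stand-in for one simulator run (NOT hierarchicell's API).
# Returns a genes x cells count matrix: control cells first, then case cells
# whose mean expression is multiplied by `fc`.
simulate_block <- function(n_genes, fc, n_cells_per_group = 20) {
  base_mean <- rgamma(n_genes, shape = 2, rate = 0.1)  # per-gene baseline mean
  mu <- rep(base_mean, times = n_cells_per_group)      # recycled column by column
  controls <- matrix(rnbinom(length(mu), size = 1, mu = mu), nrow = n_genes)
  cases <- matrix(rnbinom(length(mu), size = 1, mu = mu * fc), nrow = n_genes)
  cbind(controls, cases)
}

set.seed(1)
fold_changes <- c(1, 1.05, 1.1, 1.15, 1.2)  # FC = 1 is the null block
genes_per_block <- 1000

# Simulate one block of genes per fold change, then stack the blocks by row.
blocks <- lapply(fold_changes, function(fc) simulate_block(genes_per_block, fc))
counts <- do.call(rbind, blocks)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Ground-truth table: the FC each gene was simulated under.
truth <- data.frame(
  gene      = rownames(counts),
  target_fc = rep(fold_changes, each = genes_per_block),
  is_de     = rep(fold_changes != 1, each = genes_per_block)
)
```

Because the target FC describes the central tendency of each block, not every gene's exact value, `truth$is_de` marks which block a gene came from and does not guarantee a per-gene effect.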

Hope this helps!