rhondabacher / SCnorm

Normalization for single cell RNA-seq data

questions on setting the conditions #21

Closed: sqjin closed this issue 6 years ago

sqjin commented 6 years ago

Hi,

I am trying to normalize single-cell RNA-seq data generated on the Fluidigm C1. There are three time points, and for each time point there are three chips (replicates). Gene expression was estimated with RSEM as counts, TPM, and FPKM. Could you give me some suggestions on how to set the Conditions in this case when running `DataNorm <- SCnorm(Data = ExampleSimSCData, Conditions = Conditions)`?

In the SCnorm vignette, I noticed that "In this step the assumption is that most genes are not differentially expressed (DE) between conditions and that any systematic differences in expression across the majority of genes is due to technical biases and should be removed." However, in our case we want to know how genes change over time, so some genes should be differentially expressed between conditions. I am therefore unsure whether I should set multiple conditions in our case.

In addition, which data do you suggest as input: counts, TPM, or FPKM?

Thank you!

rhondabacher commented 6 years ago

Hi,

Thanks for using SCnorm. In this situation, you can set each combination of chip replicate and time point as its own Condition. For example, if you have two time points with three replicates, you could label them as six different conditions (your three time points with three chips each would give nine). The DE analysis tool you plan to use downstream should inform whether or not you set the parameter `useZerosToScale=TRUE`; please see more information on that option in the FAQ section of the vignette: https://www.biostat.wisc.edu/%7Ekendzior/SCNORM/SCnorm_vignette.pdf

You could technically set all cells to the same condition, but I would be very careful to check that time point is not confounded with sequencing depth (total mapped reads). If it is, I would avoid doing this, or at least make sure SCnorm's `K` parameter is not too large (< ~7-8).
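A minimal R sketch of the labeling suggestion above, assuming a hypothetical per-cell annotation table `cell_info` (with `timepoint` and `chip` columns) and a simulated toy count matrix standing in for the real RSEM counts. It builds one Condition label per time point × chip combination, then prints the median total counts per cell by time point as a quick first look at whether depth and time point are confounded (a rough check, not a formal test). The SCnorm call itself is left commented out because it needs the Bioconductor package installed.

```r
# Hypothetical design: 3 time points x 3 C1 chips, 4 cells per chip.
# Replace this table with your real per-cell annotation.
cell_info <- data.frame(
  cell      = paste0("cell", 1:36),
  timepoint = rep(c("T1", "T2", "T3"), each = 12),
  chip      = rep(rep(c("chip1", "chip2", "chip3"), each = 4), times = 3),
  stringsAsFactors = FALSE
)

# One Condition label per time point x chip combination (9 labels here),
# in the same order as the columns of the count matrix.
Conditions <- paste(cell_info$timepoint, cell_info$chip, sep = "_")
print(table(Conditions))

# Toy stand-in for the genes x cells RSEM count matrix (simulated, not real data).
set.seed(1)
counts <- matrix(rpois(200 * 36, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), cell_info$cell))

# Sequencing depth (total counts) per cell, summarized by time point.
# Large systematic differences here suggest time point and depth are confounded,
# which argues against putting all cells in a single Condition.
depth <- colSums(counts)
print(tapply(depth, cell_info$timepoint, median))

# With SCnorm installed, the call from the question becomes:
# library(SCnorm)
# DataNorm <- SCnorm(Data = counts, Conditions = Conditions)
```

With real data, `cell_info` would come from your own sample sheet; the only requirement sketched here is that `Conditions` has one entry per column of the count matrix.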

For input, you should use the RSEM counts.
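A hedged sketch of assembling that input, assuming one RSEM gene-level output file (`*.genes.results`) per cell, each with RSEM's standard `gene_id` and `expected_count` columns. The helper name `read_rsem_counts` and the directory in the usage comment are hypothetical. It stacks the `expected_count` column from each file into a genes × cells matrix and ignores the TPM and FPKM columns.

```r
# Hypothetical helper: build a genes x cells count matrix from RSEM
# gene-level output, one *.genes.results file per cell.
read_rsem_counts <- function(files,
                             cell_names = sub("\\.genes\\.results$", "", basename(files))) {
  tabs <- lapply(files, read.delim, stringsAsFactors = FALSE)
  gene_ids <- tabs[[1]]$gene_id
  # Every file should list the same genes in the same order.
  same_genes <- vapply(tabs, function(t) identical(t$gene_id, gene_ids), logical(1))
  stopifnot(all(same_genes))
  # Keep only the expected_count column (not TPM or FPKM).
  counts <- do.call(cbind, lapply(tabs, function(t) t$expected_count))
  dimnames(counts) <- list(gene_ids, cell_names)
  counts
}

# Example usage (hypothetical directory):
# files  <- list.files("rsem_out", pattern = "\\.genes\\.results$", full.names = TRUE)
# counts <- read_rsem_counts(files)
```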

Please don't hesitate to contact me if you have any further questions.

Best, Rhonda