rhondabacher / SCnorm

Normalization for single cell RNA-seq data

Using SCnorm on bulk RNA-seq data #33

Open priyanka8590 opened 3 years ago

priyanka8590 commented 3 years ago

Hello,

I am trying to use SCnorm to normalize bulk RNA-seq data. I have about 10 studies with differing numbers of samples. I removed any genes that have no counts in any samples. I then used the advice given in the vignette regarding using SCnorm for UMI data.

Conditions <- c(rep(1, 35), rep(2, 6), rep(3, 12), rep(4, 8), rep(5, 3), rep(6, 5), rep(7, 38), rep(8, 3), rep(9, 3), rep(10, 45))
NumReads_SCNorm <- SCnorm(Data = NumReads_matSCN_final, Conditions = Conditions, PrintProgressPlots = TRUE, NCores = 3, PropToUse = 0.1, FilterCellNum = 10, ditherCounts = TRUE, Thresh = 0.1)

It gave me the following error:

Error in SCnorm(Data = NumReads_matSCN_final, Conditions = Conditions,  : 
  At least one condition has less then 100 genes that pass the specified filter. Check the quality of your data or filtering criteria. 
       SCnorm may not be appropriate for your data (see vignette for details).

I then kept only genes with total expression greater than 50 and reran the code above, but it gave the same error. I then used

countDeptEst <- plotCountDepth(Data = NumReads_matSCN_final, Conditions = Conditions, FilterCellProportion = 0.01, NCores = 3)

to get the count-depth relationship. It gave me the following error:

Error in plotCountDepth(Data = NumReads_matSCN_final, Conditions = Conditions,  : 
  Less than 100 genes pass the filter specified! 
               Try lowering thresholds or perform more QC on your data.

I don't know what other quality control I need to do in order to normalize my data using SCnorm. Thank you in advance for your help!

Priyanka

rhondabacher commented 3 years ago

Hi Priyanka,

Thanks for your message and for using SCnorm. The error message is occurring because some of your studies have fewer than 10 samples (e.g. Study 5 has 3 samples). SCnorm is not intended to be used on very few samples. Since this is bulk data you can normalize by treating all samples as the same "Condition".
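As a rough illustration of the point above, the following base R sketch rebuilds the `Conditions` vector from the original call, lists the studies with fewer than 10 samples, and then builds a single-condition vector. The names `study_sizes`, `min_samples`, `too_small`, and `Conditions_single` are made up for this example, and the threshold of 10 is assumed to correspond to the `FilterCellNum = 10` argument used in the call.

```r
# Study sizes copied from the original Conditions vector (10 studies).
study_sizes <- c(35, 6, 12, 8, 3, 5, 38, 3, 3, 45)
Conditions <- rep(seq_along(study_sizes), times = study_sizes)

# Assumption: this threshold mirrors FilterCellNum = 10 from the SCnorm call.
min_samples <- 10

# Count samples per condition and flag the conditions that are too small.
samples_per_condition <- table(Conditions)
too_small <- names(samples_per_condition)[as.vector(samples_per_condition) < min_samples]
print(samples_per_condition)
print(too_small)

# Treat every sample as belonging to the same condition.
Conditions_single <- rep(1, length(Conditions))
print(length(Conditions_single))
```

`Conditions_single` can then be passed as the `Conditions` argument in place of the per-study vector.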

Best, Rhonda

priyanka8590 commented 3 years ago

Thank you so much, Rhonda. That helped!

priyanka8590 commented 3 years ago

Hi Rhonda,

I implemented your suggestion in my code:

Conditions <- c(rep(1, 158))
countDeptEst <- plotCountDepth(Data = NumReads_matSCN_final, Conditions = Conditions, FilterCellProportion = 0.1, NCores = 3)
NumReads_SCNorm <- SCnorm(Data = NumReads_matSCN_final, Conditions = Conditions, PrintProgressPlots = TRUE, NCores = 3, PropToUse = 0.1, FilterCellNum = 10, ditherCounts = TRUE, Thresh = 0.1)

I have 158 samples in the bulk RNA-seq data. I also incorporated your suggestions for using SCnorm with bulk data given in the vignette. However, it threw this error:

Setting up parallel computation using 3 cores
Jittering values introduces some randomness, 
        for reproducibility set.seed(1) has been set.
Gene filter is applied within each condition.
1054 genes in condition 1 will not be included in the normalization due to 
             the specified filter criteria.
A list of these genes can be accessed in output, 
    see vignette for example.
Finding K for Condition 1
Trying K = 1
Trying K = 2
Trying K = 3
Trying K = 4
Trying K = 5
Trying K = 6
Trying K = 7
Trying K = 8
Trying K = 9
Trying K = 10
Trying K = 11
Trying K = 12
Trying K = 13
Trying K = 14
Trying K = 15
Trying K = 16
Trying K = 17
Trying K = 18
Trying K = 19
Trying K = 20
Trying K = 21
Trying K = 22
Trying K = 23
Trying K = 24
Trying K = 25
Trying K = 26
Error in normWrapper(Data = DataList[[x]], SeqDepth = SeqDepthList[[x]],  : 
  SCnorm is unable to converge. 
                         Consider altering the filter criteria such as FilterExpression. 
                         See vignette for additional details.
Calls: SCnorm -> lapply -> FUN -> normWrapper
Execution halted

Is there a cutoff for what the expression levels should be? Maybe I missed it in the vignette. Thank you so much for all your help!

Thank you, Priyanka

rhondabacher commented 3 years ago

Dear Priyanka,

In this case, it is best to use a normalization designed for bulk RNA-seq data. You can use MedianNormalization as described in the Anders and Huber 2010 paper on DESeq, which I implemented in a Shiny app here: https://github.com/rhondabacher/Median-Ratio-Normalization
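For readers who want to see the idea without the app, here is a minimal base R sketch of median-ratio size factors in the style of Anders and Huber (2010). It is not the code from the linked repository. The function name `median_ratio_normalize` and the toy matrix are invented for this example, and excluding genes with a zero count in any sample from the reference is an assumption of this sketch.

```r
# Median-ratio normalization sketch (hypothetical helper, base R only).
# counts: numeric matrix with genes in rows and samples in columns.
median_ratio_normalize <- function(counts) {
  counts <- as.matrix(counts)
  log_counts <- log(counts)

  # Log geometric mean of each gene across samples; genes with any zero
  # count give -Inf and are excluded from the reference.
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  if (!any(usable)) {
    stop("No gene has a nonzero count in every sample.")
  }

  # Size factor per sample: median ratio of its counts to the reference.
  size_factors <- apply(
    log_counts[usable, , drop = FALSE], 2,
    function(sample_log) exp(median(sample_log - log_geo_means[usable]))
  )

  # Divide each sample (column) by its size factor.
  normalized <- sweep(counts, 2, size_factors, "/")
  list(size_factors = size_factors, normalized = normalized)
}

# Toy data: sample2 is sequenced twice as deeply as sample1, sample3 half.
counts <- matrix(
  c(10, 20, 5,
    100, 200, 50,
    4, 8, 2,
    0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)
result <- median_ratio_normalize(counts)
print(result$size_factors)
print(result$normalized)
```

In the toy data the samples differ only in depth, so their columns should agree after normalization. If DESeq2 is installed, its `estimateSizeFactorsForMatrix()` function computes size factors of this kind and is the better choice for real data.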

Best, Rhonda
