guanjue / S3norm

A normalization method that can normalize high signals without inflating the background noise.
MIT License

Questions on S3norm run set up #5

Open faniafeby13 opened 2 years ago

faniafeby13 commented 2 years ago

Hi, I want to use the S3norm for my ChIP-seq dataset normalization. But before I use it, I want to ask some questions on the usage to clarify:

  1. I will run S3norm on 4 treatments that all use the same antibody against one transcription factor. The treatments are: (1) a group of mice on a specific diet, (2) mice on a normal diet, (3) mice injected with a drug, and (4) mice injected with vehicle. Can I run them all together? I am thinking of running it with the setting below, using the normal-diet and vehicle samples as the controls. Is that correct?

      head file_list.txt
      Specialdiet_rep1.sorted.bedgraph    Normaldiet_rep1.sorted.bedgraph
      Specialdiet_rep2.sorted.bedgraph    Normaldiet_rep2.sorted.bedgraph
      Injectedmice_rep1.sorted.bedgraph   Vehiclemice_rep1.sorted.bedgraph
      Injectedmice_rep2.sorted.bedgraph   Vehiclemice_rep2.sorted.bedgraph

  2. Can I run samples that used 3 different antibodies against 3 transcription factors together? And what should I use as the control, given that they all share the same treatment and genotype and differ only in the transcription factor being immunoprecipitated? So far, we have assumed that these three transcription factors interact in some pathways.

  3. Since this method doesn't mention normalization for IP efficiency, is there a recommended method to use in combination with S3norm?

Thanks in advance

Fania

guanjue commented 2 years ago

Hi, Fania

For Q1, if the assumption of S3norm holds in your data (common peaks have similar mean signals across different treatments), you can run all of them together. For Q2, in our analysis, we mainly used the no-antibody sample as the control in order to adjust for variation in the background noise. I assume your analysis aims to get the log fold change and the p-value after treatment. For that, the background negative binomial model in S3norm may not be a good fit. You may need to normalize the data first and then use other methods, such as edgeR or DESeq, to get the logFC and the p-values, using the normal-diet and vehicle samples as the controls.
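For the treatment-versus-control comparison described above, the quantity of interest is a per-region log fold change. The following is a minimal sketch, assuming the signals have already been normalized and binned to the same regions in every replicate. It computes only an unshrunk log2 fold change of replicate means with a pseudocount; it is not the negative binomial model or the p-values that edgeR or DESeq provide. The function name, the pseudocount of 1, and the example values are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def log2_fold_change(
    treat_reps: Sequence[Sequence[float]],
    control_reps: Sequence[Sequence[float]],
    pseudocount: float = 1.0,
) -> List[float]:
    """Per-region log2 fold change of replicate-mean signal, treatment vs control.

    Each replicate is a list of already-normalized signals, one value per region,
    with regions in the same order in every replicate.
    """
    n_regions = len(treat_reps[0])
    for rep in (*treat_reps, *control_reps):
        if len(rep) != n_regions:
            raise ValueError("all replicates must cover the same regions in the same order")

    lfc = []
    for i in range(n_regions):
        t = mean(rep[i] for rep in treat_reps)    # mean over treatment replicates
        c = mean(rep[i] for rep in control_reps)  # mean over control replicates
        # The pseudocount avoids log(0) in regions with no signal.
        lfc.append(math.log2((t + pseudocount) / (c + pseudocount)))
    return lfc


# Hypothetical normalized signals for two regions, two replicates per group.
special_diet = [[7.0, 3.0], [7.0, 3.0]]
normal_diet = [[3.0, 3.0], [3.0, 3.0]]
print(log2_fold_change(special_diet, normal_diet))  # [1.0, 0.0]
```

For real data, the significance testing (and fold-change shrinkage) is better left to edgeR or DESeq, as suggested; this sketch only shows what the fold change itself measures.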

For Q3, since you have multi-TF ChIP-seq data, it may be better to use the S3V2-IDEAS pipeline (https://github.com/guanjue/S3V2_IDEAS_ESMP) for normalization. In the S3V2-IDEAS pipeline, you can treat each TF as a feature in genome segmentation, just as a histone modification would be treated. The pipeline adjusts the peak mean (not the common peak mean) across different TFs. Also, since you have multiple related TFs, it may be interesting to look at their combinatorial patterns and how those patterns transition across treatments, which genome segmentation analysis can show.

For Q4, in our analysis, we consider IP efficiency to be reflected in the signal-to-noise ratio, which should be adjusted by S3V2.
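If IP efficiency shows up as a difference in signal-to-noise ratio, one rough way to see it before normalization is to compare that ratio across samples. The sketch below uses an assumed proxy, mean signal in peak bins divided by mean signal in background bins, with the same peak calls applied to every sample; it is not taken from the S3V2 code, which may define and adjust the ratio differently. The function name and example values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def signal_to_noise(signal: Sequence[float], is_peak: Sequence[bool]) -> float:
    """Mean signal in peak bins divided by mean signal in background bins."""
    if len(signal) != len(is_peak):
        raise ValueError("signal and is_peak must have the same length")
    peak = [s for s, p in zip(signal, is_peak) if p]
    background = [s for s, p in zip(signal, is_peak) if not p]
    if not peak or not background:
        raise ValueError("need at least one peak bin and one background bin")
    bg_mean = mean(background)
    if bg_mean == 0:
        raise ValueError("background mean is zero; add a pseudocount first")
    return mean(peak) / bg_mean


# Hypothetical bins: the same peak calls applied to two samples.
peaks = [True, True, False, False, False]
sample_a = [10.0, 14.0, 2.0, 2.0, 2.0]  # 12 / 2
sample_b = [6.0, 6.0, 2.0, 2.0, 2.0]    # 6 / 2
print(signal_to_noise(sample_a, peaks), signal_to_noise(sample_b, peaks))  # 6.0 3.0
```

Here sample_b has the same background level but half the peak enrichment, which is the kind of difference a weaker IP would produce.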

One thing to note here: S3norm and S3V2 were both developed mainly for genome segmentation analysis, which puts more weight on the low-signal part. For treatment analysis, I have found that people focus more on peak regions with higher signals. We have found that the log transformation in S3norm can sometimes create weird signals in peak regions with high signals.

Best wishes. Guanjue

On Mon, Sep 5, 2022 at 5:51 AM faniafeby13 @.***> wrote:

faniafeby13 commented 2 years ago

Hi Guanjue,

Thanks for your comprehensive explanation! I will try running my samples as you suggested.

faniafeby13 commented 2 years ago

Hi, I have further questions:

  1. I have re-read your paper, but I still don't understand how the algorithm normalizes the sequencing depth (SD). Is the SD estimated from the number of mapped peaks?
  2. In our case, we don't have a no-antibody control for each sample; we only have one input merged from various samples. Is it still possible to run S3norm?

Thanks!

guanjue commented 2 years ago

Sorry for the late reply (I had some deadlines to finish over the last two weeks).

1. No, the SD is not adjusted based on the number of mapped reads in peak regions or on the number of peaks. S3norm simultaneously matches both the average read count of all common peak regions and the average read count of all common background regions. The average read count of the common background regions can be considered the adjusted sequencing depth, so matching it between the 2 samples approximates matching the SD.

2. In S3norm, the no-antibody control is used to adjust the local background within each sample. So in your case, using the same no-antibody control for the 2 samples should be OK.
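The two-anchor matching described in answer 1 can be sketched as follows. This is a minimal sketch, assuming the target sample is rescaled with a power-law transform y = B·x^A (our reading of the S3norm paper; check the repository source for the exact formulation and for how peaks are called). Bins called as peaks in both samples are the common peaks, bins called as background in both are the common background, and A and B are chosen so that, after transformation, the target's mean signal in both sets equals the reference's. Matching the background mean is what stands in for matching sequencing depth. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_common(tar: Sequence[float], ref: Sequence[float],
                 tar_is_peak: Sequence[bool], ref_is_peak: Sequence[bool]):
    """Split paired bins into common peaks (peak in both samples)
    and common background (background in both samples)."""
    pk_t, pk_r, bg_t, bg_r = [], [], [], []
    for t, r, tp, rp in zip(tar, ref, tar_is_peak, ref_is_peak):
        if tp and rp:
            pk_t.append(t)
            pk_r.append(r)
        elif not tp and not rp:
            bg_t.append(t)
            bg_r.append(r)
    return pk_t, pk_r, bg_t, bg_r  # bins that are peaks in only one sample are dropped


def _mean_pow(values: Sequence[float], a: float) -> float:
    return sum(v ** a for v in values) / len(values)


def fit_power_law(tar_pk, tar_bg, ref_pk, ref_bg,
                  lo: float = 0.05, hi: float = 20.0, iters: int = 200) -> Tuple[float, float]:
    """Solve for A, B such that
        B * mean(tar_pk ** A) == mean(ref_pk)   (common peaks)
        B * mean(tar_bg ** A) == mean(ref_bg)   (common background)
    Values must be positive (e.g. counts plus a pseudocount)."""
    ref_pk_mean = sum(ref_pk) / len(ref_pk)
    ref_bg_mean = sum(ref_bg) / len(ref_bg)
    target = math.log(ref_pk_mean / ref_bg_mean)

    # B cancels in the peak/background ratio, so A is found first by bisection.
    # gap(a) increases with a when every common-peak value exceeds every
    # common-background value; otherwise a unique root is not guaranteed.
    def gap(a: float) -> float:
        return math.log(_mean_pow(tar_pk, a) / _mean_pow(tar_bg, a)) - target

    f_lo = gap(lo)
    if f_lo * gap(hi) > 0:
        raise ValueError("no exponent in [lo, hi] matches the reference peak/background ratio")
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = gap(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    a = (lo + hi) / 2
    b = ref_pk_mean / _mean_pow(tar_pk, a)  # then B fixes the absolute level
    return a, b


def normalize(values: Sequence[float], a: float, b: float) -> List[float]:
    return [b * v ** a for v in values]


# Synthetic check: build the reference from the target with known A=0.8, B=2.
tar = [1, 2, 3, 50, 80, 120, 4]
tar_peak = [False, False, False, True, True, True, True]
ref_peak = [False, False, False, True, True, True, False]  # last bin: not common
ref = [2.0 * v ** 0.8 for v in tar]
pk_t, pk_r, bg_t, bg_r = split_common(tar, ref, tar_peak, ref_peak)
a, b = fit_power_law(pk_t, bg_t, pk_r, bg_r)
print(round(a, 3), round(b, 3))  # 0.8 2.0
```

Solving for the exponent first keeps the problem one-dimensional; a plain rescaling (A fixed at 1) could match only one of the two means, which is why two parameters are needed to match both.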

On Thu, Sep 8, 2022 at 7:16 AM faniafeby13 @.***> wrote: