Myrtle-bio closed this issue 3 months ago.
If I want to find out which TF has the highest ft_Score (i.e. is the most important) in the regulatory region of a specific gene, and the same TF has multiple sites within that region, which approach is more suitable: taking the maximum value or the average value?
As mentioned in your article, not all TFs will necessarily form a TF footprint. So if, in condition1, TF1 has a mean footprint score of 1.3 and TF2 has a mean footprint score of 1.5, does that necessarily imply that TF2 is more robust or more important?
Hi @Myrtle-bio,
Thank you for your questions - I will try to summarize here:
BINDetect
individually per condition, or once with all conditions included? If you are running all conditions together, the number of genomic regions will be equal across conditions, and the conditions can then be normalized to each other with quantile normalization, as mentioned.
If I want to find out which TF has the highest ft_Score (i.e. is the most important) in the regulatory region of a specific gene, and the same TF has multiple sites within that region, which approach is more suitable: taking the maximum value or the average value?
I think this depends on the biological question, but in most cases I would recommend taking the maximum value. We can assume that a transcription factor may have more than one possible binding site in a region, but that the site with the largest footprint score is the most likely to be bound in that condition.
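To make the max-versus-mean choice concrete, here is a minimal sketch that collapses several sites of the same TF in one gene's regulatory region into a single score. The record layout (TF, gene, score) and the numbers are hypothetical, made up for illustration; this is not BINDetect's output format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-site records: (TF, gene, footprint score). The layout and
# values are assumptions for illustration, not BINDetect's actual output.
sites = [
    ("TF1", "GeneA", 0.4),
    ("TF1", "GeneA", 2.1),
    ("TF1", "GeneA", 0.3),
    ("TF2", "GeneA", 1.2),
    ("TF2", "GeneA", 1.1),
]

def collapse_sites(records, how=max):
    """Collapse all sites of one TF in one gene's region into a single score.

    how=max keeps the strongest (most likely bound) site;
    how=mean averages over all candidate sites, weak ones included.
    """
    grouped = defaultdict(list)
    for tf, gene, score in records:
        grouped[(tf, gene)].append(score)
    return {key: how(scores) for key, scores in grouped.items()}

by_max = collapse_sites(sites, max)
by_mean = collapse_sites(sites, mean)
print(by_max[("TF1", "GeneA")])             # 2.1
print(round(by_mean[("TF1", "GeneA")], 3))  # 0.933
```

With the maximum, TF1 keeps the score of its single strong site; with the mean, its two weak (probably unbound) sites pull the value down, which is why the maximum better reflects the site most likely to be bound.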
As mentioned in your article, not all TFs will necessarily form a TF footprint. So if, in condition1, TF1 has a mean footprint score of 1.3 and TF2 has a mean footprint score of 1.5, does that necessarily imply that TF2 is more robust or more important?
If TF2 has a higher footprint score than TF1, it means that TF2 shows a more robust footprint/accessibility signal, but that does not necessarily mean that it is more important. This is similar to gene expression, where the most highly expressed genes are not necessarily the most important. For this reason, we usually only compare footprint scores for the same TF across conditions, and not between different TFs, as footprint scores of different TFs are difficult to compare.
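A minimal sketch of the "same TF across conditions" comparison: each TF's mean score is compared only with its own score in the other condition, here as a simple log2 fold change. The scores are hypothetical, and this is an illustration of the idea, not BINDetect's exact differential binding computation.

```python
import math

# Hypothetical mean footprint scores per TF in two conditions (made-up values).
mean_scores = {
    "TF1": {"cond1": 1.3, "cond2": 2.6},
    "TF2": {"cond1": 1.5, "cond2": 1.5},
}

def log2_change(scores, a, b):
    """log2(b / a) per TF: each TF is compared only with itself across conditions."""
    return {tf: math.log2(s[b] / s[a]) for tf, s in scores.items()}

changes = log2_change(mean_scores, "cond1", "cond2")
print(changes)  # {'TF1': 1.0, 'TF2': 0.0}
```

TF2 has the higher absolute score in cond1, but only TF1 changes between conditions; ranking by within-TF change sidesteps the problem that absolute scores of different TFs are not directly comparable.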
I hope these answers covered the questions!
Hi!
I have what I think is a related question as it concerns the score normalisation. I am trying to compare footprinting between conditions at specific regions of the genome (ROIs) vs genome-wide. My conditions have very different genome-wide coverage (control >> treated) with more similar and higher coverage at the ROIs.
I have tried this in two ways, with very different results. I always input the corrected signal bigwigs in the same BINDetect run:
Option 1 seems to over-estimate the footprints of the treated sample vs control in the ROIs, which I assume is due to normalising the scores by the genome-wide quantiles. However, option 2 returns significantly fewer footprinted regions across TFs in the ROIs compared to option 1. I get >1000 sites in most instances. Do you think this is sufficient for the background correction? I was also wondering whether there is a sane way to compare without quantile normalisation in this case.
Thank you!
Hi @Allischoo,
Your first version is the recommended approach. BINDetect generates a background distribution by randomly subsetting peaks. This could explain why you see fewer sites with option 2, since your ROI peaks may have higher scores than the full set of peaks. Regarding your concern about overestimation, you can disable quantile normalization with --norm-off if you think that a global normalization is not applicable in your case. However, I strongly recommend looking into the composition of your data, e.g. the score distributions of your samples, before doing so.
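For intuition about what --norm-off turns off, here is a textbook quantile normalization of per-condition scores over the same set of sites. This is a generic sketch; BINDetect's actual normalization may differ in detail, and the scores are hypothetical.

```python
def quantile_normalize(columns):
    """Textbook quantile normalization of equal-length score vectors.

    Each value is replaced by the mean of the values that share its rank
    across all columns, so every column ends up with an identical
    distribution. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all conditions must score the same set of sites")
    # Reference distribution: mean value at each rank across conditions.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_cols)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # site indices by rank
        out = [0.0] * n
        for rank, site in enumerate(order):
            out[site] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical scores for the same four sites in two conditions.
control = [0.5, 2.0, 1.0, 4.0]
treated = [0.1, 0.4, 0.2, 0.9]  # globally lower, e.g. due to lower coverage
norm_control, norm_treated = quantile_normalize([control, treated])
print(norm_control)
print(norm_treated)
```

Because both conditions rank the sites identically here, they end up with identical normalized scores: the treated sample is lifted to a shared distribution. When the genome-wide distributions differ for technical reasons such as coverage, this is the intended effect. When the difference is biological, it is the kind of overcorrection that motivates checking the score distributions before deciding on --norm-off.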
Hello, I appreciate your continued assistance; it has been very useful to me!
I am working with data from multiple conditions, and I want to identify which TFs are important in each condition within certain BED regions, similar to your research.
I've observed large differences in the number of loci in the BINDetect results across conditions. For instance, in //TF1_overview.txt there are over 20,000 rows, whereas in //TF1_overview.txt there are over 30,000 rows.
Based on this description, I initially presumed that the number of loci would be the same across conditions.
Based on this, could TFBSs with no output be explained by F[i, i+Wf] < 0? I noticed that there are entries with TFBS_footprints_condition_score=0 in the output.
This leads to the following questions:
I apologize for the barrage of questions, and I hope you have a wonderful Halloween!