Closed alexyfyf closed 3 years ago
Hi @alexyfyf,
Yes, I agree with your observation - because the changes in accessibility were so much bigger than the changes in footprint, the accessibility ended up driving the score. Biologically, there is more evidence for a TF correlating with accessibility than it always making a footprint, so I would probably trust the accessibility more. This is also because the tall-signals seen (e.g. in 6mL) can be an effect of protein binding, although it is not a canonical "footprint". But depending on the TF (repressor/activator?) and your hypothesis, the interpretation might be different!
My only question is: Did you run ATACorrect
with the same peak-set for all three conditions? (6wW, 6wL and 6mL). Otherwise, that can lead to differences in the normalization between conditions.
I am sorry that I can't give you a more definitive answer, but there are unfortunately still a lot of things still unclear in footprinting - and interpretations like this is one them!
Hi @msbentsen , thank you for developing this tool and your reply.
I've checked my analysis, and I found that I used corresponding peak files for each bam file when running ATACorrect
and FootprintScores
, but used a consensus peak set when running BINDetect
. Is that an issue?
So what do you suggest?
This is the script for ATACorrect and FootprintScores
BASE=$(basename $1 .final.bam)
PEAK=$(echo $1 | sed 's/\.final\.bam/_peaks\.narrowPeak/')
echo $BASE
echo $PEAK
TOBIAS ATACorrect --bam $1 --genome ../../ref/mmu/ucsc_mm10/mm10.fa --peaks $PEAK --outdir $BASE --cores 12
TOBIAS FootprintScores --signal ${BASE}/${BASE}.final_corrected.bw --regions $PEAK --output ${BASE}/${BASE}_footprints.bw --cores 12
This is the code for BINDetect
TOBIAS BINDetect --motifs JASPAR2020_CORE_vertebrates_non-redundant_pfms_jaspar.txt --signals 6wW/6wW_footprints.bw 6wL/6wL_footprints.bw 6mL/6mL_footprints.bw --genome ../../ref/mmu/ucsc_mm10/mm10.fa --peaks ../atac_diff/output/consensus_peaks.bed --outdir BINDetect_output --time-series --cores 12 --cond-names WE preLSC LSC
I've also tried to make a plot from deeptools, using the corrected bw files and {TF}_all.bed for example for this GATA1
computeMatrix scale-regions -R ../BINDetect_output/GATA1_MA0035.4/beds/GATA1_MA0035.4_all.bed -S ../6wW/6wW.final_corrected.bw ../6wL/6wL.final_corrected.bw ../6mL/6mL.final_corrected.bw -o out.mat.gz -b 100 -a 100 -bs 2 -p 12 --skipZeros
plotProfile -m out.mat.gz -out test1.png
And it looks like the accessibility around the motifs is not very different. But it looks different compare to the output from plotAggregate
from TOBIAS.
I've checked my analysis, and I found that I used corresponding peak files for each bam file when running
ATACorrect
andFootprintScores
, but used a consensus peak set when runningBINDetect
. Is that an issue? So what do you suggest?
Hi, I suggest to use a merge of all condition peaks as input for the ATACorrect
and FootprintScores
- sorry that this was not more clear! In your case, that would be the merge of your .narrowPeak files. The reason is that you need to have values across the whole .bed-space for comparing in the BINDetect-run. Otherwise, you will be comparing with "0" for differential peaks (rather than the true value), and this can skew the normalization. This might be what has happened here, so I hope using the same .bed-file will solve the issue.
Is ../atac_diff/output/consensus_peaks.bed
a merge of all or a file with common peaks? In any case, it is best if --bed
/--peaks
are the same across all runs.
I've checked my analysis, and I found that I used corresponding peak files for each bam file when running
ATACorrect
andFootprintScores
, but used a consensus peak set when runningBINDetect
. Is that an issue? So what do you suggest?Hi, I suggest to use a merge of all condition peaks as input for the
ATACorrect
andFootprintScores
- sorry that this was not more clear! In your case, that would be the merge of your .narrowPeak files. The reason is that you need to have values across the whole .bed-space for comparing in the BINDetect-run. Otherwise, you will be comparing with "0" for differential peaks (rather than the true value), and this can skew the normalization. This might be what has happened here, so I hope using the same .bed-file will solve the issue.Is
../atac_diff/output/consensus_peaks.bed
a merge of all or a file with common peaks? In any case, it is best if--bed
/--peaks
are the same across all runs.
Hi, thanks for the explanation, and the consensus_peaks.bed
is kind of common peaks. So I assume you suggest to do something similar to cat *.narrowPeaks | sort -k1,1 -k2,2n | bedtools merge
, which will give you cut counts in all regions covered in any condition.
Hi, thanks for the explanation, and the
consensus_peaks.bed
is kind of common peaks. So I assume you suggest to do something similar tocat *.narrowPeaks | sort -k1,1 -k2,2n | bedtools merge
, which will give you cut counts in all regions covered in any condition.
Yes, this is exactly what I mean - I use the exact same command. In that way, you are sure that scores for all regions of interest (common and differential in both directions) are available in the files.
Hi, thanks for the explanation, and the
consensus_peaks.bed
is kind of common peaks. So I assume you suggest to do something similar tocat *.narrowPeaks | sort -k1,1 -k2,2n | bedtools merge
, which will give you cut counts in all regions covered in any condition.Yes, this is exactly what I mean - I use the exact same command. In that way, you are sure that scores for all regions of interest (common and differential in both directions) are available in the files.
Sure, I'll follow your advice and rerun it. Thank you so much! I'll update if any more question if that's ok?
You're welcome, and yes sure! I will close this issue for now, but you can always reopen if you have any more questions.
Hi team,
I've tried your method to predict footprint scores (which I think is the sum of accessibility and footprint depth). The result is hard to interpret though. For example, this GATA1 is predicted to be more active along time from the volcano plot.
But when I visualize it, you can see clearly it is driven by accessibility, and the effect of footprint essentially was masked.
And if I don't scale for y, it looks like this, and the depth is larger in the middle state.
Could you suggest how to interpret this properly?
Thank you!