loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

BINDetect error when using more than 2 bigwigs #100

Closed helenhuangmath closed 2 years ago

helenhuangmath commented 2 years ago

Hi, thank you for developing this useful tool! I did a test run using the test data downloaded from the TOBIAS Snakemake pipeline, but with my own peak file (I couldn't find the peak file used in the tutorial, so I just picked a random T cell peak list).

Q1. The third step, BINDetect, fails with an error if I use more than 2 bigwigs for --signals. Here is the error:

...
2021-11-17 22:54:31 (124058) [INFO] Calculating background log2 fold-changes between conditions
2021-11-17 22:54:31 (124058) [INFO] - Footprints_Bcell / Footprints_Tcell

2021-11-17 22:54:31 (124058) [INFO] Processing scanned TFBS individually
2021-11-17 22:54:31 (124058) [INFO] Progress 0%
2021-11-17 22:54:31 (124058) [INFO] Progress 100.0%
2021-11-17 22:54:31 (124058) [INFO] Progress done!
2021-11-17 22:54:31 (124058) [INFO] Concatenating results from subsets

2021-11-17 22:54:32 (124058) [INFO] Writing all_bindetect files
2021-11-17 22:54:32 (124058) [INFO] Creating BINDetect plot(s)
2021-11-17 22:54:32 (124058) [INFO] - Footprints_Bcell / Footprints_Tcell
Traceback (most recent call last):
  File "/home/hhua/miniconda3/bin/TOBIAS", line 11, in <module>
    load_entry_point('tobias==0.8.0', 'console_scripts', 'TOBIAS')()
  File "/home/hhua/miniconda3/lib/python3.7/site-packages/tobias/TOBIAS.py", line 152, in main
    args.func(args)     #run specified function with arguments
  File "/home/hhua/miniconda3/lib/python3.7/site-packages/tobias/footprinting/bindetect.py", line 649, in run_bindetect
    fig = plot_bindetect(comparison_motifs, clustering, [cond1, cond2], args)
  File "/home/hhua/miniconda3/lib/python3.7/site-packages/tobias/footprinting/bindetect_functions.py", line 571, in plot_bindetect
    dendro_dat = dendrogram(cluster_obj.linkage_mat, labels=IDS, no_labels=True, orientation="right", ax=ax3, above_threshold_color="black", link_color_func=lambda k: cluster_obj.node_color[k])
  File "/home/hhua/miniconda3/lib/python3.7/site-packages/scipy/cluster/hierarchy.py", line 3278, in dendrogram
    raise ValueError("Dimensions of Z and labels must be consistent.")
ValueError: Dimensions of Z and labels must be consistent.

Q2. When should I use --normalize in PlotAggregate, and when shouldn't I?

Below is my test run code. Thank you so much for your help!


##--------------------------------------------------------------------------------------------------------

refgenome=TOBIAS_snakemake/data/genome_chr4.fa.gz
Blackls=TOBIAS_snakemake/data/blacklist_chr4.bed
PeakUse=UnionALL
Peak=test_peak1.bed

## 1. ATACorrect: 

SampleID=Bcell
BAM=TOBIAS_snakemake/data/Bcell_chr4.bam

TOBIAS ATACorrect \
--bam ${BAM} \
--genome ${refgenome} \
--peaks ${Peak} \
--blacklist ${Blackls} \
--outdir out_1_ATACorrect_byPeak_${PeakUse} \
--cores 8

## 2. ScoreBigwig: 

BW_Correct=out_1_ATACorrect_byPeak_${PeakUse}/${SampleID}_chr4_corrected.bw

TOBIAS FootprintScores \
--signal ${BW_Correct} \
--regions ${Peak} \
--output Footprints_${SampleID}.bw \
--cores 8

## 3. BINDetect: 
BW=(Footprints_Bcell.bw Footprints_Tcell.bw)
echo ${BW[@]}

TF=BATFJUN
TestMotif=TOBIAS_snakemake/data/individual_motifs/MA0462.1.pfm

TOBIAS BINDetect \
--motifs ${TestMotif} \
--signals ${BW[@]} \
--genome ${refgenome} \
--peaks ${Peak} \
--outdir out_2_BINDetect_byPeak_${PeakUse} \
--prefix bindetect_${TF} \
--cores 8

## 4. PlotAggregate: 

TFUse=BATFJUN_MA0462.1
TFBSPeak=out_2_BINDetect_byPeak_${PeakUse}/${TFUse}/beds/${TFUse}_all.bed
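# Condition names follow the bigwig basenames (Footprints_Bcell / Footprints_Tcell in the log), so match their case exactly
Bound1=out_2_BINDetect_byPeak_${PeakUse}/${TFUse}/beds/${TFUse}_Footprints_Bcell_bound.bed
Bound2=out_2_BINDetect_byPeak_${PeakUse}/${TFUse}/beds/${TFUse}_Footprints_Tcell_bound.bed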

TOBIAS PlotAggregate \
--TFBS ${TFBSPeak} ${Bound1} ${Bound2} \
--signals out_1_ATACorrect_byPeak_${PeakUse}/*corrected.bw \
--blacklist ${Blackls} \
--output plot_footprint_byPeak_${PeakUse}_${TFUse}.pdf \
--title ${TFUse} \
--flank 100 \
--normalize \
--share_y both --plot_boundaries 

##--------------------------------------------------------------------------------------------------------
msbentsen commented 2 years ago

Hi, thank you for your issue - I will try to answer below:

Q1: Which version of TOBIAS are you using? You can check with TOBIAS --version. The error seems to arise because there is only one motif. I thought I had already fixed that issue, but I will have another look.

In general, BINDetect can run with more than one motif, so you don't need to run it for "BATFJUN" alone; you can pass a whole motif file instead. You can download test data using TOBIAS DownloadData --bucket data-tobias-2020 as explained here: https://github.com/loosolab/TOBIAS/wiki/test-data. This is also the data used in the examples here, so you can have a look at these files to see the format.
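
As a rough sketch of running BINDetect on several motifs at once, the individual motif files from the script above could be joined into one file first. This assumes the .pfm files are plain-text motifs that remain valid when concatenated; the helper join_motifs, the file all_motifs.pfm, and the output directory name are made up for illustration, and only flags already used in this thread appear in the call.

```shell
#!/usr/bin/env bash
# Hypothetical paths: adjust to wherever the individual motif files live.
MOTIF_DIR="TOBIAS_snakemake/data/individual_motifs"
ALL_MOTIFS="all_motifs.pfm"

# join_motifs DIR OUT: concatenate every .pfm file in DIR into one file OUT.
# (Illustrative helper, not part of TOBIAS.)
join_motifs() {
    local dir="$1" out="$2" f
    : > "$out"                      # start from an empty output file
    for f in "$dir"/*.pfm; do
        [ -e "$f" ] || continue     # skip the literal pattern if nothing matches
        cat "$f" >> "$out"
        # add a newline if the file did not end with one, so motifs stay separated
        [ -z "$(tail -c 1 "$f")" ] || echo >> "$out"
    done
}

# Only run BINDetect when TOBIAS and the motif directory are actually present.
if command -v TOBIAS >/dev/null 2>&1 && [ -d "$MOTIF_DIR" ]; then
    join_motifs "$MOTIF_DIR" "$ALL_MOTIFS"
    TOBIAS BINDetect \
        --motifs "$ALL_MOTIFS" \
        --signals Footprints_Bcell.bw Footprints_Tcell.bw \
        --genome TOBIAS_snakemake/data/genome_chr4.fa.gz \
        --peaks test_peak1.bed \
        --outdir out_2_BINDetect_all_motifs \
        --cores 8
fi
```

The output file is written outside the motif directory so that it is not picked up by the *.pfm pattern on a second run.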

Q2: Use --normalize if you are comparing the depth of footprints between different transcription factors. Without --normalize, the aggregate signals will have different ranges (for example, TF1 might span [-0.1;0.2] and TF2 might span [-1;4]), which makes it difficult to compare the raw footprint depths written out by PlotAggregate. In that case, --normalize is helpful. Otherwise, you generally do not need it.
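
To illustrate why the ranges matter, here is a small sketch that rescales two made-up aggregate profiles to [0, 1] with min-max scaling. Whether PlotAggregate normalizes in exactly this way is an assumption; the function minmax and the numbers are invented for the example, chosen only to span the two ranges mentioned above.

```shell
#!/usr/bin/env bash
# minmax: rescale whitespace-separated numbers on stdin to the range [0, 1].
# (Illustrative helper; assumes min-max scaling, which may differ from TOBIAS.)
minmax() {
    LC_ALL=C awk '{
        for (i = 1; i <= NF; i++) v[++n] = $i + 0   # force numeric values
    }
    END {
        if (n == 0) exit
        min = v[1]; max = v[1]
        for (i = 2; i <= n; i++) {
            if (v[i] < min) min = v[i]
            if (v[i] > max) max = v[i]
        }
        for (i = 1; i <= n; i++)
            printf "%.2f%s", (max == min ? 0 : (v[i] - min) / (max - min)), (i < n ? " " : "\n")
    }'
}

# Two made-up footprint-like profiles (dip in the middle).
echo "0.2 0.1 -0.1 0.1 0.2" | minmax   # TF1, raw range [-0.1;0.2] -> 1.00 0.67 0.00 0.67 1.00
echo "4 2 -1 2 4" | minmax             # TF2, raw range [-1;4]     -> 1.00 0.60 0.00 0.60 1.00
```

After rescaling, both profiles share the same axis, so their shapes can be compared directly even though the raw values differ by more than an order of magnitude.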

Hope that was helpful!

helenhuangmath commented 2 years ago

Thank you for your reply, it's helpful!

Regarding Q1, you are right: I was using version 0.8. After updating to version 0.12.10, everything runs without errors.
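
For anyone hitting the same traceback, a quick version check before running BINDetect might look like the sketch below. The 0.12.10 threshold is simply the version that worked in this thread, not an official minimum; the helpers version_at_least and extract_version are made up for illustration, and the comparison relies on sort -V (available in GNU coreutils).

```shell
#!/usr/bin/env bash
# Version that ran cleanly in this thread; an assumption, not an official cutoff.
MIN_VERSION="0.12.10"

# version_at_least HAVE WANT: succeeds if HAVE >= WANT (version-aware sort).
version_at_least() {
    local have="$1" want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# extract_version: print the first dotted version number found on stdin.
extract_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Only query TOBIAS if it is installed on this machine.
if command -v TOBIAS >/dev/null 2>&1; then
    installed="$(TOBIAS --version 2>&1 | extract_version)"
    if version_at_least "$installed" "$MIN_VERSION"; then
        echo "TOBIAS ${installed} is recent enough"
    else
        echo "TOBIAS ${installed} is older than ${MIN_VERSION}; consider upgrading" >&2
    fi
fi
```

A plain string comparison would rank 0.8 above 0.12.10, which is why the sketch uses a version-aware sort.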