maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

In ChIP-seq, paired-end mode is not used for MACS2 #949

Closed sunta3iouxos closed 6 months ago

sunta3iouxos commented 11 months ago

I dug a bit into the logs and noticed that, even though the data are paired-end, MACS2 is not run in paired-end mode. I am passing --peakCallerOptions "-f BAMPE".

katsikora commented 11 months ago

This is a bit unexpected. Did you look under the MACS2/logs folder for the .err.BAMPE log files? Or are there none?

In the ChIP-seq workflow, the rule "MACS2" should execute one instance of MACS2 in single-end mode (-f BAM) and one instance in paired-end mode (-f BAMPE). By passing this option through peakCallerOptions, you might be clashing with what snakePipes passes to MACS2 explicitly through rule parameters.
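
To illustrate the possible clash, here is a minimal Python sketch that checks whether a user-supplied options string already sets the MACS2 input format (`-f` / `--format`). The helper name `find_format_flags` is hypothetical and not part of snakePipes; how snakePipes actually merges peakCallerOptions into the rule is not shown here.

```python
import shlex


def find_format_flags(peak_caller_options):
    """Return every input format requested via -f/--format in an options string.

    Hypothetical helper for illustration only; it is not snakePipes code.
    """
    tokens = shlex.split(peak_caller_options)
    formats = []
    for i, token in enumerate(tokens):
        if token in ("-f", "--format"):
            # The format value is the next token, if there is one.
            if i + 1 < len(tokens):
                formats.append(tokens[i + 1])
        elif token.startswith("--format="):
            formats.append(token.split("=", 1)[1])
    return formats


# The option string from this issue sets the format itself, so it may
# collide with the -f BAM / -f BAMPE that the rule already passes.
print(find_format_flags("-f BAMPE"))        # ['BAMPE']
print(find_format_flags("--qvalue 0.001"))  # []
```

If the returned list is non-empty, the format is being set twice on the MACS2 command line, which is the situation described above.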

sunta3iouxos commented 11 months ago

You are correct. I see two outputs, but CSAW does not use the BAMPE output; see the last one:

 callpeak -t split_bam/A006850324_209980_S12_L000_host.bam -f BAM --mfold 0 50 -g 2652783500 --nomodel --extsize 277 --keep-dup all --outdir MACS2 --name A006850324_209980_S12_L000_host.BAM --qvalue 0.001
callpeak -t split_bam/A006850324_209980_S12_L000_host.bam -f BAMPE -g 2652783500 --keep-dup all --outdir MACS2 --name A006850324_209980_S12_L000_host.BAMPE  --qvalue 0.001

and here is the CSAW command:

rule CSAW:
    input: MACS2/A006850324_209957_S1_L000_host.BAM_peaks.xls, MACS2/A006850324_209960_S2_L000_host.BAM_peaks.xls, MACS2/A006850324_209962_S3_L000_host.BAM_peaks.xls, MACS2/A006850324_209964_S4_L000_host.BAM_peaks.xls, MACS2/A006850324_209966_S5_L000_host.BAM_peaks.xls, MACS2/A006850324_209968_S6_L000_host.BAM_peaks.xls, MACS2/A006850324_209970_S7_L000_host.BAM_peaks.xls, MACS2/A006850324_209972_S8_L000_host.BAM_peaks.xls, MACS2/A006850324_209974_S9_L000_host.BAM_peaks.xls, MACS2/A006850324_209976_S10_L000_host.BAM_peaks.xls, MACS2/A006850324_209978_S11_L000_host.BAM_peaks.xls, MACS2/A006850324_209980_S12_L000_host.BAM_peaks.xls, MACS2/A006850324_209982_S13_L000_host.BAM_peaks.xls, MACS2/A006850324_209984_S14_L000_host.BAM_peaks.xls, MACS2/A006850324_209986_S15_L000_host.BAM_peaks.xls, MACS2/A006850324_209988_S16_L000_host.BAM_peaks.xls, MACS2/A006850324_209990_S17_L000_host.BAM_peaks.xls, MACS2/A006850324_209992_S18_L000_host.BAM_peaks.xls, /scratch/tgeorgom/AP04/pSer5POLII.tsv, split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv, split_deepTools_qc/multiBamSummary/spikein.ChIP.scaling_factors.txt
    output: CSAW_MACS2_pSer5POLII/CSAW.session_info.txt, CSAW_MACS2_pSer5POLII/DiffBinding_analysis.Rdata, CSAW_MACS2_pSer5POLII/Filtered.results.UP.bed, CSAW_MACS2_pSer5POLII/Filtered.results.DOWN.bed, CSAW_MACS2_pSer5POLII/Filtered.results.MIXED.bed
    log: /scratch/tgeorgom/AP04/CSAW_MACS2_pSer5POLII/logs/CSAW.out, /scratch/tgeorgom/AP04/CSAW_MACS2_pSer5POLII/logs/CSAW.err
    jobid: 98
    benchmark: CSAW_MACS2_pSer5POLII/.benchmark/CSAW.benchmark
    reason: Missing output files: CSAW_MACS2_pSer5POLII/Filtered.results.MIXED.bed, CSAW_MACS2_pSer5POLII/Filtered.results.DOWN.bed, CSAW_MACS2_pSer5POLII/Filtered.results.UP.bed, CSAW_MACS2_pSer5POLII/CSAW.session_info.txt; Input files updated by another job: MACS2/A006850324_209962_S3_L000_host.BAM_peaks.xls, MACS2/A006850324_209992_S18_L000_host.BAM_peaks.xls, MACS2/A006850324_209974_S9_L000_host.BAM_peaks.xls, MACS2/A006850324_209966_S5_L000_host.BAM_peaks.xls, MACS2/A006850324_209976_S10_L000_host.BAM_peaks.xls, MACS2/A006850324_209957_S1_L000_host.BAM_peaks.xls, MACS2/A006850324_209970_S7_L000_host.BAM_peaks.xls, MACS2/A006850324_209960_S2_L000_host.BAM_peaks.xls, MACS2/A006850324_209984_S14_L000_host.BAM_peaks.xls, MACS2/A006850324_209988_S16_L000_host.BAM_peaks.xls, MACS2/A006850324_209978_S11_L000_host.BAM_peaks.xls, MACS2/A006850324_209964_S4_L000_host.BAM_peaks.xls, MACS2/A006850324_209968_S6_L000_host.BAM_peaks.xls, MACS2/A006850324_209982_S13_L000_host.BAM_peaks.xls, MACS2/A006850324_209986_S15_L000_host.BAM_peaks.xls, MACS2/A006850324_209980_S12_L000_host.BAM_peaks.xls, MACS2/A006850324_209972_S8_L000_host.BAM_peaks.xls, MACS2/A006850324_209990_S17_L000_host.BAM_peaks.xls
    resources: mem_mb=1000, disk_mb=1000, tmpdir=<TBD>

katsikora commented 11 months ago

That's right, this is hardcoded. Back in the day, the decision was made to use the single-end-MACS2-mode peaks as input to CSAW. Are you concerned about this?
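
For anyone who wants to verify this in their own run, here is a rough Python sketch that counts which MACS2 mode the peak files in a rule's `input:` line come from. It assumes the naming visible in the log above (`MACS2/<sample>.BAM_peaks.xls`, and by analogy `.BAMPE_peaks.xls` for the paired-end run); the function name `count_macs2_modes` is made up for this example.

```python
import re
from collections import Counter

# Matches MACS2 peak tables named <sample>.BAM_peaks.xls or <sample>.BAMPE_peaks.xls.
# BAMPE is listed first so the longer suffix is tried before BAM.
PEAK_FILE = re.compile(r"^MACS2/(?P<sample>.+)\.(?P<mode>BAMPE|BAM)_peaks\.xls$")


def count_macs2_modes(input_line):
    """Count peak files per MACS2 mode in a comma-separated 'input:' line.

    Hypothetical helper for illustration; paths that are not MACS2 peak
    tables (sample sheets, QC metrics, ...) are ignored.
    """
    modes = Counter()
    for path in input_line.split(","):
        match = PEAK_FILE.match(path.strip())
        if match:
            modes[match.group("mode")] += 1
    return modes


# Shortened excerpt of the CSAW input line posted above.
example = (
    "MACS2/A006850324_209957_S1_L000_host.BAM_peaks.xls, "
    "MACS2/A006850324_209960_S2_L000_host.BAM_peaks.xls, "
    "split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv"
)
print(count_macs2_modes(example))  # Counter({'BAM': 2})
```

A result with only `BAM` entries means the rule is fed the single-end-mode peaks, as in the log above.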

sunta3iouxos commented 11 months ago

I would like to use the paired-end (BAMPE) peaks for the downstream analysis. The authors of MACS2 claim that this is a better approach for determining the shift size.

sunta3iouxos commented 6 months ago

If I am not mistaken, this is implemented.
