mhammell-laboratory / TEpeaks

Package for including repetitive regions in peak calling from ChIP-seq datasets
GNU General Public License v3.0
7 stars 4 forks source link

0 peaks in the output #8

Open natashaklmnk opened 1 year ago

natashaklmnk commented 1 year ago

Dear TEpeaks authors! Thank you for such an important tool. Unfortunately, I ran into difficulties using it. When I run the program, I get 0 peaks in the output, despite the fact that when processing simply with macs2, hundreds of peaks are obtained. Could you please tell what may be a reason. The program is called as follows:

./TEpeaks narrow -t b1.bam -c c1.bam -o test_tepeaks_rat -n test --threads 10 -p 0.00001 -s dm

Bam files are not sorted and are obtained using bowtie2 with -k 99 option. In the logs, I see a lot of messages: "INFO a read failed". The mapping summary at looks like this:

    INFO    #1 tag size = 49
Tue Mar 21 20:12:50 2023
    INFO    #1 total tags in treatment: 20920409
Tue Mar 21 20:12:50 2023
    INFO    #1 total multi tags in treatment: 5393806
Tue Mar 21 20:12:50 2023
    INFO    #1 total unique tags in treatment: 15526603
Tue Mar 21 20:12:50 2023
    INFO    #1 total tags in control: 22408126
Tue Mar 21 20:12:50 2023
    INFO    #1 total multi tags in control: 5477154
Tue Mar 21 20:12:50 2023
    INFO    #1 total unique tags in control: 16930972
Tue Mar 21 20:12:50 2023
    INFO    #2 user defined maximum in treatment and control 1
Tue Mar 21 20:12:50 2023
    INFO    #2 filter out redundant tags/fragments ...
Tue Mar 21 20:29:37 2023
    INFO    #2  total tags after removing duplicates in treatment: 17093493
Tue Mar 21 20:29:37 2023
    INFO    #2  total tags after removing duplicates in control: 16250788

Then, during peak calling step there are a lot of messages "INFO num_chroms per thread 0". At the and i see:

    INFO    candidate peak file name test_tepeaks_rat/CG1603_candidatePeaks.txt
Tue Mar 21 20:29:45 2023
    INFO    candidate peaks size 0
Tue Mar 21 20:29:45 2023
    INFO    #4 Call peaks for repetitive regions...
Tue Mar 21 20:29:45 2023
    INFO    Read in candidate peaks ... test_tepeaks_rat/CG1603_candidatePeaks.txt
Warning: No peaks found in peak file.

Thank you in advance!

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. One thing that I've noticed is that you're indicating dm for the -s (species) tag. If I'm not mistaken, that typically stands for Drosophila melanogaster. Is that the organism that you're working on (as the folder name had rat, which confused me)?

Instead of -s, you might want to do -g and provide the effective genome size for your genome build, as I don't think the drosophila genome size was preset in this version of the software. You can find the instructions on getting this information from here. Since you're including multimappers, you can use option 1 in there. If you're using dm6, it is 142,573,024, or 1.4e8

Also, could you check what the output for test_uniqpeaks.txt, test_candidate_peaks.txt and test_multiRead_peaks.txt. If all of them are empty, then it would suggest that the initial peak-calling failed, but if only the latter two failed, then it suggests that the multi-mappers were not parsed correctly.

Would you mind also providing your MACS2 command line?

Hope this is helpful.

Thanks.

natashaklmnk commented 1 year ago

Thank you very much for the quick reply! Yes, indeed, i'm using dm6 genome ("rat" indicates antibodies source, sorry for this confusion).

That's amazing! Really, changing '-s dm' to '-g 1.4e8' did help, thank you so much!

With "-s dm" in my output folder there were two files: candidatePeaks.txt and uniqpeaks.xls, both were empty. Now with "-g 1.4e8" there are a lot of non-empty files among which multiRead_peaks.txt.

My MACS2 command was: macs2 callpeak -t b1.bam -c c1.bam --name=lala --outdir=macs2_res --nomodel --extsize 200 --format=BAM --gsize=dm --tsize=49 --pvalue=0.00001

Thank you very much!

olivertam commented 1 year ago

Hi,

Thank you for the feedback. We probably need to add a warning for the -s tag if an invalid value (i.e. not preset) is passed to it. Based on what you have described, it appears that the program might have completed successfully. Please let us know if you have any additional issues or questions.

Thanks.

natashaklmnk commented 1 year ago

Thank you!