Closed sixingchen718 closed 1 year ago
Hi Sixing,
Four days is on the longer side, but TEpeaks does take longer as it has to first find peaks and candidate regions, and then redistribute ambiguously aligned reads to these candidate regions and reassess if they become peaks. Please ensure that your BAM fies are sorted by query name (or unsorted). I would also check if it's using a lot of memory on your system (which might make it slow if it's using swap files). You could also check what output has been generated. The order of file generation is typically the chromosome bedGraphs (for unique mappers, treatment then ctrl), unique peaks (XLS and txt), candidate peaks, multi-read bedGraphs and the multi-peaks. Hopefully this will tell you where it's getting "stuck".
Thanks.
Thanks for your response, I stopped the program and re-run this program 6 days, I am sure the BAM files are unsorted, the output file is empty, just generates a numerically named intermediate folder(the size hasn't grown ), it looks like the program still running, just stuck. I Requested Memory:300GB Best.
Here is my report: The output (if any) follows:
verbose = 0 num. of threads = 40 num. of iterations = 20 output = TEpeaks IP file = /home/sixing/TF/chip/mapping/36.bam control file = /home/sixing/TF/chip/mapping/37.bam species = mm gsize = 1.87e+09 auto = false tolarge = false name = H3K9me3_WT fragment = 200 bandwidth = 300 pval = 1e-05 fdr = 0.05 lmfold = 5 hmfold = 50 pileup = 20 fe = 3 output = TEpeaks Tue Nov 8 11:30:05 2022 INFO mkdir TEpeaks Tue Nov 8 11:30:05 2022 INFO #1 read alignment files ... Tue Nov 8 11:30:05 2022 INFO #1 read tags... Tue Nov 8 11:30:05 2022 INFO start read bam file ... Tue Nov 8 11:30:05 2022 INFO open file /home/sixing/TF/chip/mapping/36.bam Tue Nov 8 11:30:06 2022 INFO 1000000 alignments. Tue Nov 8 11:30:07 2022 INFO 2000000 alignments. Tue Nov 8 11:30:08 2022 INFO 3000000 alignments. Tue Nov 8 11:30:10 2022 INFO 4000000 alignments. Tue Nov 8 11:30:11 2022 INFO 5000000 alignments. Tue Nov 8 11:30:12 2022 INFO 6000000 alignments. Tue Nov 8 11:30:13 2022 INFO 7000000 alignments. Tue Nov 8 11:30:14 2022 INFO 8000000 alignments. Tue Nov 8 11:30:16 2022 INFO 9000000 alignments. Tue Nov 8 11:30:17 2022 INFO 10000000 alignments. Tue Nov 8 11:30:18 2022 INFO 11000000 alignments. Tue Nov 8 11:30:19 2022 INFO 12000000 alignments. Tue Nov 8 11:30:20 2022 INFO 13000000 alignments. Tue Nov 8 11:30:22 2022 INFO 14000000 alignments. Tue Nov 8 11:30:23 2022 INFO 15000000 alignments. Tue Nov 8 11:30:24 2022 INFO 16000000 alignments. Tue Nov 8 11:30:25 2022 INFO 17000000 alignments. Tue Nov 8 11:30:26 2022 INFO 18000000 alignments. Tue Nov 8 11:30:28 2022 INFO 19000000 alignments. Tue Nov 8 11:30:29 2022 INFO start read bam file ... Tue Nov 8 11:30:29 2022 INFO open file /home/sixing/TF/chip/mapping/37.bam Tue Nov 8 11:30:30 2022 INFO 1000000 alignments. Tue Nov 8 11:30:31 2022 INFO 2000000 alignments. Tue Nov 8 11:30:33 2022 INFO 3000000 alignments. Tue Nov 8 11:30:34 2022 INFO 4000000 alignments. Tue Nov 8 11:30:35 2022 INFO 5000000 alignments. Tue Nov 8 11:30:36 2022 INFO 6000000 alignments. Tue Nov 8 11:30:37 2022 INFO 7000000 alignments. Tue Nov 8 11:30:39 2022 INFO 8000000 alignments. Tue Nov 8 11:30:40 2022 INFO 9000000 alignments. Tue Nov 8 11:30:41 2022 INFO 10000000 alignments. Tue Nov 8 11:30:42 2022 INFO 11000000 alignments. Tue Nov 8 11:30:43 2022 INFO 12000000 alignments. Tue Nov 8 11:30:45 2022 INFO 13000000 alignments. Tue Nov 8 11:30:45 2022 INFO #1 fragment size = 746 Tue Nov 8 11:30:45 2022 INFO #1 total fragments in treatment: 9698360 Tue Nov 8 11:30:45 2022 INFO #1 total multi fragments in treatment: 0 Tue Nov 8 11:30:45 2022 INFO #1 total unique fragments in treatment: 9698360 Tue Nov 8 11:30:45 2022 INFO #1 total fragments in control: 6442587 Tue Nov 8 11:30:45 2022 INFO #1 total multi fragments in control: 0 Tue Nov 8 11:30:45 2022 INFO #1 total unique fragments in control: 6442587 Tue Nov 8 11:30:45 2022 INFO #2 user defined maximum in treatment and control 1 Tue Nov 8 11:30:45 2022 INFO #2 filter out redundant tags/fragments ... Tue Nov 8 11:30:46 2022 INFO #2 total fragments after removing duplicates in treatment: 1 Tue Nov 8 11:30:46 2022 INFO #2 total fragments after removing duplicates in control: 1 Tue Nov 8 11:30:48 2022 INFO #3 Skipped... building peak model to estimate fragment length. Tue Nov 8 11:30:48 2022 INFO #3 Use 746 as fragment length. Tue Nov 8 11:30:48 2022 INFO #4 Call peaks... Tue Nov 8 11:30:48 2022 INFO opt.gsize = 1870000000.000000 Tue Nov 8 11:30:48 2022 INFO lambda_bg 0.000000 treat_sum = 0.000000 Tue Nov 8 11:30:48 2022 INFO #4 Call peaks with given q-value cutoff : 0.050000 Tue Nov 8 11:30:48 2022 INFO #4 Pre-compute pvalue-qvalue table... Tue Nov 8 11:30:48 2022 INFO #4 Start to calculate pvalue stat... Tue Nov 8 11:30:48 2022 DEBUG canochromlist size = 22 Tue Nov 8 11:30:48 2022 DEBUG treat_pv size = 500912 Tue Nov 8 11:30:48 2022 DEBUG scale_factor_s size = 3 Tue Nov 8 11:30:48 2022 DEBUG d = 746 Tue Nov 8 11:30:48 2022 DEBUG in pileup_a_chromosome_c Tue Nov 8 11:30:48 2022 DEBUG start se_all_in_one_pileup 875916 195471971 Tue Nov 8 11:30:48 2022 DEBUG tmp_pileup size = 1694026 Tue Nov 8 11:30:48 2022 DEBUG d = 1000 Tue Nov 8 11:30:48 2022 DEBUG in pileup_a_chromosome_c Tue Nov 8 11:30:48 2022 DEBUG start se_all_in_one_pileup 875916 195471971 Tue Nov 8 11:30:48 2022 DEBUG tmp_pileup size = 1694003 Tue Nov 8 11:30:48 2022 DEBUG prev_pileup size = 0 Tue Nov 8 11:30:48 2022 DEBUG prev_pileup size = 3367989 Tue Nov 8 11:30:48 2022 DEBUG d = 10000 Tue Nov 8 11:30:48 2022 DEBUG in pileup_a_chromosome_c Tue Nov 8 11:30:49 2022 DEBUG start se_all_in_one_pileup 875916 195471971 Tue Nov 8 11:30:49 2022 DEBUG tmp_pileup size = 1695443 Tue Nov 8 11:30:49 2022 DEBUG prev_pileup size = 0 Tue Nov 8 11:30:49 2022 DEBUG prev_pileup size = 5033222 Tue Nov 8 11:30:49 2022 DEBUG ctrl_pv size = 5033222 807726 pos ..... ..... ..... 195372330 pos 195372336 pos 195372340 pos 195372342 pos Tue Nov 8 11:31:04 2022 INFO chr_pos_treat_ctrl pos size = 5534082
Hi,
Thank you for your log output. I noticed the following lines:
Tue Nov 8 11:30:45 2022
INFO #1 total multi fragments in treatment: 0
Tue Nov 8 11:30:45 2022
INFO #1 total multi fragments in control: 0
Tue Nov 8 11:30:46 2022
INFO #2 total fragments after removing duplicates in treatment: 1
Tue Nov 8 11:30:46 2022
INFO #2 total fragments after removing duplicates in control: 1
Based on this, TEpeaks
has determined that there are no ambiguous/multi alignments in your BAM files. This would mean that there is nothing for TEpeaks
to try and resolve with EM, and thus might be why it's hanging. If that is the case, then you should just run MACS2
.
When you align, did you ensure that multi-mappers are outputted, or only 1 alignment per read?
Feel free to provide the mapping command line if you like.
Thanks.
Hi, Here is my code:
bowtie2 -p 10 -1 36_S36_R1_001_val_1.fq -2 36_S36_R2_001_val_2.fq \ -x ../../../genome/GRCm38/bowtie2_index/GRCm38 \ -S ../../mapping/36.sam
529769 pairs aligned concordantly 0 times; of these:
43440 (8.20%) aligned discordantly 1 time
----
486329 pairs aligned 0 times concordantly or discordantly; of these:
972658 mates make up the pairs; of these:
603257 (62.02%) aligned 0 times
74934 (7.70%) aligned exactly 1 time
294467 (30.27%) aligned >1 times
96.96% overall alignment rate
I already used the MACS2 to run this file and it works, the result seems OK, cause my project may associate with TE, So I am curious what the TEpeaks results. Thanks for your help. Best
Hi,
Your bowtie2 command is reporting only the best hit, but not necessarily all equally best hits.
As a result, it's only reporting one alignment per read. In order to use TEpeaks, you will need to allow multiple alignments per read (with the -k 100
parameter for example.
Thanks.
I will re-process the data as your suggestion. Thanks for your guidance.
Hi Oliver, I re-mapped the data as your suggestion, but new issues have emerged when I run the TEpeaks, The ERROR is "Missing reference sequences with id 480292640", May I ask how to resolve this issue?
here is the report.
The output (if any) follows:
verbose = 0 num. of threads = 40 num. of iterations = 20 output = TEpeaks IP file = /home/sixing/TF/chip/mapping/36.bam control file = /home/sixing/TF/chip/mapping/37.bam species = mm gsize = 1.87e+09 auto = false tolarge = false name = H3K9me3_WT fragment = 200 bandwidth = 300 pval = 1e-05 fdr = 0.05 lmfold = 5 hmfold = 50 pileup = 20 fe = 3 output = TEpeaks Mon Nov 21 10:34:53 2022 INFO mkdir TEpeaks Mon Nov 21 10:34:53 2022 INFO #1 read alignment files ... Mon Nov 21 10:34:53 2022 INFO #1 read tags... Mon Nov 21 10:34:53 2022 INFO start read bam file ... Mon Nov 21 10:34:53 2022 INFO open file /home/sixing/TF/chip/mapping/36.bam Mon Nov 21 10:34:54 2022 ERROR Missing reference sequences with id 480292640
Thanks so much.
Hi,
What is your bowtie2 command? Did you remove all unmapped reads from the BAM file? If that doesn't resolve the issue, could you provide an excerpt of the BAM file for us to troubleshoot?
Thanks.
Here is my bowtie2 command: cat bowtie2_36.sh bowtie2 -p 10 -1 36_S36_R1_001_val_1.fq -2 36_S36_R2_001_val_2.fq \ -x ../../../genome/GRCm39/bowtie2_index/GRCm39 \ -k 100 \ -S ../../mapping/36.sam
cat samtobam36.sh samtools view -@8 -b 36.sam > 36.bam
468901 pairs aligned concordantly 0 times; of these:
42423 (9.05%) aligned discordantly 1 time
----
426478 pairs aligned 0 times concordantly or discordantly; of these:
852956 mates make up the pairs; of these:
557445 (65.35%) aligned 0 times
69860 (8.19%) aligned exactly 1 time
225651 (26.46%) aligned >1 times
97.19% overall alignment rate.
I guess I haven't removed the unmapped reads, I try to reprocess the data, thanks a lot. Happy Thanksgiving.
Hi Oliver, I still haven't resolved the issue, I think I already removed the unmapped reads
"cat bamremoveunmapped36.sh samtools view -b -F 4 mm10_36.bam > mm10_36removeunmapped.bam"
here is my BAM file
here is my report: Exited with exit code 1.
Resource usage summary:
CPU time : 0.16 sec.
Max Memory : -
Average Memory : -
Total Requested Memory : 309600.00 MB
Delta Memory : -
Max Swap : -
Max Processes : -
Max Threads : -
Run time : 6 sec.
Turnaround time : 3 sec.
The output (if any) follows:
verbose = 0 num. of threads = 1 num. of iterations = 20 output = TEpeaks IP file = /home/sixing/TF/chip/mapping/mm10_36removeunmapped.bam control file = /home/sixing/TF/chip/mapping/mm10_37removeunmapped.bam species = mm gsize = 1.87e+09 auto = false tolarge = false name = H3K9me3_WT fragment = 200 bandwidth = 300 pval = 1e-05 fdr = 0.05 lmfold = 5 hmfold = 50 pileup = 20 fe = 3 output = TEpeaks Tue Nov 29 10:21:43 2022 INFO mkdir TEpeaks Tue Nov 29 10:21:43 2022 INFO #1 read alignment files ... Tue Nov 29 10:21:43 2022 INFO #1 read tags... Tue Nov 29 10:21:43 2022 INFO start read bam file ... Tue Nov 29 10:21:43 2022 INFO open file /home/sixing/TF/chip/mapping/mm10_36removeunmapped.bam Tue Nov 29 10:21:44 2022 ERROR Missing reference sequences with id -1548049344
Regards, Sixing
Hi Sixing,
It really looks like there's an error with one of the entries in your BAM file.
One other thing to try: could you run bowtie2 with --no-mixed
mode turned on? I wonder if it's an error where it's expecting a properly paired read, but only one of the read was mappable.
Otherwise, could you send me one of the BAM file? I can try to dig into it to see if one (or more) alignment is causing the issue.
Sorry for all the hassle.
Thanks.
Hi Sixing,
It really looks like there's an error with one of the entries in your BAM file. One other thing to try: could you run bowtie2 with
--no-mixed
mode turned on? I wonder if it's an error where it's expecting a properly paired read, but only one of the read was mappable. Otherwise, could you send me one of the BAM file? I can try to dig into it to see if one (or more) alignment is causing the issue. Sorry for all the hassle.Thanks.
I will re-mapped as your suggestion if still can't resolve the issue thanks for helping me check the BAM file. I greatly appreciate your patience!
Hi Oliver,
The program is stuck again, could you help me check the BAM file?
Thanks.
Hi,
Sure. Is it the same error? Let me know where I can download the BAM?
Thanks
Hi,
Sure. Is it the same error? Let me know where I can download the BAM?
Thanks
Here is the code and report, cause the bam file has 25GB, and I need to figure out how to send it to you, may I ask for your email, I will email you once I figure it out. Many thanks!
(base) cat bowtie2_36.sh bowtie2 -p 10 -1 36_S36_R1_001_val_1.fq -2 36_S36_R2_001_val_2.fq \ -x ../../../genome/mm10/bowtie2_index/mm10 \ -k 100 \ --no-mixed \ -S ../../mapping/mm10_nomixed_36.sam
(base) cat samtobam36.sh samtools view -@8 -b mm10_nomixed_36.sam > mm10_nomixed_36.bam
(base) cat bamremoveunmapped36.sh samtools view -b -F 4 mm10_nomixed_36.bam > mm10_nomixed_36removeunmapped.bam
(base) cat TEpeaks3637.sh TEpeaks narrow \ -t /home/sixing/TF/chip/mapping/mm10_nomixed_36removeunmapped.bam \ -c /home/sixing/TF/chip/mapping/mm10_nomixed_37removeunmapped.bam \ -s mm -i 20 \ -o TEpeaks -n H3K9me3_WT
Resource usage summary:
CPU time : 157111.56 sec.
Max Memory : 11311 MB
Average Memory : 10141.66 MB
Total Requested Memory : 309600.00 MB
Delta Memory : 298289.00 MB
Max Swap : -
Max Processes : 4
Max Threads : 5
Run time : 157274 sec.
Turnaround time : 157275 sec.
The output (if any) follows:
verbose = 0 num. of threads = 1 num. of iterations = 20 output = TEpeaks IP file = /home/sixing/TF/chip/mapping/mm10_nomixed_36removeunmapped.bam control file = /home/sixing/TF/chip/mapping/mm10_nomixed_37removeunmapped.bam species = mm gsize = 1.87e+09 auto = false tolarge = false name = H3K9me3_WT fragment = 200 bandwidth = 300 pval = 1e-05 fdr = 0.05 lmfold = 5 hmfold = 50 pileup = 20 fe = 3 output = TEpeaks Tue Nov 29 14:54:13 2022 INFO mkdir TEpeaks Tue Nov 29 14:54:13 2022 INFO #1 read alignment files ... Tue Nov 29 14:54:13 2022 INFO #1 read tags... Tue Nov 29 14:54:13 2022 INFO start read bam file ... Tue Nov 29 14:54:13 2022 INFO open file /home/sixing/TF/chip/mapping/mm10_nomixed_36removeunmapped.bam Tue Nov 29 14:54:14 2022 INFO 1000000 alignments. Tue Nov 29 14:54:14 2022 INFO 2000000 alignments. Tue Nov 29 14:54:15 2022 INFO 3000000 alignments. Tue Nov 29 14:54:16 2022 INFO 4000000 alignments. Tue Nov 29 14:54:16 2022 INFO 5000000 alignments. Tue Nov 29 14:54:17 2022 … … … … 195370429 pos 195370587 pos 195370695 pos 195370760 pos 195370865 pos 195370961 pos 195370983 pos 195371107 pos 195371156 pos 195371191 pos 195371364 pos Tue Nov 29 15:12:08 2022 INFO chr_pos_treat_ctrl pos size = 4257734
PS:
Read file </home/sixing/mm10nomixTEpeak_err.log> for stderr output of this job.
Hi,
It looks like it might still be running (unlike in previous instances where it just failed), but I'll be happy to take a look at the BAM file Two things that I am noticing:
My email is tam at cshl dot edu
Thanks.
That's great to know, I will run again using 40 threads. Thanks
Hi Sixing,
I have received your file, and will take a look at it. Please let it run for now, and see if it finishes eventually.
Thanks.
Got it, thanks so much.
Hi Sixing,
Just a quick update: I am able to reproduce the issue (where it's running for a long time). It's not clear what might be causing it, other than that it's a paired-end library (which we haven't tested extensively with the software).
As a temporary work-around, I might recommend aligning either one or both ends as single-end libraries (using the -k 100
parameter, and removing unmapped reads with samtools), and then using both ends for the treatment and control
E.g.
TEpeaks narrow
-t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam
-c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam
-s mm -i 20
-o TEpeaks -n H3K9me3_WT
We'll keep trying to troubleshoot this.
Thanks.
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software.
One workaround is to realign the two ends as single-end libraries, using -k 100
and removing unmapped reads. You can then try to run the following:
TEpeaks narrow
-t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam
-c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam
-s mm -i 20
-o TEpeaks -n H3K9me3_WT
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software. One workaround is to realign the two ends as single-end libraries, using -k 100 and removing unmapped reads. You can then try to run the following:
TEpeaks narrow -t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam /home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam -c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam /home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam -s mm -i 20 -o TEpeaks -n H3K9me3_WT
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software. One workaround is to realign the two ends as single-end libraries, using -k 100 and removing unmapped reads. You can then try to run the following:
TEpeaks narrow
-t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam
-c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam
-s mm -i 20
-o TEpeaks -n H3K9me3_WT
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software. One workaround is to realign the two ends as single-end libraries, using -k 100 and removing unmapped reads. You can then try to run the following:
TEpeaks narrow
-t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam
-c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam
/home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam
-s mm -i 20
-o TEpeaks -n H3K9me3_WT
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software. One workaround is to realign the two ends as single-end libraries, using -k 100 and removing unmapped reads. You can then try to run the following:
TEpeaks narrow
-t /home/sixing/TF/chip/mapping/mm10_36_R1_removeunmapped.bam /home/sixing/TF/chip/mapping/mm10_36_R2_removeunmapped.bam
-c /home/sixing/TF/chip/mapping/mm10_37_R1_removeunmapped.bam /home/sixing/TF/chip/mapping/mm10_37_R2_removeunmapped.bam
-s mm
-i 20
-o TEpeaks
-n H3K9me3_WT
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Hi Sixing,
Just an update: we are able to recapitulate the issue, but have not identified the cause. We suspect that it might be due to paired-end mapping, which we have not tested extensively with the software.
One workaround is to realign the two ends as single-end libraries, using -k 100 and removing unmapped reads. You can then run TEpeaks
using the two single-end BAM as treatment and control inputs.
Hope this gives you some results. We'll continue to troubleshoot on our end.
Thanks
Update: We can recapitulate the issue, and will into it further
Will look into it further.
Thanks.
I will re-run this program as your suggestion, really appreciate your help. Best Sixing
Hi Oliver, It's really exciting I got the result as per your suggestion. Thanks for your help. Best, Sixing
Hello,
May I ask how long I need to run the TEpeaks program, normally I run the MACS2 for just around 30min, but the TEpeaks still running after four days, I don't know do I need to continue to wait or stop the program, is that normal?
here is my code: TEpeaks narrow \ -t /home/sixing/TF/chip/mapping/36.bam /home/sixing/TF/chip/mapping/361.bam \ -c /home/sixing/TF/chip/mapping/37.bam /home/sixing/TF/chip/mapping/371.bam \ -s mm -i 20 \ -o TEpeaks -n H3K9me3_WT
look forward your response, regards, Sixing