macs3-project / MACS

MACS -- Model-based Analysis of ChIP-Seq
https://macs3-project.github.io/MACS/
BSD 3-Clause "New" or "Revised" License
708 stars 268 forks source link

Bug: HMMRATAC ends in ZeroDivisionError: float division #531

Open lionaid opened 2 years ago

lionaid commented 2 years ago

Describe the bug hmmratac crashes with ZeroDivisionError: float division

Execution of only one of my sample-sets (control condition aka static in this case) leads to a crash of MACS3. All other samples run fine.

Command is: MACS3 hmmratac -b replicate1.bam replicate2.bam replicate3.bam --outdir static.pooled --save-digested --save-states

Last output including error message is:

INFO  @ Mon, 24 Oct 2022 15:27:44:  12000000 fragments parsed
INFO  @ Mon, 24 Oct 2022 15:27:45: 12252543 fragments have been read.
INFO  @ Mon, 24 Oct 2022 15:28:17: #2 Use EM algorithm to estimate means and stddevs of fragment lengths
INFO  @ Mon, 24 Oct 2022 15:28:17: #  for mono-, di-, and tri-nucleosomal signals...
INFO  @ Mon, 24 Oct 2022 15:28:17: # A random seed 10151 has been used in the sampling function
INFO  @ Mon, 24 Oct 2022 15:28:21: # Downsampled 1498990 fragments will be used for EM training...
INFO  @ Mon, 24 Oct 2022 15:33:36: # Reached convergence after 11 iterations
INFO  @ Mon, 24 Oct 2022 15:33:36: #  The means and stddevs after EM:
INFO  @ Mon, 24 Oct 2022 15:33:36: #                         short       mono         di        tri
INFO  @ Mon, 24 Oct 2022 15:33:36: #             means:         50      187.7        383      601.7
INFO  @ Mon, 24 Oct 2022 15:33:36: #           stddevs:         20       53.3       44.9       98.8
INFO  @ Mon, 24 Oct 2022 15:33:36: #  Pile up all fragments
INFO  @ Mon, 24 Oct 2022 15:34:08: #  Convert pileup to fold-change over average signal
INFO  @ Mon, 24 Oct 2022 15:34:15: #  Compute the weights for each fragment length for each of the four signal types
Traceback (most recent call last):
  File "/home/raaz/miniconda3/envs/lr-MACS3/bin/macs3", line 4, in <module>
    __import__('pkg_resources').run_script('MACS3==3.0.0b1', 'macs3')
  File "/home/raaz/miniconda3/envs/lr-MACS3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/raaz/miniconda3/envs/lr-MACS3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1472, in run_script
    exec(code, namespace, namespace)
  File "/home/raaz/miniconda3/envs/lr-MACS3/lib/python3.10/site-packages/MACS3-3.0.0b1-py3.10-linux-x86_64.egg/EGG-INFO/scripts/macs3", line 861, in <module>
    main()
  File "/home/raaz/miniconda3/envs/lr-MACS3/lib/python3.10/site-packages/MACS3-3.0.0b1-py3.10-linux-x86_64.egg/EGG-INFO/scripts/macs3", line 101, in main
    run( args )
  File "/home/raaz/miniconda3/envs/lr-MACS3/lib/python3.10/site-packages/MACS3-3.0.0b1-py3.10-linux-x86_64.egg/MACS3/Commands/hmmratac_cmd.py", line 151, in run
    weight_mapping = generate_weight_mapping( fl_list, em_means, em_stddevs )
  File "MACS3/Signal/HMMR_Signal_Processing.pyx", line 56, in MACS3.Signal.HMMR_Signal_Processing.generate_weight_mapping
  File "MACS3/Signal/HMMR_Signal_Processing.pyx", line 87, in MACS3.Signal.HMMR_Signal_Processing.generate_weight_mapping
ZeroDivisionError: float division

The exact command with paths is: MACS3 hmmratac -b ./input/static/mpimg_L26666-1_LR2223-ATAC-LR2223-S15_S29_R1_001.trim.srt.nodup.no_chrM_MT.bam ./input/static/mpimg_L26666-1_LR2223-ATAC-LR2223-S39_S30_R1_001.trim.srt.nodup.no_chrM_MT.bam ./input/static/mpimg_L 26666-1_LR2223-ATAC-LR2223-S86_S31_R1_001.trim.srt.nodup.no_chrM_MT.bam --outdir static.pooled --save-digested --save-states

To Reproduce Run this sample-set. I have no clue how it differs from the other three conditions nor the second control set I was running successfully before that.

Expected behavior I would expect macs3 to finish the run successfully

System (please complete the following information):

Additional context Since the files are quite big please let me know in case it seems to be necessary to upload them.

zyaiqy commented 1 year ago

I also encountered the same error message, is the problem solved?

jeffhsu3 commented 1 year ago

Have a similar issue getting a zero division error, but not quite the same:

INFO  @ Tue, 29 Nov 2022 11:03:33: #2 Use EM algorithm to estimate means and stddevs of fragment lengths 
INFO  @ Tue, 29 Nov 2022 11:03:33: #  for mono-, di-, and tri-nucleosomal signals... 
INFO  @ Tue, 29 Nov 2022 11:03:33: # A random seed 10151 has been used in the sampling function 
INFO  @ Tue, 29 Nov 2022 11:03:36: # Downsampled 0 fragments will be used for EM training... 
Traceback (most recent call last):
  File "/home/jeff/mambaforge/envs/align/bin/macs3", line 861, in <module>
    main()
  File "/home/jeff/mambaforge/envs/align/bin/macs3", line 101, in main
    run( args )
  File "/home/jeff/mambaforge/envs/align/lib/python3.10/site-packages/MACS3/Commands/hmmratac_cmd.py", line 119, in run
    em_trainer = HMMR_EM( petrack, options.em_means[1:4], options.em_stddevs[1:4], seed = options.hmm_randomSeed )
  File "MACS3/Signal/HMMR_EM.pyx", line 160, in MACS3.Signal.HMMR_EM.HMMR_EM.__init__
ZeroDivisionError: float division
philippadoherty commented 1 year ago

Hi @lionaid, @zyaiqy, @jeffhsu3, thanks for raising this issue. This HMMRATAC command is still in beta testing. Can anyone please provide some testing data so we can reproduce the error on our end?

Thank you

spirpinias commented 1 year ago

Has anyone ever successfully ran code for hmmratac? If so, can you please provide a screenshot. I am consistently returning an index out of range error message. No idea what's wrong. Used 1 bam file and left everything else on default. Still this error.

StevenKrzysztof commented 1 year ago

Has this issue been solved? I've been using macs3 3.0.0b2 but still encounter that. I'm using this data from ENCODE: ENCFF346MIJ. You can get to the file by https://www.encodeproject.org/files/ENCFF346MIJ/ @philippadoherty Thank you!

philippadoherty commented 1 year ago

Hi @StevenKrzysztof, did you try using the --cutoff-analysis-only function that was added with the macs3 3.0.0b2 version? It can help to decide what -l, -u, and -c cutoffs to use. Also referenced here: #559

Thanks!

StevenKrzysztof commented 1 year ago

Hi @philippadoherty , I tried this command: macs3 hmmratac -b data/ATACseq.bam --cutoff-analysis-only but still got zero division error

philippadoherty commented 1 year ago

Thanks @StevenKrzysztof, can you provide any additional info like your log/output for example?

StevenKrzysztof commented 1 year ago

Sure @philippadoherty . Here's the log file when I run macs3_ATAC.log Thanks

taoliu commented 1 year ago

@StevenKrzysztof It seems that there is an abnormally large 'fragment length' in the data can't be assigned to either short fragment, mono-nucleosome, di-nucleosome, or tri-nucleosome. I think the best way is to exclude such large fragments from the following steps in HMMRATAC. Let me fix this before I release the next version. Thanks for the information!

StevenKrzysztof commented 1 year ago

@taoliu Thanks so much for your reply!