loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

ValueError: The data contains non-finite values #284

Closed lufuhao closed 1 month ago

lufuhao commented 3 months ago

Please help with this error: ValueError: The data contains non-finite values.

Here are some details:

TOBIAS: tobias/v0.16.1-ab66886
Platform: Ubuntu 22.04 server

Both datasets run successfully in single-condition mode, but when I run them together in two-condition mode, BINDetect reports the following error.

LOG:

BAM   :/path/to/hd/9.tobias/DD6s0h/DD6s0h.footprint.bw /path/to/hd/9.tobias/DD6n/DD6n.footprint.bw
Peak  : /path/to/hd/9.tobias/cond_peaks/DD6s6n.merged.parts.narrowPeak
Ref   : /path/to/genomes/iwgsc_refseqv2.1/DD.sm.part.fa
Motif : /path/to/hd/9.tobias/JASPAR2024_CORE_nr.PlantTFDB.UniPROBE.meme
Prefix: DD6sP5
Out   : /path/to/hd/9.tobias/cond_DD6sP52

TOBIAS BINDetect --signals /path/to/hd/9.tobias/DD6s0h/DD6s0h.footprint.bw /path/to/hd/9.tobias/DD6n/DD6n.footprint.bw --genome /path/to//genomes/iwgsc_refseqv2.1/DD.sm.part.fa --outdir cond_DD6sP52 --peaks /path/to/hd/9.tobias/cond_peaks/DD6s6n.merged.parts.narrowPeak --motifs /path/to/hd/9.tobias/JASPAR2024_CORE_nr.PlantTFDB.UniPROBE.meme --cond_names DD6s0h P5 --prefix  DD6sP5 --split 10

# TOBIAS 0.16.1 BINDetect (run started 2024-08-03 12:32:03.501791)
# Working directory: /home/hpcusers/admin/lufuhao/hd/9.tobias
# Command line call: TOBIAS BINDetect --signals /path/to/hd/9.tobias/DD6s0h/DD6s0h.footprint.bw /path/to/hd/9.tobias/DD6n/DD6n.footprint.bw --genome /path/to/genomes/iwgsc_refseqv2.1/DD.sm.part.fa --outdir cond_DD6sP52 --peaks /path/to/hd/9.tobias/cond_peaks/DD6s6n.merged.parts.narrowPeak --motifs /path/to/hd/9.tobias/JASPAR2024_CORE_nr.PlantTFDB.UniPROBE.meme --cond_names DD6s0h P5 --prefix DD6sP5 --split 10

# ----- Input parameters -----
# signals:      ['/path/to/hd/9.tobias/DD6s0h/DD6s0h.footprint.bw', '/path/to/hd/9.tobias/DD6n/DD6n.footprint.bw']
# peaks:        /path/to/9.tobias/cond_peaks/DD6s6n.merged.parts.narrowPeak
# motifs:       ['/path/to/hd/9.tobias/JASPAR2024_CORE_nr.PlantTFDB.UniPROBE.meme']
# genome:       /path/to/genomes/iwgsc_refseqv2.1/DD.sm.part.fa
# cond_names:   ['DD6s0h', 'P5']
# peak_header:  None
# naming:       name_id
# motif_pvalue: 0.0001
# bound_pvalue: 0.001
# pseudo:       None
# time_series:  False
# skip_excel:   False
# output_peaks: None
# norm_off:     False
# prefix:       DD6sP5
# outdir:       /path/to/hd/9.tobias/cond_DD6sP52
# cores:        1
# split:        10
# debug:        False
# verbosity:    3
...
2024-08-03 12:49:54 (519195) [INFO]     Progress 99.914%
2024-08-03 12:49:55 (519195) [INFO]     Progress 99.943%
2024-08-03 12:49:55 (519195) [INFO]     Progress 99.971%
2024-08-03 12:49:56 (519195) [INFO]     Progress 100.0%
2024-08-03 12:49:56 (519195) [INFO]     Progress done!

2024-08-03 12:49:56 (519195) [INFO]     Scanning for motifs and matching to signals...
2024-08-03 13:26:39 (519195) [INFO]     Done scanning for TFBS across regions!
2024-08-03 13:26:39 (519195) [INFO]     Waiting for bedfiles to write
2024-08-03 13:26:40 (519195) [INFO]     Merging results from subsets

2024-08-03 13:28:24 (519195) [INFO]     Normalizing scores across conditions
2024-08-03 13:28:25 (519195) [INFO]     Estimating bound/unbound threshold
2024-08-03 13:28:25 (519195) [STATS]    - Threshold estimated at: 1.50766

2024-08-03 13:28:25 (519195) [INFO]     Calculating background log2 fold-changes between conditions
2024-08-03 13:28:25 (519195) [INFO]     - DD6s0h / P5

2024-08-03 13:28:25 (519195) [INFO]     Processing scanned TFBS individually
2024-08-03 13:28:25 (519195) [INFO]     - Arnt_MA0004.1
Traceback (most recent call last):
  File "/home/hpcsoft/ProdSoft/tobias/v0.14.0-ab66886/x86_64/bin/TOBIAS", line 33, in <module>
    sys.exit(load_entry_point('tobias==0.16.1', 'console_scripts', 'TOBIAS')())
  File "/home/hpcsoft/ProdSoft/tobias/v0.14.0-ab66886/x86_64/lib/python3.10/site-packages/tobias-0.16.1-py3.10-linux-x86_64.egg/tobias/TOBIAS.py", line 162, in main
    args.func(args)             
  File "/home/hpcsoft/ProdSoft/tobias/v0.14.0-ab66886/x86_64/lib/python3.10/site-packages/tobias-0.16.1-py3.10-linux-x86_64.egg/tobias/tools/bindetect.py", line 668, in run_bindetect
    results.append(process_tfbs(name, args, log2fc_params))
  File "/home/hpcsoft/ProdSoft/tobias/v0.14.0-ab66886/x86_64/lib/python3.10/site-packages/tobias-0.16.1-py3.10-linux-x86_64.egg/tobias/tools/bindetect_functions.py", line 512, in process_tfbs
    obs_params = diff_dist.fit(observed_log2fcs)
  File "/home/hpcsoft/TestSoft/libraries/python_modules/lib/python3.10/site-packages/scipy/stats/_continuous_distns.py", line 66, in wrapper
    return fun(self, *args, **kwds)
  File "/home/hpcsoft/TestSoft/libraries/python_modules/lib/python3.10/site-packages/scipy/stats/_continuous_distns.py", line 408, in fit
    raise ValueError("The data contains non-finite values.")
ValueError: The data contains non-finite values.
2024-08-03 13:28:27 (519275) [ERROR]    Multiprocessing logger lost connection to queue - probably due to an error raised from a child process.

Thank you so much
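
For readers following the traceback: scipy's `fit` refuses the `observed_log2fcs` array because it holds at least one NaN or infinite entry. The sketch below is not TOBIAS code. It only illustrates, with the standard library and made-up scores, how a per-site log2 ratio between two conditions becomes non-finite when a score is missing or zero. The function names `log2_ratio` and `non_finite_indices` are hypothetical, and scores are assumed to be non-negative.

```python
import math

def log2_ratio(a, b):
    """log2(a / b) with IEEE-style results instead of exceptions (illustrative only)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                      # a missing score propagates as NaN
    if b == 0:
        return math.nan if a == 0 else math.inf
    if a == 0:
        return -math.inf                     # log2(0) is minus infinity
    ratio = a / b
    return math.log2(ratio) if ratio > 0 else math.nan

def non_finite_indices(values):
    """Positions of NaN or +/-inf entries in a sequence of floats."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]

# Made-up footprint scores for the same four sites in two conditions.
cond_a = [1.0, 2.0, 0.0, math.nan]
cond_b = [0.5, 0.0, 1.0, 1.0]

log2fcs = [log2_ratio(a, b) for a, b in zip(cond_a, cond_b)]
bad = non_finite_indices(log2fcs)
print(log2fcs, bad)
```

Only the first site yields a finite fold change here; any of the other three would be enough to make a distribution fit fail in the way the traceback shows.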

hschult commented 3 months ago

Hi @lufuhao,

the bigwig files run separately because, with a single condition, TOBIAS skips the differential part of the algorithm, which requires at least two conditions. As far as I can tell, the error occurs when BINDetect tries to compute the differential scores but at least one of the conditions contains NA or infinite values, which makes the calculation impossible.

Since this may be related to how you created your bigwig files, I recommend revisiting the ScoreBigwig step and making sure that --regions uses the same file for all of your conditions. If you are aiming for a full ATAC-seq analysis, I can recommend our snakemake pipeline: it strings the separate TOBIAS tools together so you don't have to run each of them yourself. To avoid additional issues, I would also suggest updating TOBIAS; we recently released version 0.17.0.
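
One way to act on the `--regions` suggestion is to confirm that the region files used for each condition really describe the same intervals. The following is a minimal sketch using only the Python standard library. It assumes BED-like input (chromosome, start, end in the first three whitespace-separated columns), and the helper names `read_regions` and `region_mismatches` are hypothetical, not part of TOBIAS.

```python
def read_regions(lines):
    """Collect (chrom, start, end) tuples from BED-like lines, skipping headers."""
    regions = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.add((chrom, int(start), int(end)))
    return regions

def region_mismatches(lines_a, lines_b):
    """Return regions present only in A and only in B, each sorted."""
    a, b = read_regions(lines_a), read_regions(lines_b)
    return sorted(a - b), sorted(b - a)

# Made-up example data standing in for two region files.
regions_a = ["chr1\t100\t200\tpeak1", "chr1\t300\t400"]
regions_b = ["chr1\t100\t200", "chr2\t5\t50"]

only_a, only_b = region_mismatches(regions_a, regions_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

With real files, the two lists of lines could come from `open(path)` for each condition's region file; two empty result lists would mean the files cover identical intervals.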

I hope this fixes your issue. Feel free to post an update here. Best wishes, Hendrik

github-actions[bot] commented 1 month ago

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.