haddocking / haddock3

Official repo of the modular BioExcel version of HADDOCK
https://www.bonvinlab.org/haddock3
Apache License 2.0

Refactor workflow initialization to remove hard dependency on `topoaa` #921

Open VGPReys opened 3 months ago

VGPReys commented 3 months ago

This PR tries to find a way to perform a haddock3 run without having to start with topoaa.

topoaa was hard-coded in prepare_run.py, on the assumption that every haddock3 run would start from it. This is no longer the case.

Input molecules (the global parameter molecules = [...]) are now stored in run_dir/data/0_NameOfTheFirstModule. They are now handled by the ModuleIO class (in haddock/libs/libontology.py), which mimics the output of an io.json. If the module is the first one in the workflow, input files are converted to Molecules (also in haddock/libs/libontology.py), which split ensembles where needed and return them as a dict[int, PDBFile].
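To illustrate the ensemble-splitting idea, here is a minimal sketch. It assumes an ensemble is a multi-model PDB file delimited by MODEL/ENDMDL records. The helper `split_ensemble`, its output naming, and the 1-based indexing are hypothetical and are not the code in libontology.py; plain paths stand in for PDBFile.

```python
from pathlib import Path


def split_ensemble(pdb_path: Path, out_dir: Path) -> dict[int, Path]:
    """Split a multi-model PDB file into one file per model.

    Hypothetical helper for illustration only, not part of haddock3.
    Returns a mapping of model index (1-based, an arbitrary choice) to
    the path of the file written for that model.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    models: list[list[str]] = []
    current: list[str] = []
    for line in pdb_path.read_text().splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # A new model starts: drop anything collected so far.
            current = []
        elif record == "ENDMDL":
            # The current model is complete.
            models.append(current)
            current = []
        elif record in ("ATOM", "HETATM", "TER"):
            current.append(line)
    # A file without MODEL/ENDMDL records is treated as a single model.
    if current:
        models.append(current)

    split: dict[int, Path] = {}
    for index, lines in enumerate(models, start=1):
        out_path = out_dir / f"{pdb_path.stem}_{index}.pdb"
        out_path.write_text("\n".join(lines) + "\nEND\n")
        split[index] = out_path
    return split
```

A single-model input simply yields a one-entry dict, so callers can treat ensembles and single structures the same way.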

Small modifications had to be applied to the topoaa module to fit this new behavior. The same goes for haddock3-score and the [alascan] module, because the copy of the input files must now be stored at the proper location.

Closes #932

mgiulini commented 2 months ago

I tested the PR: it works well when the workflow is made of non-CNS modules, but when CNS modules are included, a workflow without topoaa fails badly at the CNS preparation step, and the error is not caught. Here is an example output of contmap-test with topoaa removed:

(haddock3) UU-CW4VKWDG2H:analysis Giuli003$ haddock3 contmap-test.cfg 
[2024-07-26 15:48:10,564 cli INFO] 
##############################################
#                                            #
#                 HADDOCK 3                  #
#                                            #
##############################################

Starting HADDOCK 3.0.0 on 2024-07-26 15:48:00

Python 3.9.18 (main, Sep 11 2023, 08:25:10) 
[Clang 14.0.6 ]

[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 0_rigidbody
[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 1_clustfcc
[2024-07-26 15:48:14,905 libworkflow INFO] Reading instructions step 2_contactmap
[2024-07-26 15:48:15,343 base_cns_module INFO] Running [rigidbody] module
[2024-07-26 15:48:15,344 __init__ INFO] [rigidbody] crossdock=true
[2024-07-26 15:48:15,344 __init__ INFO] [rigidbody] Preparing jobs...
[2024-07-26 15:48:15,344 libutil INFO] Selected 5 cores to process 20 jobs, with 8 maximum available cores.
[2024-07-26 15:48:15,354 libparallel INFO] Using 5 cores
Process Worker-1:
Process Worker-2:
Process Worker-3:
Process Worker-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Users/Giuli003/anaconda3/envs/haddock3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libparallel.py", line 88, in run
    r = task.run()
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libparallel.py", line 72, in run
    return self.function(*self.args, **self.kwargs)
Traceback (most recent call last):
  File "/Users/Giuli003/software/haddock3/src/haddock/libs/libcns.py", line 307, in prepare_cns_input
    raise ValueError(f"Topology not found for pdb {pdb.rel_path}.")
  File "/Users/Giuli003/anaconda3/envs/haddock3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
ValueError: Topology not found for pdb ../data/0_rigidbody/1a2k_r_u.pdb.

rvhonorato commented 2 months ago

The error above means you are trying to execute a CNS module without having generated the topologies; it is not related to the contactmap module.

Removing topoaa as the fixed "module 0" does introduce this dependency.

amjjbonvin commented 2 months ago

Basically, rigidbody in this case cannot be run unless topoaa was run first.

Need to define dependencies...

mgiulini commented 2 months ago

Yes, precisely. We need to catch this exception at the beginning and ask the user to add topoaa to the workflow.
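A rough sketch of what such an early check could look like, run on the ordered module names before any job is prepared. Everything here is hypothetical: `CNS_MODULES`, `TOPOLOGY_MODULES`, `MissingTopologyError` and `check_topology_dependency` are illustrative names, not existing haddock3 code, and the module sets only list what appears in this thread.

```python
# Hypothetical module sets (assumption): only the modules named in this
# thread are listed; a real check would cover every CNS module.
CNS_MODULES = {"rigidbody"}
TOPOLOGY_MODULES = {"topoaa"}


class MissingTopologyError(ValueError):
    """Raised when a CNS module is scheduled before any topology module."""


def check_topology_dependency(module_names: list[str]) -> None:
    """Fail early if a CNS module appears before topologies are generated.

    `module_names` is the ordered list of modules in the workflow,
    e.g. ["rigidbody", "clustfcc", "contactmap"].
    """
    topology_seen = False
    for step, name in enumerate(module_names):
        if name in TOPOLOGY_MODULES:
            topology_seen = True
        elif name in CNS_MODULES and not topology_seen:
            raise MissingTopologyError(
                f"Step {step}_{name} requires topologies: "
                "add [topoaa] before it in the workflow."
            )
```

With the contmap-test workflow above (rigidbody, clustfcc, contactmap), this would raise at step 0_rigidbody before any worker process is started, instead of failing inside prepare_cns_input.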

VGPReys commented 2 months ago

Thanks for the review. I will make some modifications to solve this problem.

rvhonorato commented 2 months ago

Defining this dependency graph is quite complex and definitely beyond the scope of this PR. Could you please handle it in another one?

Remember this is a beta version and these kinds of uncaught exceptions are tolerable, it's a work in progress anyway :)