ikalvet / heme_binder_diffusion


pipeline "general" container #2

Open plvdac1 opened 3 months ago

plvdac1 commented 3 months ago

Dear ikalvet, I was just trying to run your notebook on my cluster. However, I don't actually understand what the "general" container refers to:

"general": "/software/containers/users/ks427/240125_shifty.sif"

I was able to set up all the other paths, but for this one I have no clue. Would you mind helping me sort this out, please?

Best,

Marco

ikalvet commented 3 months ago

Thank you for pointing out this issue. It was meant to be an environment/container that has both PyTorch and PyRosetta; however, my instructions on all of that were quite vague. I have now updated this repository with two Conda environment YML files: envs/diffusion.yml and envs/mlfold.yml. The diffusion environment should be able to run all of the steps of the pipeline apart from AlphaFold2, and mlfold is for AF2.

I also updated the pipeline.ipynb notebook to reflect these changes. The Python paths in that dictionary now point to the environments one would set up based on the two YML files.
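A minimal sketch of what such a dictionary might look like once the two environments are created. Only the environment names "diffusion" and "mlfold" and the "diffusion" and "general" keys come from this thread; the Conda location, the "af2" key, the choice to point "general" at the diffusion environment, and the helper name env_python are assumptions for illustration.

```python
import os

# Hypothetical Conda location; adjust to wherever your environments live.
CONDA_ENVS = os.path.expanduser("~/miniconda3/envs")


def env_python(env_name, envs_root=CONDA_ENVS):
    """Return the path to the Python interpreter of a Conda environment."""
    return os.path.join(envs_root, env_name, "bin", "python")


PYTHON = {
    "diffusion": env_python("diffusion"),
    # Assumption: the diffusion environment also covers the "general" steps.
    "general": env_python("diffusion"),
    # Hypothetical key name for the AlphaFold2 environment.
    "af2": env_python("mlfold"),
}

# Collect entries whose interpreter does not exist yet, to catch typos early.
missing = {key: path for key, path in PYTHON.items() if not os.path.isfile(path)}
```

Inspecting missing before submitting any job shows which environments still need to be created or which paths need correcting.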

plvdac1 commented 3 months ago

Thank you so much! Now it seems to work properly and the first script is submitted to the cluster as expected. However, I now have another issue with Hydra. It seems that run_inference.py cannot find the config.yaml file, even though it's present in {WDIR}. I tried changing the script from --config-dir=../ to --config-dir={WDIR}, but I still get the error below. Any clue?

Thanks!

Cannot find primary config 'config.yaml'. Check that it's in your config search path.

Config search path:
        provider=hydra, path=pkg://hydra.conf
        provider=main, path=file:///home/shared/RFdiffusion/rf_diffusion_all_atom/config/inference
        provider=command-line, path=file:///home/lolicato/test/rfAA/example_Heme_diffusion
        provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

ikalvet commented 3 months ago

That's really bizarre because I can't reproduce it on my end. Hydra nicely reads from a custom config if I give it either a relative or a full path. What if you put config.yaml into the same directory you're going to execute diffusion from, and provide --config-dir=./ ?
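One way to rule out a plain path problem before launching the job is to check from the notebook that the file Hydra will look for really exists at the resolved location. A minimal sketch, where the helper name resolve_config_dir is hypothetical:

```python
import os


def resolve_config_dir(config_dir, config_name="config.yaml", cwd=None):
    """Resolve config_dir against cwd and confirm config_name exists in it.

    Returns the absolute directory, or raises FileNotFoundError.
    """
    cwd = cwd or os.getcwd()
    # os.path.join keeps config_dir unchanged when it is already absolute.
    abs_dir = os.path.abspath(os.path.join(cwd, config_dir))
    config_path = os.path.join(abs_dir, config_name)
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"{config_name} not found in {abs_dir}")
    return abs_dir
```

Calling it with cwd set to the directory the command will cd into shows what a relative value such as ../ or ./ actually resolves to.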

jesperdlau commented 2 months ago

The main function in run_inference.py uses the following hydra decorator: @hydra.main(version_base=None, config_path='config/inference')

From hydra documentation: "If the version_base parameter is None, then the defaults are chosen for the current minor Hydra version. For example for Hydra 1.2, then would imply config_path=None and hydra.job.chdir=False."

I have previously encountered similar problems where changing the version_base parameter to 1.3 fixed my issues. You can try passing --version-base=1.3 in the notebook, like:

cmd = f"cd {pdbname} ; {PYTHON['diffusion']} {diffusion_script} --version-base=1.3 --config-dir=../ "\
              f"--config-name=config.yaml inference.input_pdb={p} "\
              f"inference.output_prefix='./out/{pdbname}_dif' > output.log ; cd ..\n"
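A related sketch of how the same command could be assembled with an absolute config directory and shell quoting, so that the result does not depend on which directory the job ends up in. The function name build_diffusion_cmd and its parameters are hypothetical; the --config-dir and --config-name flags and the inference.* overrides are the ones already used in this thread. The sketch leaves --version-base out: as far as I can tell, version_base is an argument of the @hydra.main decorator rather than a Hydra command-line flag, so treat that part as untested.

```python
import os
import shlex


def build_diffusion_cmd(python_exe, diffusion_script, config_dir,
                        input_pdb, pdbname, config_name="config.yaml"):
    """Build the shell line that runs diffusion inside the pdbname directory."""
    # Absolute path, so the config is found regardless of the working directory.
    abs_config_dir = os.path.abspath(config_dir)
    args = [
        python_exe,
        diffusion_script,
        f"--config-dir={abs_config_dir}",
        f"--config-name={config_name}",
        f"inference.input_pdb={input_pdb}",
        f"inference.output_prefix=./out/{pdbname}_dif",
    ]
    # Quote every argument so paths with spaces survive the shell.
    run = " ".join(shlex.quote(arg) for arg in args)
    return f"cd {shlex.quote(pdbname)} ; {run} > output.log ; cd ..\n"
```

Since the command changes into the pdbname directory first, script and input paths passed in should be absolute as well.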