ikalvet / heme_binder_diffusion


[Analyzing diffusion outputs] ValueError: Number of processes must be at least 1. No such file or directory: diffusion_analysis.sc #8

Open Lyueyang2020 opened 2 weeks ago

Lyueyang2020 commented 2 weeks ago

Hello, I am using your pipeline to design a hemoglobin-binding protein. When I run the "Analyzing diffusion outputs" code cell, I encounter the following error. Thank you very much for your help.

### Analyzing diffusion outputs for clashes, ligand burial and scaffold quality
## If it's running too slowly consider increasing --nproc

analysis_script = f"{SCRIPT_DIR}/scripts/diffusion_analysis/process_diffusion_outputs.py"

diffusion_outputs = []
for d in diffusion_rundirs:
    diffusion_outputs += glob.glob(f"{d}/out/*.pdb")

# By default I don't use the --analyze flag. As a result the backbones are filtered as the script runs.
# You can set --analyze to True to calculate all scores for all backbones.
# This will slow the analysis down, but you can then filter the backbones separately afterwards.
dif_analysis_cmd_dict = {"--pdb": " ".join(diffusion_outputs),
                        "--ref": f"{SCRIPT_DIR}/input/*.pdb",
                        "--params": " ".join(params),
                        "--term_limit": "15.0",
                        "--SASA_limit": "0.3",  # Highest allowed relative SASA of ligand
                        "--loop_limit": "0.4",  # Fraction of backbone that can be loopy
                        "--ref_catres": "A15",  # Position of CYS in diffusion input
                        "--rethread": True,
                        "--fix": True,
                        "--exclude_clash_atoms": "O1 O2 O3 O4 C5 C10",  # Ligand atoms excluded from clashchecking because they are flexible
                        "--ligand_exposed_atoms": "C45 C46 C47",  # Ligand atoms that need to be more exposed
                        "--exposed_atom_SASA": "10.0",  # minimum absolute SASA for exposed ligand atoms
                        "--longest_helix": "30",
                        "--rog": "30.0",
                        "--partial": None,
                        "--outdir": None,
                        "--traj": "5/30",  # Also random 5 models are taken from the last 30 steps of the diffusion trajectory
                        "--trb": None,
                        "--analyze": False,
                        "--nproc": "1"}

analysis_command = f"{PYTHON['general']} {analysis_script}"
for k, val in dif_analysis_cmd_dict.items():
    if val is not None:
        if isinstance(val, list):
            analysis_command += f" {k}"
            analysis_command += " " + " ".join(val)
        elif isinstance(val, bool):
            if val == True:
                analysis_command += f" {k}"
        else:
            analysis_command += f" {k} {val}"
        print(k, val)

if len(diffusion_outputs) < 100:
    ## Analyzing locally
    p = subprocess.Popen(analysis_command, shell=True)
    (output, err) = p.communicate()
else:
    ## Too many structures to analyze.
    ## Running the analysis as a SLURM job.
    submit_script = "submit_diffusion_analysis.sh"
    utils.create_slurm_submit_script(filename=submit_script, name="diffusion_analysis",
                                     mem="8g", N_cores=dif_analysis_cmd_dict["--nproc"], time="0:20:00", email=EMAIL,
                                     command=analysis_command, outfile_name="output_analysis")

diffused_backbones_good = glob.glob(f"{DIFFUSION_DIR}/filtered_structures/*.pdb")

dif_analysis_df = pd.read_csv(f"{DIFFUSION_DIR}/diffusion_analysis.sc", header=0, sep="\s+")
--pdb 7o2g_HBA/7o2g_HBA_dif_4.pdb 7o2g_HBA/done.pdb 7o2g_HBA/7o2g_HBA_dif_2.pdb 7o2g_HBA/7o2g_HBA_dif_0.pdb 7o2g_HBA/7o2g_HBA_dif_3.pdb 7o2g_HBA/7o2g_HBA_dif_1.pdb
--ref /home/yueyang/design_protein/heme_binder_diffusion//input/*.pdb
--params /home/yueyang/design_protein/heme_binder_diffusion//theozyme/HBA/HBA.params
--term_limit 15.0
--SASA_limit 0.3
--loop_limit 0.4
--ref_catres A15
--rethread True
--fix True
--exclude_clash_atoms O1 O2 O3 O4 C5 C10
--ligand_exposed_atoms C45 C46 C47
--exposed_atom_SASA 10.0
--longest_helix 30
--rog 30.0
--traj 5/30
--analyze False
--nproc 1
-extra_res_fa /home/yueyang/design_protein/heme_binder_diffusion//theozyme/HBA/HBA.params
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│              See LICENSE.md or email license@uw.edu for details              │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2024 [Rosetta PyRosetta4.conda.linux.cxx11thread.serialization.CentOS.python39.Release 2024.10+release.2c36cbc7108d85646ca5b8ddc89c29ac1ccde88e 2024-03-01T16:53:36] retrieved from: http://www.pyrosetta.org/
6 designs to analyze.
Using 0 processes
Traceback (most recent call last):
  File "/home/yueyang/design_protein/heme_binder_diffusion//scripts/diffusion_analysis/process_diffusion_outputs.py", line 883, in <module>
    main()
  File "/home/yueyang/design_protein/heme_binder_diffusion//scripts/diffusion_analysis/process_diffusion_outputs.py", line 845, in main
    pool = multiprocessing.Pool(processes=N_PROCESSES,
  File "/home/yueyang/miniconda3/envs/diffusion/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/home/yueyang/miniconda3/envs/diffusion/lib/python3.9/multiprocessing/pool.py", line 205, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[63], line 3
      1 diffused_backbones_good = glob.glob(f"{DIFFUSION_DIR}/filtered_structures/*.pdb")
----> 3 dif_analysis_df = pd.read_csv(f"{DIFFUSION_DIR}/diffusion_analysis.sc", header=0, sep="\s+")

File ~/miniconda3/envs/diffusion/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
   1013 kwds_defaults = _refine_defaults_read(
   1014     dialect,
   1015     delimiter,
   (...)
   1022     dtype_backend=dtype_backend,
   1023 )
   1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/miniconda3/envs/diffusion/lib/python3.9/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
    617 _validate_names(kwds.get("names", None))
    619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
    622 if chunksize or iterator:
    623     return parser

File ~/miniconda3/envs/diffusion/lib/python3.9/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
   1617     self.options["has_index_names"] = kwds["has_index_names"]
...
    880     else:
    881         # Binary mode
    882         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: '/home/yueyang/design_protein/heme_binder_diffusion/outputs_lyy/live/0_diffusion/diffusion_analysis.sc'
Lyueyang2020 commented 2 weeks ago

I also don't understand what this code cell does. Could you explain it in detail? Thank you.

## If you're done with diffusion and happy with the outputs then mark it as done
DIFFUSION_DIR = f"{WDIR}/0_diffusion"
os.chdir(DIFFUSION_DIR)

if not os.path.exists(DIFFUSION_DIR+"/.done"):
    with open(f"{DIFFUSION_DIR}/.done", "w") as file:
        file.write(f"Run user: {username}\n")
ikalvet commented 2 weeks ago

Thank you for giving it a shot! The "number of processes" error is the result of a silly assumption I made on line 841 of the "process_diffusion_outputs.py" script. I have fixed that issue and uploaded the fix.
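
For context, the log shows "Using 0 processes" right before `multiprocessing.Pool` raises, so the worker count ended up as 0 even though `--nproc 1` was passed. What line 841 actually computed is not shown in this thread, so the following is only a minimal sketch of one way to guard against a zero worker count. `resolve_nproc` is a hypothetical helper, not a function from the repository, and the clamping policy is an assumption.

```python
import os


def resolve_nproc(requested, n_jobs):
    """Clamp a requested worker count to the range [1, min(n_jobs, CPU count)].

    Hypothetical helper: the name and the clamping policy are assumptions,
    not the fix that was applied to process_diffusion_outputs.py.
    """
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    upper = max(1, min(n_jobs, cpu_count))  # never allow an upper bound below 1
    return max(1, min(int(requested), upper))


if __name__ == "__main__":
    # A request of 0 workers for 6 designs is raised to 1,
    # so multiprocessing.Pool(processes=...) would accept the value.
    print(resolve_nproc(0, 6))  # prints 1
```

The point of the clamp is simply that `multiprocessing.Pool` rejects any `processes` value below 1, so whatever arithmetic produces the count should be bounded from below.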

Presumably this will also fix the "FileNotFoundError", which I assume arose because the analysis script didn't produce an output file, so the next command in the cell couldn't find its expected input.
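
To make this kind of cascading failure easier to spot, the cell could check the exit code of the analysis command and confirm that the score file exists before trying to read it. The sketch below uses only the standard library. `run_and_require_output` is a hypothetical helper that is not part of the pipeline, and it takes the command as a list of arguments rather than the shell string the notebook builds.

```python
import subprocess
import sys
from pathlib import Path


def run_and_require_output(command, expected_file):
    """Run a command and confirm that it produced the expected output file.

    Hypothetical helper for illustration; `command` is a list of arguments.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}:\n{result.stderr}"
        )
    path = Path(expected_file)
    if not path.is_file():
        raise FileNotFoundError(f"Command finished but {path} was not created")
    return path


if __name__ == "__main__":
    # Stand-in for the analysis command: a child Python process that exits with an error.
    failing = [sys.executable, "-c", "import sys; sys.exit('analysis failed')"]
    try:
        run_and_require_output(failing, "diffusion_analysis.sc")
    except RuntimeError as exc:
        print(exc)
```

With a check like this, the first error reported would be the failed analysis run rather than the missing "diffusion_analysis.sc" file further down the cell.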

ikalvet commented 2 weeks ago

I also don't understand what this code cell does. Could you explain it in detail? Thank you.

## If you're done with diffusion and happy with the outputs then mark it as done
DIFFUSION_DIR = f"{WDIR}/0_diffusion"
os.chdir(DIFFUSION_DIR)

if not os.path.exists(DIFFUSION_DIR+"/.done"):
    with open(f"{DIFFUSION_DIR}/.done", "w") as file:
        file.write(f"Run user: {username}\n")

The purpose of this kind of cell is just to create a file in the run directory of each step to indicate that the step is finished. If you end up reloading the notebook repeatedly and re-running the cells from the top, it will skip over these computationally expensive steps. For instance, if you're at step 3 and the notebook closed for some reason, re-running the "Step 0" cell won't re-run the whole diffusion job if it's marked as "done". I know it's not the most elegant implementation, but it gets the job done, I suppose.
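
The marker-file idea described above can be sketched roughly as follows. This is an illustration and not the notebook's code: `run_step_once` is a hypothetical wrapper, and unlike the notebook, where the `.done` file is written by a separate cell once you are happy with the outputs, this sketch writes the marker automatically after the step returns.

```python
import os
import tempfile


def run_step_once(step_dir, run_step, username="unknown"):
    """Run `run_step` unless `step_dir` already contains a `.done` marker.

    Returns True if the step ran and False if it was skipped.
    """
    done_file = os.path.join(step_dir, ".done")
    if os.path.exists(done_file):
        return False  # marked as finished in an earlier session
    os.makedirs(step_dir, exist_ok=True)
    run_step()
    # Write the marker only after the step completes without raising.
    with open(done_file, "w") as file:
        file.write(f"Run user: {username}\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wdir:
        diffusion_dir = os.path.join(wdir, "0_diffusion")
        calls = []
        print(run_step_once(diffusion_dir, lambda: calls.append("ran")))  # first call runs the step
        print(run_step_once(diffusion_dir, lambda: calls.append("ran")))  # second call is skipped
```

Deleting the `.done` file from a step's directory is then enough to force that step to run again.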