mims-harvard / TDC

Therapeutics Commons (TDC-2): Multimodal Foundation for Therapeutic Science
https://tdcommons.ai
MIT License

(documentation) docking oracle examples are badly out of date #149

Closed: samuelstanton closed this issue 2 years ago

samuelstanton commented 2 years ago

Describe the bug: I'm trying to run the docking score example from the documentation:

from tdc import Oracle
# 1. One can specify the binding pocket by a docked pdb file
oracle = Oracle(name = 'Docking_Score', software='vina', 
                pyscreener_path = '/path/to/pyscreener', 
                receptors=['examples/docking/5WIU.pdb'], 
                docked_ligand_file='examples/docking/5WIU_with_ligand.pdb',
                buffer=10, path='./my_test/', num_worker=1, ncpu=4)

oracle('c1ccccc1')

I've verified that ADFR and Vina are installed correctly (e.g., which prepare_receptor and which vina both succeed). I currently get the following error in a Jupyter notebook:

File ~/miniconda3/envs/lambo-env/lib/python3.8/site-packages/tdc/oracles.py:39, in Oracle.__init__(self, name, target_smiles, num_max_call, **kwargs)
     37     self.name = name
     38 self.evaluator_func = None
---> 39 self.assign_evaluator() 
     40 self.num_called = 0
     42 if num_max_call is not None:

File ~/miniconda3/envs/lambo-env/lib/python3.8/site-packages/tdc/oracles.py:192, in Oracle.assign_evaluator(self)
    190 elif self.name == 'docking_score':
    191     from .chem_utils import Vina_smiles
--> 192     self.evaluator_func = Vina_smiles(**self.kwargs)
    193 elif self.name == 'drd3_docking_vina' or self.name == '3pbl_docking_vina':
    195     from .chem_utils import Vina_smiles 

TypeError: __init__() got an unexpected keyword argument 'software'

I assume this is some kind of dependency issue. It would be helpful if the required dependency versions were made more explicit.

To Reproduce: Follow the documentation and try to run the example.

Expected behavior: I expect the same output as the documentation.

Environment:
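
A minimal sketch for collecting the version details this section asks for. It assumes the package is installed under the PyPI distribution name PyTDC (an assumption; adjust the name if it was installed differently). The helper names installed_version and environment_report are illustrative only and are not part of TDC.

```python
import platform
import sys
from importlib import metadata  # available from Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("PyTDC",)):
    """Collect interpreter, OS, and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in dist_names:
        # "PyTDC" is an assumed distribution name; None means "not installed".
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this inside the same notebook kernel that raises the error matters, since a terminal may resolve to a different environment.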

samuelstanton commented 2 years ago

update:

Please update the docs!!!

It seems like this is an issue with the TDC version. I downgraded to 0.2.0 and now get a different error (apparently in pyscreener this time). Is there an updated example anywhere for reference?

Autoboxing ... 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [3], in <cell line: 3>()
      1 from tdc import Oracle
      2 # 1. One can specify the binding pocket by a docked pdb file
----> 3 oracle = Oracle(name = 'Docking_Score', software='vina', 
      4                 pyscreener_path = '/home/sam/Code/bo-protein-context/pyscreener', 
      5                 receptors=['../lambo/assets/tdc/docking/5WIU.pdb'], 
      6                 docked_ligand_file='../lambo/assets/tdc/docking/5WIU_with_ligand.pdb',
      7                 buffer=10, path='./my_test/', num_worker=1, ncpu=4)
      9 oracle('c1ccccc1')

File ~/miniconda3/envs/lambo-env/lib/python3.8/site-packages/tdc/oracles.py:21, in Oracle.__init__(self, name, target_smiles, num_max_call, **kwargs)
     19     self.name = name
     20 self.evaluator_func = None
---> 21 self.assign_evaluator() 
     22 self.num_called = 0
     24 if num_max_call is not None:

File ~/miniconda3/envs/lambo-env/lib/python3.8/site-packages/tdc/oracles.py:169, in Oracle.assign_evaluator(self)
    167 elif self.name == 'docking_score':
    168     from .chem_utils import docking_meta
--> 169     self.evaluator_func = docking_meta(**self.kwargs)
    170 # distribution oracle 
    171 # ['novelty', 'diversity', 'uniqueness', 'validity', 'fcd_distance', 'kl_divergence']  
    172 elif self.name == 'uniqueness':

File ~/miniconda3/envs/lambo-env/lib/python3.8/site-packages/tdc/chem_utils.py:1696, in docking_meta.__init__(self, software_calss, pyscreener_path, **kwargs)
   1693 else:
   1694     raise ValueError("The value of software_calss is not implemented. Currently available:['vina', 'dock6']")
-> 1696 self.scorer = screener(**kwargs)

File ~/Code/bo-protein-context/pyscreener/pyscreener/docking/vina.py:88, in Vina.__init__(self, software, receptors, pdbids, center, size, ncpu, extra, docked_ligand_file, buffer, score_mode, repeats, receptor_score_mode, ensemble_score_mode, distributed, num_workers, path, verbose, **kwargs)
     86 if center is None:
     87     print('Autoboxing ...', end=' ', flush=True)
---> 88     center, size = autobox.docked_ligand(docked_ligand_file, buffer)
     89     print('Done!')
     90     s_center = f'({center[0]:0.1f}, {center[1]:0.1f}, {center[2]:0.1f})'

File ~/Code/bo-protein-context/pyscreener/pyscreener/preprocessing/autobox.py:92, in docked_ligand(docked_ligand_file, buffer)
     90             break
     91     fid = chain([line], fid)    # prepend the first line to the generator
---> 92     ligand_atom_coords = [
     93         parse_xyz(line)
     94         for line in takewhile(lambda line: 'HETATM' in line, fid)
     95     ]
     97 return minimum_bounding_box(ligand_atom_coords, buffer)

File ~/Code/bo-protein-context/pyscreener/pyscreener/preprocessing/autobox.py:93, in <listcomp>(.0)
     90             break
     91     fid = chain([line], fid)    # prepend the first line to the generator
     92     ligand_atom_coords = [
---> 93         parse_xyz(line)
     94         for line in takewhile(lambda line: 'HETATM' in line, fid)
     95     ]
     97 return minimum_bounding_box(ligand_atom_coords, buffer)

File ~/Code/bo-protein-context/pyscreener/pyscreener/preprocessing/autobox.py:100, in parse_xyz(line)
     99 def parse_xyz(line: str) -> Tuple[float, float, float]:
--> 100     return tuple(map(float, line.split()[5:8]))

ValueError: could not convert string to float: 'CAB'
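
The traceback shows parse_xyz splitting the line on whitespace and taking fields 5 to 7. PDB records are fixed-width, so whitespace splitting can land on a non-coordinate token when a field is empty or two fields run together. Whether that is the cause here is an assumption, since the offending line is not shown. A sketch of a fixed-column reader follows; parse_xyz_fixed and EXAMPLE_RECORD are hypothetical names and are not part of pyscreener.

```python
from typing import Tuple


def parse_xyz_fixed(line: str) -> Tuple[float, float, float]:
    """Read x, y, z from the fixed columns of a PDB ATOM/HETATM record.

    The PDB format places the coordinates in columns 31-38, 39-46 and
    47-54 (1-indexed), i.e. the slices [30:38], [38:46] and [46:54].
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError(f"not an ATOM/HETATM record: {line[:6]!r}")
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


# A made-up record, built with explicit field widths so the columns line up:
# record name, serial, atom name, altLoc, resName, chain, resSeq, iCode, x, y, z.
EXAMPLE_RECORD = (
    "HETATM{:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}"
).format(1, " C1", "", "LIG", "A", 401, "", 11.5, -2.25, 30.0)

if __name__ == "__main__":
    print(parse_xyz_fixed(EXAMPLE_RECORD))
```

Unlike the whitespace split, this version rejects any line that merely contains the text HETATM without being a coordinate record.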

samuelstanton commented 2 years ago

update:

Finally got something working after digging around in the code. The docs are badly out of date. Please update them soon so others don't have to waste half a day, like I did, trying out a broken example. I would suggest leaving this issue open until the docs are updated.

Also worth noting: even if echo $PATH looks right in the terminal, make sure it also looks right inside the Jupyter notebook. I was running the notebook on a headless server in a tmux window and eventually noticed that the PATH variable didn't look right there.
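
A quick way to run that check from inside the notebook kernel, as a sketch: it uses the standard-library shutil.which to look up the two executables mentioned earlier in this thread (vina and prepare_receptor). The helper name find_tools is illustrative only.

```python
import os
import shutil


def find_tools(tool_names, path=None):
    """Map each executable name to its resolved location, or None if not found.

    With path=None, shutil.which searches the PATH of the current process,
    which is the PATH the notebook kernel actually sees.
    """
    return {name: shutil.which(name, path=path) for name in tool_names}


if __name__ == "__main__":
    print("PATH seen by this kernel:", os.environ.get("PATH"))
    for name, location in find_tools(["vina", "prepare_receptor"]).items():
        print(f"{name}: {location or 'NOT FOUND'}")
```

If either tool is reported as not found here but which finds it in the terminal, the kernel was started with a different PATH.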

working snippet:

import os

from tdc import Oracle

print(os.environ.get('PATH'))
receptor_id = "3pbl"
oracle = Oracle(
    name=f'{receptor_id}_docking',
)
oracle('c1ccccc1')

output:

Found local copy...
2022-04-08 19:28:57,172 INFO services.py:1172 -- View the Ray dashboard at http://127.0.0.1:8265
Docking: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.41s/ligand]
-4.1

kexinhuang12345 commented 2 years ago

Hi Samuel, sorry for the confusion! We will be updating the docs ASAP!

samuelstanton commented 2 years ago

Good to hear! This seems like a very useful package, hopefully the onboarding process can be made a bit smoother.

kexinhuang12345 commented 2 years ago

Thank you! It is now updated! Let us know if you have any further questions!