schlessinger-lab / pyvol

volume calculation and segmentation
MIT License
26 stars 13 forks source link

pocket mode=specific is not callable #9

Open bertadenes opened 3 years ago

bertadenes commented 3 years ago

Exploring the module, looks pretty decent. However, every attempt running specific mode reverts back to run 'largest' instead:

Running with opensource pymol 2.3.0 on Ubuntu 20.04.1, msms v2.6.1

rhs2132 commented 3 years ago

This is quite late, but I finally have some time to push some changes and wanted to follow up on this. If you don't mind looking back, what options did you supply to run in specific mode?

I think I will add an option to prevent falling back to default behavior and add better log information to explain unexpected behavior like this.

bertadenes commented 3 years ago

Sure here is an example with the GUI v1.7.8 (on top of pymol 2.3.0) I fetched a random PDB (9ICV), in which there are four pockets above 100 A^3, in default:

pyvol.pymol_interface:    INFO     Pocket 0 (105920-983185_9ICV_p0)  Volume: 680.0 A^3
pyvol.pymol_interface:    INFO     Pocket 1 (105920-983185_9ICV_p1)  Volume: 651.0 A^3
pyvol.pymol_interface:    INFO     Pocket 2 (105920-983185_9ICV_p2)  Volume: 188.0 A^3
pyvol.pymol_interface:    INFO     Pocket 3 (105920-983185_9ICV_p3)  Volume: 148.0 A^3

The smallest one is close to the residue Lys68, therefore I tried this: image Which returns the largest pocket only. The corresponding config file is generated wrongly in the first place:

[General]
prot_file = /home/berta/temp/110246-476122_9ICV.pyvol/110246-476122_9ICV_prot.pdb
min_rad = 1.4
max_rad = 3.4
constrain_radii = False

[Specification]
mode = largest

[Partitioning]
subdivide = False

[Output]
output_dir = /home/berta/temp/110246-476122_9ICV.pyvol
prefix = 110246-476122_9ICV
logger_stream_level = INFO
logger_file_level = DEBUG

[PyMOL]
protein = (9ICV) and (poly)
protein_only = True
display_mode = solid
alpha = 1.0

Same behaviour using the command: pocket protein=9ICV, mode=specific, residue="i. 68" I am fairly confident you can reproduce this issue, we did encounter it on three or four different computers.

Christinele14 commented 1 year ago

I also meet the issue when trying to use mode = specific. I tried pyvol in command line with the input_template.cfg file. I tried to put ligand (in PDB format) together with protein (PDB format) to calculate the pocket volume in which the ligand binds. However, it returns ValueError: number of xyz array columns must equal 3. I tried with the test case of 1uwh_B_lig.pdb and 1uwh_B_prot.pdb in pyvol/tests folder but it still returns the mentioned ValueError as well. Please take a look at it. Thank you.

rhs2132 commented 1 year ago

I realize that the code is unhelpfully opaque here. Any PyMOL inputs get cleaned up to remove improperly formatted inputs and then are processed with calls into PyMOL. The 'largest' mode is the default, so if the specification fails it reverts to that behavior but apparently without enough transparency.

@bertadenes in your case it doesn't like the residue selection. That must be a PyMOL selection string that specifies the residue. If you want to use the residue number, the 'resid' parameter is the correct alternative.

@Christinele14 can you provide the generated config file?

Christinele14 commented 1 year ago

Because it met error, the config file cannot be generated. Here is the cfg file I used to run pyvol with specific mode together with a ligand which binds with protein in the pocket I would like to calculate volume.

[General]
prot_file = 1uwh_B_prot.pdb
lig_file = 1uwh_B_lig.pdb
min_rad = 1.4
max_rad = 3.4
constrain_radii = True

[Specification]
mode = specific
# lig_excl_rad = 3.5
# lig_incl_rad = 5.2
min_volume = 200

[Partitioning]
subdivide = False
min_subpocket_rad = 1.7
max_subpocket_rad = 3.4
min_subpocket_surf_rad = 1.0
radial_sampling = 0.1
inclusion_radius_buffer = 1.0
min_cluster_size = 50

[Output]
project_dir = /home/christine/pyvol/tests
logger_stream_level = INFO
logger_file_level = DEBUG
alephreish commented 1 year ago

Hey all, I've stumbled upon this issue today. The documentation is indeed pretty opaque about this. The relevant piece of code is in identify.py:

        elif opts.get("mode") == "specific":
            if opts.get("coordinates") is not None:
                coordinate = opts.get("coordinates")
                logger.info("Specific pocket identified from coordinate: {0}".format(opts.get("coordinates")))
            elif opts.get("resid") is not None:
                resid = str(opts.get("resid"))
                chain = None
                if not resid[0].isdigit():
                    chain = resid[0]
                    resid = int(resid[1:])
                else:
                    resid = int(resid)
                coordinate = utilities.coordinates_for_resid(opts.get("prot_file"), resid=resid, chain=chain)
                logger.info("Specific pocket identified from residue: {0} -> {1} (truncated)".format(opts.get("resid"), coordinate[0,:]))
            elif l_s is not None:
                lig_coords = l_s.xyz
                coordinate = np.mean(l_s.xyz, axis=0).reshape(1, -1)
                logger.info("Specific pocket identified from mean ligand position: {0}".format(coordinate))
            else:
                logger.error("A coordinate, ligand, or residue must be supplied to run in specific mode")
                return None

I haven't tried to make it work with resid, but coordinates works fine with the CLI and the GUI:

[Specification]
mode = specific
coordinates = -2.335 -2.01 -3.167

The coordinates should be delimited with a single space (opts["coordinates"] = np.asarray([float(x) for x in opts.get("coordinates").split(" ")]).reshape(-1,3)).

Another important thing is that scipy should be pinned to a version before 1.9.0 (e.g. scipy=1.8.1 from conda-forge works fine) since this is when the n_jobs argument to scipy.spatial.cKDTree.query was removed, otherwise you get an exception:

pyvol.identify:     INFO        Specific pocket identified from coordinate: [[-2.335 -2.01  -3.167]]
Traceback (most recent call last):
  File "/bigdata/andrey/antenna-alphafold/pyvol_env/bin/pyvol", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/bigdata/andrey/antenna-alphafold/pyvol_env/lib/python3.11/site-packages/pyvol/__main__.py", line 17, in main
    identify.pocket_wrapper(**configuration.file_to_opts(args.cfg_file))
  File "/bigdata/andrey/antenna-alphafold/pyvol_env/lib/python3.11/site-packages/pyvol/identify.py", line 186, in pocket_wrapper
    all_pockets, output_opts = pocket(**opts)
                               ^^^^^^^^^^^^^^
  File "/bigdata/andrey/antenna-alphafold/pyvol_env/lib/python3.11/site-packages/pyvol/identify.py", line 142, in pocket
    id_coord = p_bs.nearest_coord_to_external(coordinate).reshape(1, -1)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/bigdata/andrey/antenna-alphafold/pyvol_env/lib/python3.11/site-packages/pyvol/spheres.py", line 329, in nearest_coord_to_external
    dist, indices = kdtree.query(coordinates, n_jobs=-1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "_ckdtree.pyx", line 786, in scipy.spatial._ckdtree.cKDTree.query
  File "_ckdtree.pyx", line 388, in scipy.spatial._ckdtree.get_num_workers
TypeError: Unexpected keyword argument {'n_jobs': -1}
alephreish commented 1 year ago

Weird enough though, there are cases in which searching around a coordinate fails with "Binding pocket not correctly identified--try an alternative method to specify the binding pocket" even if the coordinate is inside the pocket. Try https://alphafold.ebi.ac.uk/files/AF-A0A1L2YW92-F1-model_v4.pdb with coordinates = 2, -1, -1. This seems to be caused by p_bs.nearest_coord_to_external(coordinate).reshape(1, -1) returning a coordinate that is well outside of the protein. The pocket is identified correctly with mode=all or mode=largest.