mwinokan commented 7 months ago

[ ] Syndirella provides list of indices per base compound from which to start expansion vectors
[x] PoseButcher samples vectors and returns distance and destination (protein/solvent/pocket)
[x] Derive estimate of maximum number of atoms added
[ ] Benchmark performance (correlate to sampling parameters)
[ ] Performance optimisation?
[ ] Syndirella does not place elaborations that exceed the atom-count limits from PoseButcher vectors
[x] Documentation / tutorial

kate-fie commented 7 months ago

This is the workflow I have in mind:

PoseButcher identifies, for each atom index on the base compound, whether or not to elaborate from it
PoseButcher derives an estimate of the number of atoms that can be added
Syndirella atom-maps the base compound to the reactant
Filter reactants

I'm going to try the Kartograf atom mapping tool

mwinokan commented 7 months ago

Kate To-Do's

[ ] Figure out if it's possible to match reactant atom indices to the base
[ ] If not, try matching elaboration product atom indices to the base
[ ] Assume posebutcher will return a dictionary of:

{ atom_index: { "num_atom_added": X, "destination": Y }, ... }

Where X is an integer and Y is in ['protein', 'solvent', 'pocket']
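To make the agreed shape concrete, here is a minimal sketch of that dictionary as Python types, with a small checker. The names `VectorLimit`, `Destination`, and `validate_limits` are hypothetical and not part of PoseButcher or Syndirella, and the example values are made up for illustration.

```python
from typing import Dict, Literal, TypedDict, get_args

# The three destinations named in the comment above
Destination = Literal["protein", "solvent", "pocket"]


class VectorLimit(TypedDict):
    """Per-atom entry (hypothetical type name, keys as proposed above)."""
    num_atom_added: int
    destination: Destination


def validate_limits(limits: Dict[int, VectorLimit]) -> None:
    """Raise ValueError if the dictionary does not match the proposed shape."""
    allowed = get_args(Destination)
    for atom_index, entry in limits.items():
        if not isinstance(atom_index, int):
            raise ValueError(f"atom index {atom_index!r} is not an integer")
        if not isinstance(entry["num_atom_added"], int):
            raise ValueError(f"num_atom_added for atom {atom_index} is not an integer")
        if entry["destination"] not in allowed:
            raise ValueError(f"unknown destination {entry['destination']!r}")


# Illustrative values only, not real PoseButcher output
example: Dict[int, VectorLimit] = {
    0: {"num_atom_added": 7, "destination": "protein"},
    4: {"num_atom_added": 2, "destination": "solvent"},
}
validate_limits(example)  # passes silently
```

A checker like this would let Syndirella fail early if the format changes on the PoseButcher side.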

mwinokan commented 7 months ago

Max To-Do's

[x] Create robust butcher for the relaxed 2a Ax0310a
[x] Fix explore method
[x] Write up basic instruction manual for using the explore method of the butcher

mwinokan commented 7 months ago

I finally got around to creating a butcher for the relaxed Ax0310a. I had to get rid of pockets P5 & P6 for now, but it works. This is the butcher directory you can import with PoseButcher.from_directory(). You will need to get posebutcher==0.0.19 from PyPI.

butcher_2a_x0310_noP5P6.zip

I will also write up a doc page with an example procedure. For now, here is the sample butcher.explore output for the base LXINEYASRREWNB-VIFPVBQESA-N (N.B. the fields 'destination' and 'max_atoms_added'):

[{'atom_index': 0,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([ 0.7915107 , -0.34428629, -0.50495322]),
  'intersections': {5.925: ('BAD', 'solvent space')},
  'first_intersection_distance': 5.925,
  'new_pocket': False,
  'last_intersection_distance': 5.925,
  'destination': 'solvent space',
  'max_atoms_added': inf,
  'success': True},
 {'atom_index': 1},
 {'atom_index': 2},
 {'atom_index': 3,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([-0.54640907, -0.36690284,  0.75287411]),
  'intersections': {3.177: ('BAD', 'protein clash')},
  'first_intersection_distance': 3.177,
  'new_pocket': False,
  'last_intersection_distance': 3.177,
  'destination': 'protein clash',
  'max_atoms_added': 7,
  'success': True},
 {'atom_index': 4},
 {'atom_index': 5,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([-0.03015966,  0.02861301,  0.99913547]),
  'intersections': {2.21: ('BAD', 'protein clash')},
  'first_intersection_distance': 2.21,
  'new_pocket': False,
  'last_intersection_distance': 2.21,
  'destination': 'protein clash',
  'max_atoms_added': 1,
  'success': True},
 {'atom_index': 6},
 {'atom_index': 7,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([ 0.06275531,  0.23879619, -0.96903981]),
  'intersections': {0.887: ('GOOD', 'pocket', "P1'"),
   5.631: ('BAD', 'solvent space')},
  'first_intersection_distance': 0.887,
  'new_pocket': True,
  'last_intersection_distance': 5.631,
  'destination': 'solvent space',
  'max_atoms_added': inf,
  'success': True},
 {'atom_index': 8,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([-0.63736877, -0.5844069 ,  0.50222468]),
  'intersections': {1.903: ('GOOD', 'pocket', "P2'"),
   1.995: ('BAD', 'protein clash')},
  'first_intersection_distance': 1.903,
  'new_pocket': True,
  'last_intersection_distance': 1.995,
  'destination': 'protein clash',
  'max_atoms_added': 1,
  'success': True},
 {'atom_index': 9,
  'origin': ('GOOD', 'pocket', 'P2'),
  'direction': array([ 0.17790221,  0.93172511, -0.31660562]),
  'intersections': {0.027: ('GOOD', 'pocket', "P1'"),
   2.367: ('BAD', 'protein clash')},
  'first_intersection_distance': 0.027,
  'new_pocket': True,
  'last_intersection_distance': 2.367,
  'destination': 'protein clash',
  'max_atoms_added': 1,
  'success': True},
 {'atom_index': 10,
  'origin': ('GOOD', 'pocket', 'P2'),
  'direction': array([-0.13990879,  0.9836797 ,  0.11313612]),
  'intersections': {1.63: ('BAD', 'protein clash')},
  'first_intersection_distance': 1.63,
  'new_pocket': False,
  'last_intersection_distance': 1.63,
  'destination': 'protein clash',
  'max_atoms_added': 1,
  'success': True},
 {'atom_index': 11},
 {'atom_index': 12,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([ 0.32430707, -0.38268234, -0.8650891 ]),
  'intersections': {4.516: ('BAD', 'solvent space')},
  'first_intersection_distance': 4.516,
  'new_pocket': False,
  'last_intersection_distance': 4.516,
  'destination': 'solvent space',
  'max_atoms_added': 7,
  'success': True},
 {'atom_index': 13},
 {'atom_index': 14,
  'origin': ('GOOD', 'pocket', 'P1'),
  'direction': array([-0.29913694,  0.55005234,  0.77971759]),
  'intersections': {0.638: ('GOOD', 'pocket', 'P2'),
   7.817: ('BAD', 'protein clash')},
  'first_intersection_distance': 0.638,
  'new_pocket': True,
  'last_intersection_distance': 7.817,
  'destination': 'protein clash',
  'max_atoms_added': inf,
  'success': True},
 {'atom_index': 15},
 {'atom_index': 16,
  'origin': ('BAD', 'solvent space'),
  'direction': array([-0.94793021,  0.28411648,  0.14389633]),
  'intersections': {2.212: ('GOOD', 'pocket', "P1'"),
   4.422: ('BAD', 'protein clash')},
  'first_intersection_distance': 2.212,
  'new_pocket': True,
  'last_intersection_distance': 4.422,
  'destination': 'protein clash',
  'max_atoms_added': 7,
  'success': True},
 {'atom_index': 17},
 {'atom_index': 18},
 {'atom_index': 19,
  'origin': ('BAD', 'solvent space'),
  'direction': array([ 0.55288204, -0.7298518 ,  0.40204204]),
  'intersections': {4.862: ('BAD', 'protein clash')},
  'first_intersection_distance': 4.862,
  'new_pocket': False,
  'last_intersection_distance': 4.862,
  'destination': 'protein clash',
  'max_atoms_added': 14,
  'success': True}]
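For the Syndirella side, only 'atom_index', 'destination' and 'max_atoms_added' matter. The sketch below collapses a list shaped like the sample output above into a dictionary keyed by atom index. The helper name `summarise_vectors` is hypothetical. Skipping records that hold only 'atom_index' is an assumption about what those records mean. The input is an abridged copy of three records from the sample, plus one bare record.

```python
import math


def summarise_vectors(explore_output):
    """Collapse explore-style records into {atom_index: {...}} (hypothetical helper)."""
    summary = {}
    for record in explore_output:
        # Assumption: records without a truthy 'success' key carry no vector data
        if not record.get("success"):
            continue
        summary[record["atom_index"]] = {
            "destination": record["destination"],
            "max_atoms_added": record["max_atoms_added"],
        }
    return summary


# Abridged records copied from the sample output above
sample = [
    {"atom_index": 0, "destination": "solvent space",
     "max_atoms_added": math.inf, "success": True},
    {"atom_index": 1},
    {"atom_index": 3, "destination": "protein clash",
     "max_atoms_added": 7, "success": True},
    {"atom_index": 5, "destination": "protein clash",
     "max_atoms_added": 1, "success": True},
]

limits = summarise_vectors(sample)
for atom_index, info in limits.items():
    print(atom_index, info["destination"], info["max_atoms_added"])
```

Note that 'max_atoms_added' can be `inf`, so any downstream comparison should tolerate floats as well as integers.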

mwinokan commented 7 months ago

See the doc page I wrote up

You will need molparse==0.0.18 and posebutcher==0.0.20

mwinokan commented 6 months ago

@kate-fie here is a quick summary of what I suggested today:

To compare elaboration E to base B (could be reactant superstructure R' vs R too):

Classify the vectors expanding from B using posebutcher.explore
Calculate all the MCS mappings of E onto B
For each of those mappings:
- Evaluate which vector limits are exceeded
- Dismiss (don't place) any elaboration E for which every one of its mappings exceeds at least one vector limit
- All other elaborations should be placed

Just a clarification on the conditional, I think the following pseudocode should do the trick:

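place = False  # default: dismiss the elaboration
for mapping in elaboration.mappings:
    for vector in mapping.vectors:
        if not is_vector_valid(vector):
            break  # this mapping exceeds a limit; try the next mapping
    else:
        # inner loop finished without a break: all vectors valid
        place = True
        break  # one valid mapping is enough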
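The conditional above can also be written as a self-contained function. This is only a sketch under assumed data structures: each mapping is represented as a dict from base atom index to the number of atoms the elaboration adds at that atom, and `limits` maps atom index to 'max_atoms_added'. Treating atoms with no recorded limit as allowing zero added atoms is an assumption, as are the names `should_place` and `NO_VECTOR_LIMIT`.

```python
import math

# Assumption: an atom with no recorded vector allows no added atoms
NO_VECTOR_LIMIT = 0


def is_vector_valid(atoms_added, max_atoms_added):
    """A vector is valid if the growth at that atom stays within its limit."""
    return atoms_added <= max_atoms_added


def should_place(mappings, limits):
    """True if at least one mapping keeps every vector within its limit."""
    return any(
        all(
            is_vector_valid(atoms_added, limits.get(atom_index, NO_VECTOR_LIMIT))
            for atom_index, atoms_added in mapping.items()
        )
        for mapping in mappings
    )


# Limits taken from atoms 0, 3 and 5 of the sample explore output
limits = {0: math.inf, 3: 7, 5: 1}

# First mapping grows 3 atoms from atom 5 (limit 1); second grows 3 from atom 3 (limit 7)
mappings_ok = [{5: 3}, {3: 3}]
# Both mappings exceed a limit
mappings_bad = [{5: 3}, {3: 9}]

print(should_place(mappings_ok, limits))
print(should_place(mappings_bad, limits))
```

The `any`/`all` pair mirrors the for/else logic: an elaboration with no mappings is not placed, and a single mapping that respects every limit is enough to place it.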

mwinokan / PoseButcher

Provide expansion vector limits to Syndirella #29
