mattragoza / LiGAN

Deep generative models of 3D grids for structure-based drug discovery
GNU General Public License v2.0

how to set the input for simple_fit.py and test_dkoes_simple_fit.py #9

Closed yangxiufengsia closed 3 years ago

yangxiufengsia commented 3 years ago

Hi, I am trying to use simple_fit.py and test_dkoes_simple_fit.py to obtain a molecule from a grid. For simple_fit.py, I used the exact same code and only changed the input path to point at my SDF file:

# __main__ block of simple_fit.py (assumes the imports and the fitmol/make_mol
# definitions from earlier in that file)
if __name__ == '__main__':
    results = []

    print('Globbing input files')
    files = glob.glob('/home.local/Level_admin/yang/liGAN/1b3g_ligand.sdf')
    print(files)

    print('Starting to fit molecules')
    for (i, fname) in enumerate(files):
        print(i, fname)
        try:
            start = time.time()
            struct, fittime, loss, fixes, rmsd = fitmol(fname, 25)
            mol, misses = make_mol(struct)
            mol = pybel.Molecule(mol)

            totaltime = time.time() - start
            ligname = os.path.split(fname)[1]
            # collect per-ligand statistics for the summary DataFrame below
            results.append((ligname, loss, fixes, fittime, totaltime, misses, rmsd))

            mol.write('sdf', 'output/fit_%s' % ligname, overwrite=True)
            print('{}/{}'.format(i + 1, len(files)))
        except Exception as e:
            print("Failed", fname, e)

    results = pd.DataFrame(results, columns=('lig', 'loss', 'fixes', 'fittime', 'totaltime', 'misses', 'rmsd'))
    results.to_csv('cntfixes.csv')

    sns.boxplot(data=results, x='misses', y='loss')
    plt.savefig('loss_by_misses_box.png')

    plt.hist(results.loss, bins=np.logspace(-6, 1, 8))
    plt.gca().set_xscale('log')
    plt.savefig('loss_hist.png')

    print('Low loss but nonzero misses, sorted by misses:')
    print(results[(results.loss < 0.1) & (results.misses > 0)].sort_values(by='misses'))

    print('Overall average loss:')
    print(np.mean(results.loss))

    plt.hist(results.fittime)
    plt.savefig('fit_time_hist.png')

    print('Average fit time and total time')
    print(np.mean(results.fittime))
    print(np.mean(results.totaltime))

    print('Undefined RMSD sorted by loss:')
    print(results[np.isinf(results.rmsd)].sort_values(by='loss'))

    print('All results sorted by loss:')
    print(results.sort_values(by='loss'))

Running it, I got the following error:

Failed /home.local/Level_admin/yang/liGAN/1b3g_ligand.sdf Invalid input dimensions in forward of Coords2Grid

yangxiufengsia commented 3 years ago

Could you give me some clues for solving this error? Thank you so much.

yangxiufengsia commented 3 years ago

import sys, os
import numpy as np
from openbabel import openbabel as ob

ligan_root = os.environ['LIGAN_ROOT']
sys.path.append(ligan_root)
import atom_types
import generate

def test_remove_tensors_circular():
    a = []
    b = [a]
    a.append(b)
    generate.remove_tensors(a)

def test_dkoes_atom_fitter():

    fitter = generate.DkoesAtomFitter(
        dkoes_make_mol=True,
        use_openbabel=False,
    )
    channels = atom_types.get_channels_from_file(
        os.path.join(ligan_root, 'my_lig_map'),
    )
    grid_shape = (len(channels), 48, 48, 48)
    grid = generate.MolGrid(
        values=np.zeros(grid_shape),
        channels=channels,
        center=np.zeros(3),
        resolution=0.5,
    )
    grid = fitter.fit(grid, [])
    assert grid.info['src_struct'].n_atoms == 0

print('here')

I also tested this code, but I am not clear on how to set "ligan_root" and "my_lig_map". I would really appreciate some help with setting these.

mattragoza commented 3 years ago

Hi and thank you for your interest in this project.

To answer the questions from your first issue:

1) The primary fitting algorithm is currently AtomFitter.fit in generate.py. The functions in fitting.py and simple_fit.py implement different algorithms that are less well tested at this point.

2) Atom fitting fits a set of atom types and coordinates to a reference density; it does not fit a density to a set of atoms. So yes, you must provide a reference density to AtomFitter.fit in order for it to have something to fit atoms to. The reference density can be either "real" or "generated". A real density can be produced using libmolgrid by calling the function molgrid.GridMaker.forward on atom types and coordinates. The libmolgrid documentation has more examples and details. Densities can also be generated from a generative model, and the script generate.py is used for this purpose, but you must provide it with the model architecture and weights.
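To make the "real density" step concrete, here is a minimal NumPy sketch of a multi-channel gridding function. It assumes the piecewise density described for libmolgrid (a Gaussian exp(-2d²/r²) inside the atomic radius r, then a quadratic tail that reaches zero at 1.5r); the function names, radii and type indices are hypothetical, and it is not a substitute for molgrid.GridMaker.forward. The 48³ grid at 0.5 Å resolution mirrors the shape used in the test code earlier in this thread.

```python
import numpy as np

E2 = np.exp(2.0)  # e^2, chosen so the quadratic tail matches the Gaussian at d = r


def atom_density(d, r):
    """Assumed libmolgrid-style density at distance d from an atom of radius r."""
    gauss = np.exp(-2.0 * d**2 / r**2)    # inner Gaussian, d < r
    quad = (2.0 * d / r - 3.0) ** 2 / E2  # quadratic tail, r <= d < 1.5 r
    return np.where(d < r, gauss, np.where(d < 1.5 * r, quad, 0.0))


def make_grid(coords, type_idx, radii, n_types, dim=48, resolution=0.5,
              center=(0.0, 0.0, 0.0)):
    """Return a (n_types, dim, dim, dim) density grid; each atom writes only
    into the channel of its own type."""
    axis = resolution * (np.arange(dim) - (dim - 1) / 2.0)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing='ij')
    points = np.stack([gx, gy, gz], axis=-1) + np.asarray(center, dtype=float)
    grid = np.zeros((n_types, dim, dim, dim))
    for xyz, t, r in zip(coords, type_idx, radii):
        d = np.linalg.norm(points - np.asarray(xyz, dtype=float), axis=-1)
        grid[t] += atom_density(d, r)
    return grid


# Two hypothetical atoms in two hypothetical type channels.
grid = make_grid(coords=[[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]],
                 type_idx=[0, 1], radii=[1.7, 1.52], n_types=2)
print(grid.shape)  # (2, 48, 48, 48)
```

Because the density is a smooth function of the atom coordinates, a loss computed on the grid can be differentiated back to those coordinates, which is what atom fitting relies on.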

The path LIGAN_ROOT should point to the root directory of the git repository, i.e. wherever you put the repo with git clone.
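A minimal sketch of that setup, assuming the repository was cloned to a hypothetical ~/liGAN (in a shell, `export LIGAN_ROOT=/path/to/liGAN` before running the test does the same job). It fills in LIGAN_ROOT, puts the repo root on sys.path so that `import atom_types` and `import generate` resolve, and builds the 'my_lig_map' path the test expects. Whether a file with that exact name ships with the repository is not confirmed in this thread, so the sketch only reports when it is missing.

```python
import os
import sys

# Hypothetical clone location: replace with the directory created by `git clone`.
DEFAULT_CLONE = os.path.expanduser('~/liGAN')

# Use LIGAN_ROOT if it is already exported, otherwise fall back to the default.
ligan_root = os.environ.setdefault('LIGAN_ROOT', DEFAULT_CLONE)

# Make modules at the repo root (e.g. atom_types, generate) importable.
if ligan_root not in sys.path:
    sys.path.append(ligan_root)

# The test resolves its channel map file relative to the repo root.
lig_map_path = os.path.join(ligan_root, 'my_lig_map')
if not os.path.isfile(lig_map_path):
    print('not found: %s (check LIGAN_ROOT and the map file name)' % lig_map_path)
```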

Hope that this helps.


From: Xiufeng Yang
Sent: Tuesday, November 24, 2020 3:16 AM
To: mattragoza/liGAN
Subject: Re: [mattragoza/liGAN] how to set the input for simple_fit.py and test_dkoes_simple_fit.py (#9)


yangxiufengsia commented 3 years ago

@mattragoza Thank you very much for clearing up my confusion! I still don't understand how atom types are fit to a density: what objective (loss) function do you optimize? (In your code, Adam is used.) Here is my understanding of how liGAN works:

  1. Convert molecules to grids and densities using libmolgrid (I understand this part).
  2. Input density -> encoder -> latent space (I understand this part as well).
  3. Latent vectors -> decoder -> density (I understand this part).
  4. Fit density to atom types and coordinates using gradient descent (I cannot understand this part).

How did you obtain the gradient in step 4? My current understanding is that the loss function is MSE(reference_density - generated_density). I would really appreciate it if you could check my understanding and answer my questions. Thank you again.

By the way, is all the data used in your paper included in this repository?

mattragoza commented 3 years ago

Atom fitting solves the following optimization problem:

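$$A^{*} = \arg\min_{A} \lVert g(A) - G_{\mathrm{ref}} \rVert_2^2$$

where $A$ is a set of atom types and coordinates and $G_{\mathrm{ref}}$ is the reference density grid.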

The gridding function g(A) is the same function that is used to compute the density grid representation of a real molecule for use as input to the neural network. This function is differentiable, so for a given "hypothesis" set of atoms A, we can compute the gradient of the L2 loss with respect to the atom coordinates, then minimize it with gradient descent. See this notebook for the derivation of this gradient.
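A minimal NumPy sketch of that gradient step, assuming a simplified pure-Gaussian g(A) (each atom adds exp(-2d²/r²) to its own type channel, r = 1 Å, 0.5 Å voxels) rather than liGAN's actual gridding code. With L = Σ_v (g_v(A) - G_v)², the chain rule gives ∂L/∂a_i = Σ_v 2 (g_v(A) - G_v) · exp(-2‖x_v - a_i‖²/r²) · 4 (x_v - a_i)/r², summed over the voxels of atom i's type channel. Atom types are held fixed, plain gradient descent stands in for Adam, and all function names are hypothetical.

```python
import numpy as np


def make_points(dim=25, resolution=0.5):
    """Grid point coordinates, shape (dim, dim, dim, 3), centered on the origin."""
    axis = resolution * (np.arange(dim) - (dim - 1) / 2.0)
    return np.stack(np.meshgrid(axis, axis, axis, indexing='ij'), axis=-1)


def gaussian_grid(coords, types, n_types, points, radius=1.0):
    """Simplified g(A): each atom adds exp(-2 d^2 / r^2) to its type's channel."""
    grid = np.zeros((n_types,) + points.shape[:3])
    for xyz, t in zip(coords, types):
        grid[t] += np.exp(-2.0 * np.sum((points - xyz) ** 2, axis=-1) / radius**2)
    return grid


def loss_and_grad(coords, types, ref, points, radius=1.0):
    """L2 loss ||g(A) - ref||^2 and its analytic gradient w.r.t. each atom's xyz."""
    resid = gaussian_grid(coords, types, ref.shape[0], points, radius) - ref
    grad = np.zeros_like(coords)
    for i, (xyz, t) in enumerate(zip(coords, types)):
        diff = points - xyz                                    # x_v - a_i
        k = np.exp(-2.0 * np.sum(diff**2, axis=-1) / radius**2)
        w = 2.0 * resid[t] * k * 4.0 / radius**2               # 2 (g_v - G_v) * k_v * 4 / r^2
        grad[i] = np.sum(w[..., None] * diff, axis=(0, 1, 2))  # sum over voxels v
    return np.sum(resid**2), grad


def fit_coords(init, types, ref, points, lr=0.02, steps=100):
    """Plain gradient descent on the coordinates; the atom types stay fixed."""
    coords = np.array(init, dtype=float)
    for _ in range(steps):
        _, grad = loss_and_grad(coords, types, ref, points)
        coords -= lr * grad
    return coords, loss_and_grad(coords, types, ref, points)[0]


# Reference density from two hypothetical atoms, then fit from perturbed starts.
points = make_points()
true = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
types = [0, 1]
ref = gaussian_grid(true, types, 2, points)
init = true + np.array([[0.3, -0.2, 0.1], [-0.3, 0.3, 0.0]])
fitted, final_loss = fit_coords(init, types, ref, points)
print(np.round(fitted, 3))
```

Nothing in this step depends on where `ref` came from, so the same loop applies whether the reference density was gridded from a real molecule or produced by a decoder.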

We do not necessarily know how many atoms of each type exist in the reference grid. However, we can estimate whether placing an atom at a particular grid point will decrease the L2 loss, and we can empirically check whether it does through trial and error. Therefore, we use an atom detection function to propose new atom initializations on the remaining grid density before expanding the structure to the proposed atom(s), performing gradient descent, and checking whether the loss improved. The algorithm performs this iteratively in a beam search to find the best-fit set of atom types and coordinates.
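The expand-then-optimize loop can be sketched in NumPy as below. This is a deliberately simplified greedy version (beam width 1), not liGAN's beam search: a hypothetical "detection" step proposes one atom at the voxel and type channel with the largest remaining density ref - g(A), the proposal is refined by gradient descent on all coordinates, and it is kept only if the L2 loss improves. It assumes the same simplified Gaussian g(A) (exp(-2d²/r²), r = 1 Å, 0.5 Å voxels) and repeats its helpers so it runs on its own; all names are hypothetical.

```python
import numpy as np


def make_points(dim=25, resolution=0.5):
    axis = resolution * (np.arange(dim) - (dim - 1) / 2.0)
    return np.stack(np.meshgrid(axis, axis, axis, indexing='ij'), axis=-1)


def gaussian_grid(coords, types, n_types, points, r=1.0):
    # Simplified g(A): exp(-2 d^2 / r^2) per atom, written into its type channel.
    grid = np.zeros((n_types,) + points.shape[:3])
    for xyz, t in zip(coords, types):
        grid[t] += np.exp(-2.0 * np.sum((points - xyz) ** 2, axis=-1) / r**2)
    return grid


def fit_coords(coords, types, ref, points, r=1.0, lr=0.02, steps=100):
    # Gradient descent on ||g(A) - ref||^2 w.r.t. coordinates only.
    coords = np.array(coords, dtype=float)
    for _ in range(steps):
        resid = gaussian_grid(coords, types, ref.shape[0], points, r) - ref
        for i, t in enumerate(types):
            diff = points - coords[i]
            k = np.exp(-2.0 * np.sum(diff**2, axis=-1) / r**2)
            w = 8.0 * resid[t] * k / r**2  # 2 * (g - G) * k * 4 / r^2
            coords[i] -= lr * np.sum(w[..., None] * diff, axis=(0, 1, 2))
    loss = np.sum((gaussian_grid(coords, types, ref.shape[0], points, r) - ref) ** 2)
    return coords, loss


def greedy_fit(ref, points, max_atoms=20):
    """Greedy stand-in for beam search: propose an atom at the largest remaining
    density, refit all coordinates, keep the atom only if the loss improves."""
    coords, types = np.zeros((0, 3)), []
    loss = np.sum(ref**2)  # loss of the empty structure
    for _ in range(max_atoms):
        resid = ref - gaussian_grid(coords, types, ref.shape[0], points)
        t, i, j, k = np.unravel_index(np.argmax(resid), resid.shape)
        new_types = types + [int(t)]
        new_coords, new_loss = fit_coords(np.vstack([coords, points[i, j, k]]),
                                          new_types, ref, points)
        if new_loss >= loss:
            break  # the proposed atom did not help, so stop expanding
        coords, types, loss = new_coords, new_types, new_loss
    return coords, types, loss


points = make_points()
true = np.array([[0.1, -0.1, 0.2], [2.1, 0.4, -0.1]])  # hypothetical atoms, types 0 and 1
ref = gaussian_grid(true, [0, 1], 2, points)
coords, types, loss = greedy_fit(ref, points)
print(types, np.round(coords, 2))
```

A beam search would instead keep the best few partial structures after each expansion, which helps when the single best-looking proposal leads to a worse final fit.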

mattragoza commented 3 years ago

There is a script download_data.sh that retrieves and sets up the necessary data files to reproduce our paper results.

yangxiufengsia commented 3 years ago

@mattragoza Thank you so much for clearing up my confusion. I will read your paper in more detail. Again, thank you.

yangxiufengsia commented 3 years ago

Hi,

I read the atom fitting pseudo-code in your paper and your atom fitting code. Here are my questions:

  1. Your pseudo-code only takes the reference grid density (the true grid density, I think) as input. How can you minimize the loss between a generated grid density and the reference grid density when no generated grid is given as input?
  2. Is your atom fitting algorithm only used to fit the true density grid back to the true atom types? Is this understanding correct?
  3. In your generate.py, I also could not find where atom types are fit to a generated grid density. Am I missing some important information?

Looking forward to your reply, and thanks a lot.