pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
MIT License

The initial ligand #9

Open 1121091694 opened 2 years ago

1121091694 commented 2 years ago

Nice paper. Could you tell me how to supply the atoms and atomic coordinates of an initial ligand when sampling for a protein pocket? With SMILES seq2seq models we can give a sequence prompt such as CCCC. In this work, the prompt would be the initial ligand's atoms and their xyz coordinates.

pengxingang commented 2 years ago

Thanks for your interest in the work. Pocket2Mol was initially proposed for de novo drug generation (i.e., generating initial atoms by itself). However, since it autoregressively generates atoms, it can be modified to generate molecules given initial atoms.

To add initial atoms (element types, coordinates, and connecting bonds) as a prior, you will have to modify the code yourself. More specifically, you can modify the get_init function used at line 140 of sample_for_pdb.py. This function (defined at line 37 of sample.py) generates the initial atoms. Inside it, you can remove the call to model.sample_init and directly write the information for your prior atoms into the items of data_next_list.
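The general pattern described above (skip the sampled initialization and start the autoregressive loop from user-supplied atoms) can be illustrated with a toy sketch. Nothing here is Pocket2Mol code: `toy_sample_init`, `toy_get_init`, `toy_grow`, and the dictionary layout are hypothetical stand-ins, and the "model" is just a random number generator.

```python
import random


def toy_sample_init(rng):
    """Stand-in for model-driven initialization: one carbon at a random position."""
    first_pos = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    return [{"elements": [6], "positions": [first_pos], "bonds": []}]


def toy_get_init(rng, seed=None):
    """Return the list of starting states for generation.

    Without a seed, an initial atom is sampled. With a seed, sampling is
    skipped and a single state is filled with copies of the given atoms.
    """
    if seed is None:
        return toy_sample_init(rng)
    return [{
        "elements": list(seed["elements"]),
        "positions": [list(p) for p in seed["positions"]],
        "bonds": list(seed["bonds"]),
    }]


def toy_grow(state, rng, n_steps):
    """Toy autoregressive loop: each step adds one carbon bonded to an existing atom."""
    for _ in range(n_steps):
        focal = rng.randrange(len(state["elements"]))
        new_pos = [c + rng.uniform(-1.5, 1.5) for c in state["positions"][focal]]
        state["elements"].append(6)
        state["positions"].append(new_pos)
        # (focal atom index, new atom index, bond order)
        state["bonds"].append((focal, len(state["elements"]) - 1, 1))
    return state


rng = random.Random(0)
seed = {
    "elements": [6, 8],                               # C, O
    "positions": [[0.0, 0.0, 0.0], [1.3, 0.0, 0.0]],
    "bonds": [(0, 1, 2)],                             # C=O double bond
}
states = toy_get_init(rng, seed=seed)
grown = toy_grow(states[0], rng, n_steps=3)
print(len(grown["elements"]), grown["bonds"][0])
```

The seed atoms occupy the first positions of the state and are never modified by the loop; only new atoms are appended after them.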

Ruibin-Liu commented 1 week ago

My understanding is that we need to generate the predictions list ourselves and feed it into this block:

   # has frontiers
   data.status = STATUS_RUNNING
   (has_frontier, idx_frontier, p_frontier,
   idx_focal_in_compose, p_focal,
   pos_generated, pdf_pos, abs_pos_mu, pos_sigma, pos_pi,
   element_pred, element_prob, has_atom_prob) = [p.cpu() for p in predicitions]

Or should we simply assign values directly to the unpacked variables, from has_frontier and idx_frontier through has_atom_prob?

Suppose I have a simple seed ligand like C=O, given in the following xyz-like format:

C 0.0 0.0 0.0
O 1.3 0.0 0.0

The Cartesian coordinates would of course need to be adjusted to the PDB pocket I want to sample. How should we assign those variables, considering that we want to grow the ligand through the carbon atom and not the oxygen atom? One thing you did not mention in your reply is that we may also need to adjust the pdb_to_pocket_data function in sample_for_pdb.py to include the seed ligand information. Or is that unnecessary?
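As a starting point for preparing such a seed, here is a minimal sketch that parses the xyz-like text, rigidly translates the fragment into the pocket frame, and records the bond plus the atom intended for growth. All names (`SeedLigand`, `build_seed`, `growth_atoms`) and the target coordinate are hypothetical and are not part of Pocket2Mol. How a growth restriction would be applied to the model's frontier prediction is not covered by this thread.

```python
from dataclasses import dataclass

# Atomic numbers for the few elements this sketch recognizes.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15, "S": 16, "Cl": 17}


@dataclass
class SeedLigand:
    elements: list      # atomic number per atom
    positions: list     # [x, y, z] per atom, in angstroms
    bond_index: list    # two rows [sources, targets], each bond in both directions
    bond_type: list     # bond order per directed edge
    growth_atoms: list  # indices of atoms we would like to grow from (assumption)


def parse_xyz_like(text):
    """Parse 'Symbol x y z' lines into atomic numbers and coordinates."""
    elements, positions = [], []
    for line in text.strip().splitlines():
        symbol, x, y, z = line.split()
        elements.append(ATOMIC_NUMBER[symbol])
        positions.append([float(x), float(y), float(z)])
    return elements, positions


def place_atom_at(positions, atom_idx, target):
    """Translate all atoms so that atom `atom_idx` lands on `target`."""
    shift = [t - p for t, p in zip(target, positions[atom_idx])]
    return [[c + s for c, s in zip(pos, shift)] for pos in positions]


def build_seed(text, bonds, growth_atoms, anchor_idx, anchor_target):
    elements, positions = parse_xyz_like(text)
    positions = place_atom_at(positions, anchor_idx, anchor_target)
    src, dst, orders = [], [], []
    for i, j, order in bonds:
        # Store each undirected bond as two directed edges.
        src += [i, j]
        dst += [j, i]
        orders += [order, order]
    return SeedLigand(elements, positions, [src, dst], orders, list(growth_atoms))


seed = build_seed(
    "C 0.0 0.0 0.0\nO 1.3 0.0 0.0",
    bonds=[(0, 1, 2)],                # C=O double bond
    growth_atoms=[0],                 # grow from the carbon, not the oxygen
    anchor_idx=0,
    anchor_target=[10.0, 5.0, -2.0],  # hypothetical point inside the pocket
)
print(seed)
```

A pure translation keeps the fragment's orientation as given. In practice the seed would more likely come from a docked or crystallographic pose that already sits in the pocket's coordinate frame.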