pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
MIT License

Preprocessing required for sampling for a pdb file #37

NiteshGit123 closed 4 months ago

NiteshGit123 commented 5 months ago

Hi, could you describe the preprocessing needed for a PDB file downloaded from RCSB PDB so that I can sample molecules for that protein with sample_for_pdb.py? The PDB file provided in the example folder works fine, but when I use a PDB file taken directly from https://www.rcsb.org/structure/1CQP, I get the following error:

taskset -c 0 python3 sample_for_pdb.py --pdb_path ./example/1CQP.pdb --center "32.0,28.0,36.0" --device "cpu"
[2024-01-29 04:27:42,224::sample::INFO] Namespace(bbox_size=23.0, center=[32.0, 28.0, 36.0], config='./configs/sample_for_pdb.yml', device='cpu', outdir='./outputs', pdb_path='./example/1CQP.pdb')
[2024-01-29 04:27:42,224::sample::INFO] {'model': {'checkpoint': './ckpt/pretrained_Pocket2Mol.pt'}, 'sample': {'seed': 2020, 'num_samples': 100, 'beam_size': 100, 'max_steps': 50, 'threshold': {'focal_threshold': 0.5, 'pos_threshold': 0.25, 'element_threshold': 0.3, 'hasatom_threshold': 0.6, 'bond_threshold': 0.4}}}
[2024-01-29 04:27:42,224::sample::INFO] Loading data...
[2024-01-29 04:27:42,297::sample::INFO] Loading main model...
[2024-01-29 04:27:42,416::sample::INFO] Initialization
InitSample: 1%|██▏ | 1/100 [00:00<00:28, 3.50it/s]
[2024-01-29 04:27:42,703::sample::INFO] [Pool] Queue 1 | Finished 0 | Failed 0
[2024-01-29 04:27:42,703::sample::INFO] Saving samples...
[2024-01-29 04:27:42,705::sample::INFO] Start sampling
0%| | 0/1 [00:00<?, ?it/s]
[2024-01-29 04:27:42,929::sample::INFO] Success:
100%|██████████| 1/1 [00:00<00:00, 4.46it/s]
Traceback (most recent call last):
  File "sample_for_pdb.py", line 209, in
    next_idx = np.random.choice(np.arange(n_tmp), p=prob, size=min(config.sample.beam_size, n_tmp), replace=False)
  File "mtrand.pyx", line 939, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities do not sum to 1

Thank you.

pengxingang commented 5 months ago

Hi, no additional preprocessing is required for sampling. According to the error message, the model failed to generate any atoms in the initial step. A possible reason is that the pocket position is not specified correctly. To use your own protein PDB file, you may need to change the --center and --bbox_size arguments, which together define the box that specifies the pocket position.
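One quick way to check whether a given box is plausible is to count how many protein atoms fall inside it. The following is only a minimal sketch and not part of Pocket2Mol: the helper name count_atoms_in_box is hypothetical, and it assumes that --bbox_size is the full edge length of a cube centered at --center, which may differ from how the script actually crops the pocket. It reads coordinates from the fixed-width ATOM records of the PDB format.

```python
def count_atoms_in_box(pdb_lines, center, bbox_size):
    """Count ATOM records inside a cube centered at `center`.

    Assumption (not confirmed by the Pocket2Mol source): `bbox_size` is the
    full edge length of the cube, so each coordinate may deviate from the
    center by at most bbox_size / 2.
    """
    half = bbox_size / 2.0
    count = 0
    for line in pdb_lines:
        # Only protein atoms; HETATM records (ligands, water) are skipped.
        if not line.startswith("ATOM"):
            continue
        # PDB format: x, y, z occupy columns 31-38, 39-46, 47-54 (1-based).
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if all(abs(v - c) <= half for v, c in zip(xyz, center)):
            count += 1
    return count


# Hypothetical usage with the values from the failing command:
# with open("./example/1CQP.pdb") as fh:
#     print(count_atoms_in_box(fh, [32.0, 28.0, 36.0], 23.0))
```

If the count is zero or very small, the box probably does not cover the protein, and the center should be revisited before sampling.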

You can use PyMOL to visualize your protein PDB file and locate the binding pocket. Then define a box that covers the pocket and pass the corresponding --center and --bbox_size values when sampling.
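If the structure contains a co-crystallized ligand in the pocket, the box can also be estimated from the ligand coordinates instead of by eye. This is a rough sketch under stated assumptions, not the method used by Pocket2Mol: the helper names box_from_ligand and format_center are hypothetical, the residue name "LIG" is a placeholder for whatever HETATM residue name your file uses, the padding of 5.0 Å is an arbitrary choice, and --bbox_size is again assumed to be the edge length of a cube.

```python
def box_from_ligand(pdb_lines, resname, padding=5.0):
    """Return (center, bbox_size) for a cube enclosing a ligand.

    `resname` is the three-letter HETATM residue name of the ligand
    (placeholder value, check your own file). `padding` is added on
    every side of the ligand; 5.0 is an arbitrary default.
    """
    coords = []
    for line in pdb_lines:
        # PDB format: residue name in columns 18-20, coordinates in 31-54.
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError(f"no HETATM atoms with residue name {resname!r}")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    # Center of the ligand's axis-aligned bounding box.
    center = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    # Cube edge: largest ligand extent plus padding on both sides.
    bbox_size = max(hi - lo for lo, hi in zip(mins, maxs)) + 2.0 * padding
    return center, bbox_size


def format_center(center):
    """Format a center as the comma-separated string passed to --center."""
    return ",".join(f"{v:.1f}" for v in center)


# Hypothetical usage:
# with open("my_protein.pdb") as fh:
#     center, size = box_from_ligand(fh, "LIG")
# print(f'--center "{format_center(center)}" --bbox_size {size:.1f}')
```

It is still worth checking the resulting box visually in PyMOL, since a ligand that sits at the pocket edge can give a center that is shifted away from the cavity.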

NiteshGit123 commented 5 months ago

Thank you, @pengxingang, for the clarification.