pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
MIT License

I want to use your tool, but the file 'crossdocked_pocket10_name2id.pt' is missing from the folder './data/' #2

Closed. Qmi3 closed this issue 2 years ago

Qmi3 commented 2 years ago

I would really appreciate it if you could upload the file.

pengxingang commented 2 years ago

Hi, the file crossdocked_pocket10_name2id.pt is generated automatically by the code. You should download crossdocked_pocket10.tar.gz and extract it. It contains all the pocket-molecule complex structures of the training and testing sets. When you run the code (train or sample) for the first time, it checks whether the data has already been processed, i.e., whether the processed file exists (processed_path=config.data.dataset.path+'_processed.lmdb', line 25 of utils/datasets/pl.py). If it does not exist, the code reads the structure files in crossdocked_pocket10 and generates the preprocessed files (crossdocked_pocket10_processed.lmdb and crossdocked_pocket10_name2id.pt, lines 26-27 of utils/datasets/pl.py).

So please make sure you follow the data preparation instructions. (I guess you downloaded the lmdb file from the cloud.)
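The check described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names derived_paths and ensure_processed and the build callback are hypothetical, and only the '_processed.lmdb' suffix and the two output file names come from the explanation above.

```python
import os


def derived_paths(raw_path):
    """Return the (lmdb, name2id) paths derived from the raw dataset folder path."""
    raw_path = raw_path.rstrip("/")
    return raw_path + "_processed.lmdb", raw_path + "_name2id.pt"


def ensure_processed(raw_path, build):
    """Run build(raw_path, lmdb_path, name2id_path) only if the lmdb file is missing.

    `build` is a hypothetical callback standing in for the real preprocessing.
    Returns True when preprocessing was triggered, False when it was skipped.
    """
    lmdb_path, name2id_path = derived_paths(raw_path)
    if os.path.exists(lmdb_path):
        # Processed file already present: nothing is regenerated.
        return False
    build(raw_path, lmdb_path, name2id_path)
    return True
```

Under this assumed logic, a folder that already holds only the downloaded lmdb file skips the build step, so the name2id file is never created. That would be consistent with the missing-file symptom reported in this issue.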

Qmi3 commented 2 years ago

Actually, I did download the lmdb file from the cloud, haha. I solved this problem as you said. But two errors occurred: "Could not sanitize molecule ending on line xx" and "Explicit valence for atom # xx N, 4, is greater than permitted".

pengxingang commented 2 years ago

Yes, these errors can occur because some of the molecules can't be parsed by RDKit. But we use try-except to skip those cases, so the errors can be safely ignored.
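The skip-on-failure pattern mentioned above can be sketched in plain Python. This is an illustration under assumptions, not the repository's code: parse_all and toy_parse are hypothetical names, and toy_parse stands in for a real structure parser. The None check is included because RDKit readers such as Chem.MolFromMolFile return None, while logging messages like the ones quoted above, when a molecule fails sanitization.

```python
def parse_all(entries, parse):
    """Apply `parse` to each entry, skipping entries that fail.

    An entry is skipped when `parse` raises an exception or returns None.
    Returns a tuple (results, num_skipped).
    """
    results, skipped = [], 0
    for entry in entries:
        try:
            parsed = parse(entry)
        except Exception:
            # Unparsable entry: count it and move on to the next one.
            skipped += 1
            continue
        if parsed is None:
            skipped += 1
            continue
        results.append(parsed)
    return results, skipped


def toy_parse(text):
    """Hypothetical stand-in parser: None for empty input, int otherwise."""
    if text == "":
        return None
    return int(text)  # raises ValueError for non-numeric text
```

With this pattern the logged messages are only informational: the affected entries are dropped and the remaining ones are still processed.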

Qmi3 commented 2 years ago

Thank you very much for your answers, and I wish you all the best in your work.

pengxingang commented 2 years ago

Thank you!