pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
MIT License

Skipping in training #35

Open lajictw opened 8 months ago

lajictw commented 8 months ago

Hi! After unzipping the dataset, I ran train.py directly, but all the data seems to be skipped. I'm not sure if I'm overlooking something. Thank you for your help! A screenshot of my data folder is attached below. [image]

pengxingang commented 8 months ago

Hi! If an Exception occurs during data processing, the affected data is skipped. If all the data is skipped, your environment may not be configured correctly. To resolve this, verify your environment setup, delete the 'crossdocked_pocket10.lmdb' file, and rerun 'train.py'. If the issue persists, you can catch the Exception during data processing to see what goes wrong.
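A minimal sketch of that last suggestion, not the repository's actual code: `process_item` and `process_all` are hypothetical stand-ins for whatever per-entry processing the dataset code performs. The only point is to print the traceback for the first few failures before skipping, so the underlying cause becomes visible.

```python
import traceback


def process_item(raw):
    # Hypothetical stand-in for the per-entry processing step;
    # here it fails on anything that is not a positive number.
    if not isinstance(raw, (int, float)) or raw <= 0:
        raise ValueError(f"cannot process entry: {raw!r}")
    return raw * 2


def process_all(entries, max_reports=3):
    """Process entries, skipping failures but reporting the first few."""
    processed, errors = [], []
    for index, raw in enumerate(entries):
        try:
            processed.append(process_item(raw))
        except Exception as exc:
            errors.append((index, exc))
            # Print full details only for the first few failures,
            # so a run where everything fails does not flood the log.
            if len(errors) <= max_reports:
                print(f"Skipping entry {index}: {exc!r}")
                traceback.print_exc()
    print(f"processed={len(processed)} skipped={len(errors)}")
    return processed, errors


processed, errors = process_all([1, -1, "x", 4])
```

If every entry fails with the same exception type, that usually points at the environment or a shared helper rather than at the data itself.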

Loer9999 commented 7 months ago

Hi! I encountered the same problem when executing sample.py on the testing data. The issue I found was that one of the utils, protein_ligand.py, used deprecated data type aliases, such as numpy.long from the NumPy module. As a result, the program skipped most of the data, because protein_ligand.py throws an exception on these aliases when it tries to process the data. Replacing numpy.long with numpy.longlong, numpy.bool with numpy.bool_, and numpy.int with numpy.int_ solved this problem for me. I assume using an older version of NumPy will work as well. Check the NumPy release notes for more.
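As a rough illustration of the replacements described above: the array contents here are made up, and the actual call sites in protein_ligand.py are not reproduced. The sketch only shows the replacement names used as dtypes.

```python
import numpy as np

# np.longlong, np.bool_ and np.int_ are available in both NumPy 1.x and 2.x,
# so they are safe substitutes for the old aliases.
index = np.array([0, 1, 2], dtype=np.longlong)  # instead of np.long
mask = np.array([1, 0, 1], dtype=np.bool_)      # instead of np.bool
count = np.array([3, 4], dtype=np.int_)         # instead of np.int

print("NumPy version:", np.__version__)
print(index.dtype, mask.dtype, count.dtype)
```

Printing `np.__version__` alongside the dtypes makes it easy to confirm which NumPy the environment actually picks up.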

I hope this helps.

lajictw commented 7 months ago

Thanks for your reply! I tried replacing them, but the data is still being skipped. I will try downgrading NumPy. Anyway, thanks again for your reply.

pengxingang commented 7 months ago

Thank you for your valuable suggestions. I've addressed the NumPy data type warning and adjusted the data processing code to handle Exceptions when skipping data. Feel free to pull the updated code to investigate why all the data is being skipped.