pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

ValueError in Loading dataset... #22

Open hxu105 opened 1 year ago

hxu105 commented 1 year ago

Hello,

Thank you for sharing this fantastic work. I have run into an issue reproducing it: the dataset does not get set up properly, and the error message is shown below. I followed your instructions to download the dataset archive crossdocked_pocket10.tar.gz and the split file split_by_name.pt from https://drive.google.com/drive/folders/1CzwxmTpjbrt83z_wBzcQncq84OVDPurM, and then extracted the TAR archive.

[screenshot: traceback showing the output Indexing: 0it followed by a ValueError raised while loading the dataset]

Could you help fix this issue? Any suggestions would be greatly appreciated.

HX

pengxingang commented 1 year ago

This error message is kind of strange. The traceback suggests the error was raised while executing train_iterator = inf_iterator(DataLoader(...)), but the output Indexing: 0it before the error message indicates that the program actually failed in the _precompute_name2id function (line 90 of utils/datasets/pl.py), which is only called when the training/validation dataset is initialized for the first time (line 27 of utils/datasets/pl.py). Did you make any modifications to the related code? Also, I suggest removing the files xxx_processed.lmdb and xxx_name2id.pt (if they exist) and rerunning the script.
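For example, something like the following should clear the caches. This is only a sketch: it assumes the dataset sits under ./data as in the repo's README, and the file names are inferred from how utils/datasets/pl.py derives them from the raw path, so adjust them if your layout differs.

```python
# Sketch: remove the cached LMDB and name2id files so the dataset is rebuilt
# from scratch on the next run. File names assume the default
# ./data/crossdocked_pocket10 layout and may differ on your setup.
from pathlib import Path

data_dir = Path('./data')
for name in ['crossdocked_pocket10_processed.lmdb',
             'crossdocked_pocket10_processed.lmdb-lock',
             'crossdocked_pocket10_name2id.pt']:
    cache = data_dir / name
    if cache.exists():
        cache.unlink()
        print(f'Removed stale cache: {cache}')
```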

hxu105 commented 1 year ago

Thank you for the response. I want to mention that the data-processing stage actually skips a lot of instances: the screenshot shows it skipped 183468 instances out of 184057. I suspect some errors are being swallowed by the try-except statement. By the way, I just downloaded the repo and kept all the code unchanged. I also tried removing the lmdb file and the pt file, but I ran into the same error.

pengxingang commented 1 year ago

That might be where the problem is: the raw molecule data is not being processed properly. It is abnormal to skip so many instances during processing. You can find the actual errors by debugging the processing code. It is also possible that some packages are not installed properly.
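One way to surface the swallowed errors is to log the exception inside the except clause instead of only counting skips. Below is a minimal self-contained sketch of the pattern; process_one is a hypothetical stand-in for the per-instance pocket/ligand parsing done in utils/datasets/pl.py, not the repo's actual code.

```python
import traceback

def process_one(item):
    # Hypothetical stand-in for the per-instance pocket/ligand parsing.
    if item % 2:
        raise ValueError(f'bad instance {item}')

def process_all(items):
    num_skipped = 0
    for item in items:
        try:
            process_one(item)
        except Exception:
            num_skipped += 1
            traceback.print_exc()               # show the real error
            print(f'Skipped instance: {item}')  # identify the offending entry
    print(f'{num_skipped} of {len(items)} instances skipped')

process_all(list(range(6)))
```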

Octopus125 commented 8 months ago

Hi, I think we have the same problem. During data preprocessing, most of the data was skipped because of this error:

```
FutureWarning: In the future `np.long` will be defined as the corresponding NumPy scalar.
    'element': np.array(self.element, dtype=np.long)
```

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same version the author used. After that, the data preprocessing ran normally.
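If you would rather not downgrade, note that newer NumPy releases (1.24+) removed the deprecated np.long alias, so that line fails (preceded by the FutureWarning quoted above). A small illustration of the alias problem and a portable spelling; this is not the author's patch, just a demonstration:

```python
import numpy as np

# On NumPy >= 1.24 the deprecated alias np.long is gone, so the original
#     np.array(self.element, dtype=np.long)
# raises an error. Spelling out the concrete scalar type works on both
# old and new NumPy versions:
element = np.array([6, 7, 8, 16], dtype=np.int64)  # e.g. atomic numbers C, N, O, S
print(element.dtype)  # int64
```

Replacing np.long with np.int64 (or np.int_) wherever it appears in the preprocessing code should have the same effect as pinning numpy==1.22.3.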