pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
MIT License

ValueError in Loading dataset... #22

Open hxu105 opened 1 year ago

hxu105 commented 1 year ago

Hello,

Thank you for sharing this fantastic work. I have faced some issues reproducing it: the error shown below indicates the dataset is not set up correctly. I followed your instructions to download the dataset archive crossdocked_pocket10.tar.gz and the split file split_by_name.pt from (https://drive.google.com/drive/folders/1CzwxmTpjbrt83z_wBzcQncq84OVDPurM), and extracted the TAR archive.

[screenshot of the error traceback]

Could you help fix this issue? Any suggestions would be appreciated.

HX

pengxingang commented 1 year ago

It is kind of strange to see this error message. It seems the error was raised when executing train_iterator = inf_iterator(DataLoader(...)). But the output Indexing: 0it before the error message indicates that the program failed in the _precompute_name2id function (line 90 of utils/datasets/pl.py), which is only called when the training/validation dataset is initialized for the first time (line 27 of utils/datasets/pl.py). Did you make any modifications to the related code? Besides, I suggest removing the files xxx_processed.lmdb and xxx_name2id.pt (if they exist) and rerunning the script.
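The cache-clearing step above can be sketched as follows. This is a hedged illustration, not code from the repo: the `clear_dataset_cache` helper and the demo directory are hypothetical, and only the `*_processed.lmdb` / `*_name2id.pt` file patterns come from the discussion.

```python
import shutil
import tempfile
from pathlib import Path

def clear_dataset_cache(data_dir):
    """Remove stale dataset caches so a rerun re-processes the raw data.

    Hypothetical helper; the cache name patterns follow the comment above.
    """
    removed = []
    for pattern in ("*_processed.lmdb", "*_name2id.pt"):
        for path in Path(data_dir).glob(pattern):
            if path.is_dir():
                shutil.rmtree(path)  # lmdb caches can be directories
            else:
                path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo against a throwaway directory holding fake cache files.
demo = Path(tempfile.mkdtemp())
(demo / "crossdocked_processed.lmdb").touch()
(demo / "crossdocked_name2id.pt").touch()
print(clear_dataset_cache(demo))
# → ['crossdocked_name2id.pt', 'crossdocked_processed.lmdb']
```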

hxu105 commented 1 year ago

Thank you for the response. I want to mention that the processing stage actually skips most of the instances: the screenshot shows it skipped 183468 out of 184057. I suspect errors are being silently swallowed by the try-except statement. By the way, I just downloaded the repo and kept all the code unchanged. I also tried removing the lmdb file and the pt file, but I hit the same error.

pengxingang commented 1 year ago

That might be where the problem is: the raw molecule data is not properly processed. It is abnormal to skip so many instances during processing. You can check the actual errors by debugging the processing code. It is also possible that some packages are not properly installed.
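One generic way to debug a loop like this is to log each failure instead of silently skipping it. A minimal sketch, not the repo's actual processing code: `process_all` and `parse` are stand-ins for the per-instance parsing inside the dataset builder.

```python
import traceback

def process_all(items, parse):
    """Process items, printing (rather than swallowing) failures.

    Hypothetical stand-in for the dataset's try-except loop.
    """
    results, skipped = [], 0
    for item in items:
        try:
            results.append(parse(item))
        except Exception:
            skipped += 1
            traceback.print_exc()  # surface the real cause of the skip
    print(f"skipped {skipped} of {len(items)} instances")
    return results

# Toy demo: one malformed item triggers the except branch.
print(process_all(["1", "2", "x"], int))
# → prints the ValueError traceback for "x", then "skipped 1 of 3 instances",
#   then [1, 2]
```

With the traceback printed, an environment problem (such as the numpy error discussed below) shows up immediately instead of as a mysteriously high skip count.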

Octopus125 commented 1 year ago

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same as the author. After that I was able to run the data preprocessing as normal.
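For context: `np.long` was deprecated in NumPy 1.20 and removed in 1.24, so on newer NumPy the quoted line raises an error inside the try-except and the instance is skipped. Pinning numpy==1.22.3 works; a code-side alternative (a sketch, with a toy list standing in for `self.element`) is to spell the dtype portably:

```python
import numpy as np

# np.long is gone in NumPy >= 1.24; np.int64 works on both old and
# new versions and matches the intended integer dtype on most platforms.
element = [6, 7, 8, 16]  # toy atomic numbers standing in for self.element
arr = np.array(element, dtype=np.int64)
print(arr.dtype)  # int64
```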

Yuning598 commented 2 months ago

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same as the author. After that I was able to run the data preprocessing as normal.

Yes, it works!!! thx!