kuixu / PrismNet

Predicting dynamic cellular protein-RNA interactions using deep learning and in vivo RNA structure
MIT License

Errors in Training step: RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383 #5

Closed changhaoli closed 1 year ago

changhaoli commented 1 year ago

[changhao.li@grace3 PrismNet]$ sh exp/prismnet/train.sh TIA1_Hela clip_data
Namespace(arch='PrismNet', batch_size=64, data_dir='data/clip_data', early_stopping=20, eval=True, eval_test=False, exp_name='prismnet', har=False, infer=False, infer_file='', infer_test=False, load_best=False, log_interval=100, lr=0.001, lr_scheduler='warmup', mode='pu', nepochs=200, no_cuda=False, out_dir='exp/prismnet', p_name='TIA1_Hela', pos_weight=2, saliency=False, saliency_img=False, seed=1024, tfboard=False, train=True, weight_decay=1e-06, workers=2)
Network Arch: PrismNet

Total params: 58189
Trainable params: 58189
Non-trainable params: 0

train: [0 1] [8002 4000]
test: [0 1] [2000 1000]
train: [0 1] [8002 4000]
test: [0 1] [2000 1000]
Train set: 12002
Test set: 3000
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Traceback (most recent call last):
  File "tools/main.py", line 265, in <module>
    main()
  File "tools/main.py", line 187, in main
    t_met = train(args, model, device, train_loader, criterion, optimizer)
  File "/scratch/user/changhao.li/downloads/PrismNet/prismnet/engine/train_loop.py", line 18, in train
    output = model(x)
  File "/scratch/user/changhao.li/.conda/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/user/changhao.li/downloads/PrismNet/prismnet/model/PrismNet.py", line 81, in forward
    x = self.conv(input)
  File "/scratch/user/changhao.li/.conda/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/user/changhao.li/downloads/PrismNet/prismnet/model/PrismNet.py", line 18, in forward
    x = self.conv(x)
  File "/scratch/user/changhao.li/.conda/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/user/changhao.li/.conda/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383
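The traceback ends inside a plain convolution call, so one way to narrow this down is to check whether any GPU convolution fails in this environment, independent of PrismNet. The following is a minimal sketch, assuming only that `torch` is importable; the function name `try_conv_forward` and the tensor shapes are illustrative and are not taken from the PrismNet code.

```python
def try_conv_forward(device_name="cuda"):
    """Run one small Conv2d forward pass and describe the outcome as a string."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "torch is not installed"

    if device_name.startswith("cuda") and not torch.cuda.is_available():
        return "CUDA is not available"

    device = torch.device(device_name)
    try:
        # Illustrative layer and input sizes; not PrismNet's actual shapes.
        conv = nn.Conv2d(1, 8, kernel_size=3, padding=1).to(device)
        x = torch.randn(4, 1, 16, 16, device=device)
        y = conv(x)
        return "ok: output shape {}".format(tuple(y.shape))
    except RuntimeError as err:
        # A CUDA runtime error raised by the forward pass would be reported here.
        return "failed: {}".format(err)


if __name__ == "__main__":
    for name in ("cpu", "cuda"):
        print(name, "->", try_conv_forward(name))
```

If the CPU pass succeeds and the CUDA pass fails with the same `invalid argument` message, the problem would appear to lie in the PyTorch/CUDA/GPU combination rather than in the PrismNet model or data.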

library version:

# packages in environment at /scratch/user/changhao.li/.conda/envs/py3.6:
#
# Name                    Version       Build             Channel
_libgcc_mutex             0.1           main
_openmp_mutex             5.1           1_gnu
ca-certificates           2023.01.10    h06a4308_0
cached-property           1.5.2         pypi_0            pypi
certifi                   2021.5.30     py36h06a4308_0
cycler                    0.11.0        pypi_0            pypi
einops                    0.4.1         pypi_0            pypi
h5py                      3.1.0         pypi_0            pypi
importlib-resources       5.4.0         pypi_0            pypi
joblib                    1.1.1         pypi_0            pypi
kiwisolver                1.3.1         pypi_0            pypi
ld_impl_linux-64          2.38          h1181459_1
libffi                    3.3           he6710b0_2
libgcc-ng                 11.2.0        h1234567_1
libgomp                   11.2.0        h1234567_1
libstdcxx-ng              11.2.0        h1234567_1
matplotlib                3.3.4         pypi_0            pypi
ncurses                   6.4           h6a678d5_0
numpy                     1.19.5        pypi_0            pypi
openssl                   1.1.1t        h7f8727e_0
packaging                 21.3          pypi_0            pypi
pandas                    1.1.5         pypi_0            pypi
pillow                    8.4.0         pypi_0            pypi
pip                       21.2.2        py36h06a4308_0
prismnet                  0.1.1         dev_0
protobuf                  3.19.6        pypi_0            pypi
pyparsing                 3.0.9         pypi_0            pypi
python                    3.6.13        h12debd9_1
python-dateutil           2.8.2         pypi_0            pypi
pytz                      2023.3        pypi_0            pypi
readline                  8.2           h5eee18b_0
scikit-learn              0.24.2        pypi_0            pypi
scipy                     1.1.0         pypi_0            pypi
setuptools                58.0.4        py36h06a4308_0
six                       1.16.0        pypi_0            pypi
sqlite                    3.41.2        h5eee18b_0
tensorboardx              2.6           pypi_0            pypi
termcolor                 1.1.0         pypi_0            pypi
threadpoolctl             3.1.0         pypi_0            pypi
tk                        8.6.12        h1ccaba5_0
torch                     1.1.0         pypi_0            pypi
tqdm                      4.64.1        pypi_0            pypi
wheel                     0.37.1        pyhd3eb1b0_0
xz                        5.2.10        h5eee18b_1
zipp                      3.6.0         pypi_0            pypi
zlib                      1.2.13        h5eee18b_0
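The package list shows torch 1.1.0 but not which CUDA toolkit that wheel was built against, nor which GPU the node has. A mismatch between an older PyTorch build and a newer GPU is one possible cause of this kind of error, though nothing in the log confirms it. A minimal sketch for collecting that information follows; the helper name `describe_cuda_env` is hypothetical, and the script assumes only that `torch` may or may not be importable.

```python
def describe_cuda_env():
    """Collect the PyTorch version, its CUDA build version, and visible GPUs."""
    info = {
        "torch_version": None,
        "cuda_build": None,       # CUDA toolkit version the torch wheel was built with
        "cuda_available": False,
        "devices": [],            # list of (name, (major, minor) compute capability)
    }
    try:
        import torch
    except ImportError:
        return info

    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = bool(torch.cuda.is_available())

    if info["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(i)
            capability = torch.cuda.get_device_capability(i)
            info["devices"].append((name, capability))
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print("{}: {}".format(key, value))
```

Running this on the same node as the training job and attaching the output, together with `nvidia-smi`, would make the report easier to diagnose.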

Dear @kuixu ,

I am trying to run your software on your test data, but I encountered the error above during the training step. Could you please help me figure out what is causing it? Thank you very much!

Changhao Li