Open axmav opened 2 years ago
I created the preprocess directory manually, and it seems that preload_AVA.txt is now created, but I got another error:
Downloading https://cg.cs.tsinghua.edu.cn/jittor/assets/build/checkpoints/resnet50.pkl to /home/alex/.cache/jittor/jt1.3.1/g++9.3.0/py3.8.12/Linux-5.11.0-3xee/AMDRyzen95900Xx4a/default/cu11.5.50_sm_86/checkpoints/resnet50.pkl
97.7MB [00:17, 6.02MB/s] [w 1028 19:44:09.687694 80 __init__.py:1075] load parameter fc.weight failed ...
[w 1028 19:44:09.687741 80 __init__.py:1075] load parameter fc.bias failed ...
[w 1028 19:44:09.687763 80 __init__.py:1093] load total 267 params, 2 failed
=> Start training #Ep 1 /20
Traceback (most recent call last):
File "/home/alex/anaconda3/envs/menv/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/alex/anaconda3/envs/menv/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/alex/project/hlagcn-jittor/utils_jittor/train_jittor.py", line 237, in <module>
main()
File "/home/alex/project/hlagcn-jittor/utils_jittor/train_jittor.py", line 40, in main
main_worker(args)
File "/home/alex/project/hlagcn-jittor/utils_jittor/train_jittor.py", line 142, in main_worker
loss = train(train_loader, model, criterions, optimizer, epoch, args)
File "/home/alex/project/hlagcn-jittor/utils_jittor/train_jittor.py", line 184, in train
losses.update(loss.item(), input.size(0))
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.item)).
Types of your inputs are:
self = Var,
args = (),
The function declarations are:
ItemData item()
Failed reason:[f 1028 19:25:44.090026 56 helper_cuda.h:126] CUDA error at /home/alex/anaconda3/envs/menv/lib/python3.8/site-packages/jittor/src/var_holder.cc:155 code=700( cudaErrorIllegalAddress ) cudaMemcpy(&data.data, var->mem_ptr, dsize, cudaMemcpyDeviceToHost)
19:25:31->Ep:[1][ 0/227954] - Net:5.9 - Load:2.1 - loss_avg:0.180
[e 1028 19:25:44.351267 56 helper_cuda.h:115] Peek CUDA error at /home/alex/anaconda3/envs/menv/lib/python3.8/site-packages/jittor/src/mem/allocator/cuda_dual_allocator.h:101 code=700( cudaErrorIllegalAddress ) _cudaLaunchHostFunc(0, &to_free_allocation, 0)
Exception ignored in: <function Dataset.__del__ at 0x7fcf97857b80>
Traceback (most recent call last):
File "/home/alex/anaconda3/envs/menv/lib/python3.8/site-packages/jittor/dataset/dataset.py", line 409, in __del__
File "/home/alex/anaconda3/envs/menv/lib/python3.8/site-packages/jittor/dataset/dataset.py", line 211, in terminate
File "/home/alex/anaconda3/envs/menv/lib/python3.8/multiprocessing/process.py", line 133, in terminate
File "/home/alex/anaconda3/envs/menv/lib/python3.8/multiprocessing/popen_fork.py", line 61, in terminate
AttributeError: 'NoneType' object has no attribute 'SIGTERM'
terminate called after throwing an instance of 'std::runtime_error'
what(): [f 1028 19:25:44.661416 56 helper_cuda.h:126] CUDA error at /home/alex/anaconda3/envs/menv/lib/python3.8/site-packages/jittor/extern/cuda/cudnn/src/cudnn_warper.cc:34 code=4( CUDNN_STATUS_INTERNAL_ERROR ) cudnnDestroy(cudnn_handle)
Aborted (core dumped)
Hello! I cannot train the model on the AVA dataset. I get these errors:
Also, some images do not exist. I chose AVA2 from the data split. Thank you!
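To narrow down which images are missing, a small check along these lines could list them. This is only a sketch: the function name `find_missing_images`, the `.jpg` extension, and the idea that the split is available as a list of image IDs are all assumptions, not something taken from the hlagcn-jittor code. It also does not establish whether the missing files are related to the CUDA error above.

```python
import os


def find_missing_images(image_ids, image_dir, ext=".jpg"):
    """Return the IDs whose image file is not present in image_dir.

    Assumption: each image is stored as <image_dir>/<image_id><ext>.
    Adjust the path pattern if the dataset is laid out differently.
    """
    missing = []
    for image_id in image_ids:
        path = os.path.join(image_dir, f"{image_id}{ext}")
        if not os.path.isfile(path):
            missing.append(image_id)
    return missing
```

Calling it with the IDs from the chosen split and the image folder returns the IDs that have no file on disk, which could then be dropped from the split before training.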