lzhengning / SubdivNet

Subdivision-based Mesh Convolutional Networks.
MIT License

training error #1

Status: Closed (amiltonwong closed this issue 3 years ago)

amiltonwong commented 3 years ago

Hi @lzhengning,

Thanks for releasing the package. After downloading the dataset, I ran the training script and got the following error:

(jittor_new) root@milton-ThinkCentre-M93p:/data/code13/SubdivNet# sh scripts/shrec11-split10/train.sh 
[i 0616 23:07:49.759805 20 compiler.py:847] Jittor(1.2.2.34) src: /root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/jittor
[i 0616 23:07:49.759901 20 compiler.py:848] g++ at /usr/bin/g++
[i 0616 23:07:49.759969 20 compiler.py:849] cache_path: /root/.cache/jittor/default/g++
[i 0616 23:07:49.762716 20 compiler.py:799] Found /usr/local/cuda/bin/nvcc(10.1.105) at /usr/local/cuda-10.1/bin/nvcc
[i 0616 23:07:49.798470 20 __init__.py:257] Found gdb(7.11.1) at /usr/bin/gdb.
[i 0616 23:07:49.801743 20 __init__.py:257] Found addr2line(2.26.1) at /usr/bin/addr2line.
[i 0616 23:07:49.830711 20 compiler.py:889] pybind_include: -I/root2/anaconda3/envs/jittor_new/include/python3.7m -I/root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/pybind11/include
[i 0616 23:07:49.838109 20 compiler.py:891] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0616 23:07:50.310062 20 __init__.py:169] Total mem: 11.66GB, using 3 procs for compiling.
Compiling jittor_core(143/143) used: 59.421s eta: 0.000ss
[i 0616 23:08:50.128564 20 jit_compiler.cc:21] Load cc_path: /usr/bin/g++
[i 0616 23:08:50.128595 20 jit_compiler.cc:24] Load nvcc_path: /usr/local/cuda-10.1/bin/nvcc
[i 0616 23:08:51.250901 20 init.cc:54] Found cuda archs: [61,]
[i 0616 23:08:52.635424 20 __init__.py:257] Found mpicc(1.10.2) at /usr/bin/mpicc.
[i 0616 23:08:52.793805 20 compiler.py:654] handle pyjt_include/root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/jittor/extern/mpi/inc/mpi_warper.h
Compiling jittor_mpi_core(7/7) used: 2.310s eta: 0.000s
[i 0616 23:08:55.261057 20 compile_extern.py:287] Downloading nccl...
Downloading https://github.com/NVIDIA/nccl/archive/v2.6.4-1.tar.gz to /root/.cache/jittor/nccl/nccl.tgz
147456it [00:02, 50709.85it/s]
Compiling gen_ops_mkl_conv_backward_x_mkl_conv_mkl_conv_backward_w_mkl_test_mkl_matmul(7/7) used: 4.564s eta: 0.000s
[i 0616 23:09:06.842528 20 compile_extern.py:16] found /usr/include/cublas.h
[i 0616 23:09:06.842637 20 compile_extern.py:16] found /usr/lib/x86_64-linux-gnu/libcublas.so
[i 0616 23:09:10.160118 20 compile_extern.py:16] found /usr/local/cuda-10.1/include/cudnn.h
[i 0616 23:09:10.160193 20 compile_extern.py:16] found /usr/local/cuda-10.1/lib64/libcudnn.so
[i 0616 23:09:11.063621 20 compiler.py:654] handle pyjt_include/root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
Compiling gen_ops_cudnn_conv_backward_w_cudnn_conv_cudnn_test_cudnn_conv_backward_x(9/9) used: 3.050s eta: 0.000s
[i 0616 23:09:18.184931 20 compile_extern.py:16] found /usr/local/cuda-10.1/include/curand.h
[i 0616 23:09:18.185190 20 compile_extern.py:16] found /usr/local/cuda-10.1/lib64/libcurand.so
[i 0616 23:09:19.741614 20 cuda_flags.cc:26] CUDA enabled.
Traceback (most recent call last):
  File "train_cls.py", line 153, in <module>
    optim = Adam(net.parameters(), lr=args.lr, weight_decay=args.weight_decay)
  File "/root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/jittor/optim.py", line 251, in __init__
    assert weight_decay==0, "weight_decay is not supported yet"
AssertionError: weight_decay is not supported yet
Caught segfault at address 0x10, thread_name: '', flush log...
stack trace for /root2/anaconda3/envs/jittor_new/bin/python3.7 pid=24435
[New LWP 24441]
[New LWP 25105]
[New LWP 25107]
[New LWP 25108]
[New LWP 25109]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8481e110cb in __GI___waitpid (pid=25371, stat_loc=0x0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
29  ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8482521700 (LWP 24435))]
#0  0x00007f8481e110cb in __GI___waitpid (pid=25371, stat_loc=0x0, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x00007f848075bf3d in jittor::print_trace() () from /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so
#2  0x00007f848075729d in jittor::segfault_sigaction(int, siginfo_t*, void*) () from /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so
#3  <signal handler called>
#4  0x00007f847fb5c7d8 in ?? () from /usr/local/cuda-10.1/lib64/libcudart.so
#5  0x00007f847fb62b01 in ?? () from /usr/local/cuda-10.1/lib64/libcudart.so
#6  0x00007f847fb5598e in ?? () from /usr/local/cuda-10.1/lib64/libcudart.so
#7  0x00007f847fb56006 in ?? () from /usr/local/cuda-10.1/lib64/libcudart.so
#8  0x00007f8481d7eff8 in __run_exit_handlers (status=1, listp=0x7f84821095f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#9  0x00007f8481d7f045 in __GI_exit (status=<optimized out>) at exit.c:104
Undefined command: "py-bt".  Try "help".
Segfault, exit
Segmentation fault (core dumped)

Could you give me some hints on how to solve this issue?

Thanks~

lzhengning commented 3 years ago

Could you please provide your Jittor version? If it is outdated, please update Jittor and try again.
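
For reference, the version also appears in the first line of Jittor's startup log in the form `Jittor(<version>)`, so it can be read straight from a pasted log. Below is a minimal sketch of extracting it. `parse_jittor_version` is a hypothetical helper written for illustration, not part of Jittor, and the regular expression assumes the log format shown in this issue.

```python
import re


def parse_jittor_version(log_line):
    """Return the version in a 'Jittor(x.y.z.w)' log line as a tuple of ints.

    Hypothetical helper (not a Jittor API). Returns None when the line
    contains no version marker.
    """
    match = re.search(r"Jittor\((\d+(?:\.\d+)*)\)", log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# First line of the log pasted in this issue.
log_line = (
    "[i 0616 23:07:49.759805 20 compiler.py:847] Jittor(1.2.2.34) src: "
    "/root2/anaconda3/envs/jittor_new/lib/python3.7/site-packages/jittor"
)

version = parse_jittor_version(log_line)
print(version)
```

Returning a tuple of integers rather than the raw string means two versions can be compared directly with `<` and `>`.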

lzhengning commented 3 years ago

I see from your log that your Jittor version is 1.2.2.34. The latest Jittor should support weight decay in Adam, so updating should get past this assertion.
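
If upgrading is not possible right away, one stopgap is to build the optimizer defensively and fall back to `weight_decay=0` when the installed version rejects it. This is only a sketch under assumptions, not the repository's approach. `build_optimizer` is a hypothetical helper, and `LegacyAdam` is a stand-in that mimics the assertion in the traceback so that the example runs without Jittor. Dropping weight decay changes regularisation, so results may differ from the published ones.

```python
import warnings


def build_optimizer(optim_cls, params, lr, weight_decay):
    """Build an optimizer, retrying without weight decay if it is rejected.

    Hypothetical helper: older optimizers (as in the traceback above) raise
    AssertionError for a non-zero weight_decay.
    """
    try:
        return optim_cls(params, lr=lr, weight_decay=weight_decay)
    except AssertionError as err:
        warnings.warn(
            f"{optim_cls.__name__} rejected weight_decay={weight_decay} "
            f"({err}); falling back to weight_decay=0. "
            "Upgrading the library is the proper fix."
        )
        return optim_cls(params, lr=lr, weight_decay=0)


class LegacyAdam:
    """Stand-in that mimics the old assertion; NOT Jittor's real Adam."""

    def __init__(self, params, lr, weight_decay=0):
        assert weight_decay == 0, "weight_decay is not supported yet"
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay


# Demonstration with the stand-in class and dummy parameters.
optim = build_optimizer(LegacyAdam, [0.1, 0.2], lr=1e-3, weight_decay=1e-4)
print(optim.weight_decay)
```

In `train_cls.py`, the corresponding call would presumably be `build_optimizer(Adam, net.parameters(), args.lr, args.weight_decay)`, though updating Jittor as suggested above remains the cleaner fix.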