lzhengning / SubdivNet

Subdivision-based Mesh Convolutional Networks.
MIT License

Cannot run the test shell script #12

unw9527 closed this issue 3 years ago

unw9527 commented 3 years ago

Hello, and thanks for your work. When I try to run the test script for coseg-alien, I get the following error:

name: coseg-alien
0:   0%|                    | 0/37 [00:00<?, ?it/s]/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc(40): error: calling a constexpr __host__ function("floor") from a __global__ function("func_bc2f95c82b48131a_0") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

1 error detected in the compilation of "/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc".
[e 0721 14:21:37.874137 48:C15 parallel_compiler.cc:261] [Error] source file location: /home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc
[e 0721 14:21:37.874176 48:C15 parallel_compiler.cc:264] Compile fused operator(18/56) failed: [Op(0x5561911a3de0:0:0:1:i0:o1:s0,array->0x5561911a3e80),Op(0x5561911a2360:0:0:1:i1:o1:s0,broadcast_to->0x5561911a2410),Op(0x5561911a37f0:0:0:1:i2:o1:s0,binary.mod->0x5561911a30a0),] 

Reason: [f 0721 14:21:37.873882 48:C15 log.cc:387] Check failed ret(256) == 0(0) Run cmd failed: cd /home/xxx/.cache/jittor/default/g++ && /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc '/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc'     -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -I/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -DHAS_CUDA -I'/home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include' -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -lstdc++ -ldl -shared  -x cu --cudart=shared -ccbin='/usr/bin/g++' --use_fast_math  -w  -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -arch=compute_61  -code=sm_61  -o '/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.so'
0:   0%|                    | 0/37 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "train_seg.py", line 162, in <module>
    test(net, test_dataset, writer, 0, args)
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/__init__.py", line 257, in inner
    ret = func(*args, **kw)
  File "train_seg.py", line 64, in test
    preds = np.argmax(outputs.data, axis=1)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.data)).

Types of your inputs are:
 self   = Var,

The function declarations are:
 inline DataView data()

Failed reason:[f 0721 14:21:38.068584 96 parallel_compiler.cc:316] Error happend during compilation, see error above.

Any idea why this happens? I downloaded the coseg-alien data using the provided shell script.
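A note for anyone reading the log: the Python RuntimeError at the bottom ("Wrong inputs arguments") appears to be a consequence of the failure, since the log itself says "Error happend during compilation, see error above". The root cause is the nvcc diagnostic further up. Below is a minimal sketch for pulling such diagnostics out of a long log. The function name `extract_nvcc_errors` and the regular expression are assumptions based on the excerpt above, not a documented nvcc or Jittor format, and the sample path is shortened.

```python
import re

# Matches diagnostics of the form "<file>(<line>): error: <message>".
# This pattern is inferred from the log excerpt in this thread (an assumption).
NVCC_ERROR = re.compile(r"\((\d+)\): error: (.+)")


def extract_nvcc_errors(log_text):
    """Return (line_number, message) pairs for each nvcc error line in the log."""
    errors = []
    for line in log_text.splitlines():
        match = NVCC_ERROR.search(line)
        if match:
            errors.append((int(match.group(1)), match.group(2).strip()))
    return errors


# Shortened sample modelled on the log above; the file name is a placeholder.
sample = (
    "0:   0%|     | 0/37 [00:00<?, ?it/s]some_op.cc(40): error: calling a constexpr "
    '__host__ function("floor") from a __global__ function("func_0") is not allowed.\n'
    "RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.data)).\n"
)

for line_no, message in extract_nvcc_errors(sample):
    print(line_no, message)
```

Only the compiler diagnostic is reported; the trailing RuntimeError line is skipped because it does not match the pattern.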

lzhengning commented 3 years ago

Hi @unw9527,

Could you please upgrade jittor and clear the cache with rm -r ~/.cache/jittor ?

If there are still problems, please let me know.
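For reference, the cache-clearing step can also be done from Python. This is only a sketch: the helper name `clear_jittor_cache` is hypothetical, and the default location is taken from the paths shown in the log (~/.cache/jittor). Note that in the log above the nvcc toolchain also sits under that directory (the jtcuda folder), so deleting the whole directory removes it as well.

```python
import shutil
from pathlib import Path


def clear_jittor_cache(cache_dir=None):
    """Delete the Jittor cache directory; return True if something was removed."""
    # Default taken from the log paths in this thread (an assumption for other setups).
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "jittor"
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return False
    shutil.rmtree(cache_dir)
    return True


# Usage (commented out on purpose, because it deletes files):
# clear_jittor_cache()
```

The upgrade itself is usually done with pip, for example python3 -m pip install --upgrade jittor, run inside the same environment that the training script uses.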

unw9527 commented 3 years ago

Hi @lzhengning, thanks for your reply. I did what you suggested, and jittor is now at version 1.2.3.73 (originally 1.2.3.71). I cleared the cache as well. But now I get a different error message:

nvcc fatal   : Value 'c++14' is not defined for option 'std'
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor_utils/__init__.py", line 152, in do_compile
    return cc.cache_compile(cmd, cache_path, jittor_path)
RuntimeError: [f 0721 21:17:00.171152 12 log.cc:387] Check failed ret(256) == 0(0) Run cmd failed: cd /home/xxx/.cache/jittor/default/g++ && /usr/local/cuda/bin/nvcc /home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src/misc/nan_checker.cu      -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -I/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -DHAS_CUDA -I'/usr/local/cuda/include' -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -I/home/xxx/.cache/jittor/default/g++  -O2  -x cu --cudart=shared -ccbin='/usr/bin/g++'   -w  -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -c  -o /home/xxx/.cache/jittor/default/g++/obj_files/nan_checker.cu.o
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_seg.py", line 10, in <module>
    import jittor as jt
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/__init__.py", line 18, in <module>
    from . import compiler
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/compiler.py", line 1106, in <module>
    compile(cc_path, cc_flags+opt_flags, files, 'jittor_core'+extension_suffix)
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/compiler.py", line 93, in compile
    jit_utils.run_cmds(cmds, cache_path, jittor_path, "Compiling "+base_output)
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor_utils/__init__.py", line 193, in run_cmds
    for i,_ in enumerate(p.imap_unordered(do_compile, cmds)):
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
RuntimeError: [f 0721 21:17:00.171152 12 log.cc:387] Check failed ret(256) == 0(0) Run cmd failed: cd /home/xxx/.cache/jittor/default/g++ && /usr/local/cuda/bin/nvcc /home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src/misc/nan_checker.cu      -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -I/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -DHAS_CUDA -I'/usr/local/cuda/include' -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -I/home/xxx/.cache/jittor/default/g++  -O2  -x cu --cudart=shared -ccbin='/usr/bin/g++'   -w  -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -c  -o /home/xxx/.cache/jittor/default/g++/obj_files/nan_checker.cu.o

I found that this might be caused by an outdated version of CUDA. After I ran the command python3 -m jittor_utils.install_cuda, as suggested by the Jittor team, this error went away, but I got the same error as before:

[i 0722 13:02:24.137653 08 compiler.py:869] Jittor(1.2.3.73) src: /home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor
[i 0722 13:02:24.143181 08 compiler.py:870] g++ at /usr/bin/g++(5.4.0)
[i 0722 13:02:24.143247 08 compiler.py:871] cache_path: /home/xxx/.cache/jittor/default/g++
[i 0722 13:02:24.155516 08 install_cuda.py:37] cuda_driver_version: [11, 2]
[i 0722 13:02:24.161784 08 __init__.py:286] Found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc(11.2.152) at /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc.
[i 0722 13:02:24.214301 08 __init__.py:286] Found gdb(7.11.1) at /usr/bin/gdb.
[i 0722 13:02:24.221481 08 __init__.py:286] Found addr2line(2.26.1) at /usr/bin/addr2line.
[i 0722 13:02:24.239643 08 compiler.py:958] py_include: -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m
[i 0722 13:02:24.258129 08 compiler.py:960] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0722 13:02:24.422251 08 compiler.py:1098] OS type:ubuntu OS key:ubuntu
[i 0722 13:02:24.423282 08 __init__.py:178] Total mem: 62.83GB, using 16 procs for compiling.
[i 0722 13:02:24.563519 08 jit_compiler.cc:22] Load cc_path: /usr/bin/g++
[i 0722 13:02:24.652271 08 init.cc:55] Found cuda archs: [61,]
[i 0722 13:02:24.666418 08 __init__.py:286] Found mpicc(1.10.2) at /usr/bin/mpicc.
[i 0722 13:02:24.704353 08 compiler.py:667] handle pyjt_include/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/mpi/inc/mpi_warper.h
[i 0722 13:02:24.724936 08 compile_extern.py:347] Downloading nccl...
[i 0722 13:02:24.785298 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/cublas.h
[i 0722 13:02:24.797011 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcublas.so
[i 0722 13:02:24.797106 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcublasLt.so.11
[i 0722 13:02:25.036328 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/cudnn.h
[i 0722 13:02:25.056544 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn.so.8
[i 0722 13:02:25.056619 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_ops_infer.so.8
[i 0722 13:02:25.059087 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_ops_train.so.8
[i 0722 13:02:25.059688 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_cnn_infer.so.8
[i 0722 13:02:25.083144 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcudnn_cnn_train.so.8
[i 0722 13:02:25.096104 08 compiler.py:667] handle pyjt_include/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0722 13:02:25.351712 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include/curand.h
[i 0722 13:02:25.374880 08 compile_extern.py:20] found /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/lib64/libcurand.so
[i 0722 13:02:25.400630 08 cuda_flags.cc:26] CUDA enabled.
name: coseg-alien
0:   0%|                         | 0/37 [00:00<?, ?it/s]/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc(40): error: calling a constexpr __host__ function("floor") from a __global__ function("func_bc2f95c82b48131a_0") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

1 error detected in the compilation of "/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc".
[e 0722 13:02:28.692090 60:C8 parallel_compiler.cc:261] [Error] source file location: /home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc
[e 0722 13:02:28.692354 60:C8 parallel_compiler.cc:264] Compile fused operator(18/56) failed: [Op(0x55ab6354f100:0:0:1:i0:o1:s0,array->0x55ab6354e9a0),Op(0x55ab6354dcf0:0:0:1:i1:o1:s0,broadcast_to->0x55ab6354d5c0),Op(0x55ab6354e300:0:0:1:i2:o1:s0,binary.mod->0x55ab6354e390),] 

Reason: [f 0722 13:02:28.691857 60:C8 log.cc:387] Check failed ret(256) == 0(0) Run cmd failed: cd /home/xxx/.cache/jittor/default/g++ && /home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/bin/nvcc '/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.cc'     -std=c++14 -Xcompiler -fPIC  -Xcompiler -march=native  -Xcompiler -fdiagnostics-color=always  -I/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/src -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -I/home/xxx/anaconda3/envs/subdivnet/include/python3.7m -DHAS_CUDA -I'/home/xxx/.cache/jittor/jtcuda/cuda11.2_cudnn8_linux/include' -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -lstdc++ -ldl -shared  -x cu --cudart=shared -ccbin='/usr/bin/g++' --use_fast_math  -w  -I'/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/extern/cuda/inc'  -arch=compute_61  -code=sm_61  -o '/home/xxx/.cache/jittor/default/g++/jit/_opkey0:array_T:int32__JIT:1__JIT_cuda:1__index_t:int32___opkey1:broadcast_to_Tx:int32__DI...hash:bc2f95c82b48131a_op.so'
0:   0%|                         | 0/37 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "train_seg.py", line 162, in <module>
    test(net, test_dataset, writer, 0, args)
  File "/home/xxx/anaconda3/envs/subdivnet/lib/python3.7/site-packages/jittor/__init__.py", line 257, in inner
    ret = func(*args, **kw)
  File "train_seg.py", line 64, in test
    preds = np.argmax(outputs.data, axis=1)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.data)).

Types of your inputs are:
 self   = Var,

The function declarations are:
 inline DataView data()

Failed reason:[f 0722 13:02:34.479193 08 parallel_compiler.cc:316] Error happend during compilation, see error above.
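
A side note on the "nvcc fatal : Value 'c++14' is not defined for option 'std'" message: it is consistent with the system nvcc at /usr/local/cuda being too old to accept -std=c++14. The sketch below checks the release reported by nvcc --version. The function names are hypothetical, and the 9.0 threshold is an assumption that should be verified against the CUDA release notes.

```python
import re

# Assumption: nvcc accepts -std=c++14 from CUDA 9.0 onward.
MIN_CUDA_FOR_CXX14 = (9, 0)


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_cxx14(version_output):
    """True if the reported release is at least the assumed minimum."""
    return parse_nvcc_release(version_output) >= MIN_CUDA_FOR_CXX14


# Sample line in the style nvcc prints; 11.2 matches the toolkit found in the log.
sample = "Cuda compilation tools, release 11.2, V11.2.152"
print(parse_nvcc_release(sample), supports_cxx14(sample))
```

To check a real installation, pass in the output of running the nvcc binary with --version, for example via subprocess.run([...], capture_output=True, text=True).stdout.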

lzhengning commented 3 years ago

I reproduced this error with the latest jittor. It seems to be a recently introduced bug and will be fixed soon.

In the meantime, can you try installing an earlier version with python3.7 -m pip install jittor==1.2.3.48 ? I have tested this version and it works.
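To make the version picture explicit, here is a small sketch that classifies a jittor version string using only the data points reported in this thread (1.2.3.48 working; 1.2.3.71 and 1.2.3.73 failing). The helper names are hypothetical, and nothing here says anything about versions that were not mentioned.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.2.3.48' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Data points reported in this thread only; other versions are not covered.
KNOWN_GOOD = {parse_version("1.2.3.48")}
KNOWN_BAD = {parse_version("1.2.3.71"), parse_version("1.2.3.73")}


def classify(version_text):
    """Label a version according to what this thread reports about it."""
    version = parse_version(version_text)
    if version in KNOWN_GOOD:
        return "reported working"
    if version in KNOWN_BAD:
        return "reported failing"
    return "not covered by this thread"


for v in ("1.2.3.48", "1.2.3.73", "1.2.4.0"):
    print(v, "->", classify(v))
```

The installed version can be read from the "Jittor(...)" line that is printed at startup, or with python3.7 -m pip show jittor.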

unw9527 commented 3 years ago

It works. Thanks.

lzhengning commented 3 years ago

Closed because the latest jittor has fixed the bugs.

1170300814 commented 1 year ago

No! They have not fixed this bug.