Profiling the sparse convolution with the command nvprof --profile-child-processes bash autotune_conv_float.sh 512 512 7 filter_bg4.npy reports an illegal memory access at the checkCudaErrors(cuEventSynchronize(stop)); call in this line. The nvprof output is:
==19226== NVPROF is profiling process 19226, command: ./exe
> Using device 0: Tesla V100-PCIE-16GB
> GPU Device has SM 7.0 compute capability
0
picked algorithm: 6
Workspace size: 25.002MB
baseline used 0.163843
0.104157
direct used 0.0002848
0
32 1 49
CUDA Driver API error = 0700 from file <sparsednn/driver_conv.cu>.
Difference: nan
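For reference, driver error 0700 is CUDA_ERROR_ILLEGAL_ADDRESS; because the illegal access happens inside asynchronously launched device work, it only surfaces at the next synchronizing call, which here is cuEventSynchronize(stop). Below is a minimal sketch of the driver-API event-timing pattern around that call (the CHECK_CU helper and the memset stand-in for the kernel launch are only illustrative, not the actual driver_conv.cu code):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda.h>

    // Illustrative error check; the project's checkCudaErrors may format its
    // message differently ("CUDA Driver API error = 0700 ..." in the log above).
    #define CHECK_CU(call)                                                    \
      do {                                                                    \
        CUresult err_ = (call);                                               \
        if (err_ != CUDA_SUCCESS) {                                           \
          const char *name_ = NULL;                                           \
          cuGetErrorName(err_, &name_);                                       \
          fprintf(stderr, "CUDA Driver API error = %04d (%s) at %s:%d\n",     \
                  (int)err_, name_ ? name_ : "unknown", __FILE__, __LINE__);  \
          exit(EXIT_FAILURE);                                                 \
        }                                                                     \
      } while (0)

    int main() {
      CUdevice dev;
      CUcontext ctx;
      CHECK_CU(cuInit(0));
      CHECK_CU(cuDeviceGet(&dev, 0));
      CHECK_CU(cuCtxCreate(&ctx, 0, dev));

      CUevent start, stop;
      CHECK_CU(cuEventCreate(&start, CU_EVENT_DEFAULT));
      CHECK_CU(cuEventCreate(&stop, CU_EVENT_DEFAULT));

      // Stand-in for the real cuLaunchKernel call in driver_conv.cu.
      CUdeviceptr buf;
      size_t bytes = 1 << 20;
      CHECK_CU(cuMemAlloc(&buf, bytes));

      CHECK_CU(cuEventRecord(start, 0));
      CHECK_CU(cuMemsetD8(buf, 0, bytes));    // dummy device work
      CHECK_CU(cuEventRecord(stop, 0));

      // An illegal address in the work recorded above is reported here,
      // at the first call that synchronizes with the stream.
      CHECK_CU(cuEventSynchronize(stop));

      float ms = 0.0f;
      CHECK_CU(cuEventElapsedTime(&ms, start, stop));
      printf("kernel used %f ms\n", ms);

      CHECK_CU(cuMemFree(buf));
      CHECK_CU(cuEventDestroy(start));
      CHECK_CU(cuEventDestroy(stop));
      CHECK_CU(cuCtxDestroy(ctx));
      return 0;
    }

The sketch builds with nvcc sketch.cu -o sketch -lcuda (or g++ with -lcuda).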
However, SparseRT works well when I run bash autotune_conv_float.sh 512 512 7 filter_bg4.npy directly.
> Using device 0: Tesla V100-PCIE-16GB
> GPU Device has SM 7.0 compute capability
0
picked algorithm: 6
Workspace size: 25.002MB
baseline used 0.164115
0.104157
direct used 0.0002848
0
32 1 49
kernel used 0.0733642
0.104157
Difference: 0.00319038
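The Difference line presumably compares the SparseRT result against the baseline output element-wise; the nan in the failing run is consistent with the kernel output buffer never being written. Roughly, such a check could look like the following (hypothetical helper; the metric the harness actually uses may differ):

    #include <cmath>

    // Hypothetical correctness check; the real harness may use a different
    // metric (e.g. mean rather than max absolute difference).
    float max_abs_difference(const float *baseline, const float *result, int n) {
      float worst = 0.0f;
      for (int i = 0; i < n; ++i) {
        float d = std::fabs(baseline[i] - result[i]);
        if (std::isnan(d)) return d;  // a NaN anywhere in the output dominates the result
        if (d > worst) worst = d;
      }
      return worst;
    }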
Environment:
Host: Ubuntu-16.04.5, GCC-5.4.0
Device: Tesla V100-PCIE-16GB, CUDA-10.2, cuDNN-8.0
In addition, I have tested nvprof --profile-child-processes bash autotune_conv_float.sh 512 512 7 filter_bg4.npy with CUDA-10.0. It does not report such an error, but it also does not display any mm kernel information. Here is the nvprof result with CUDA-10.0.