google / nvidia_libs_test

Tests and benchmarks for cudnn (and in the future, other nvidia libraries)
Apache License 2.0

An issue was reported when running cuda-memcheck --tool racecheck #2

Closed gawain102000 closed 5 years ago

gawain102000 commented 5 years ago

When running the command

    cuda-memcheck --tool racecheck --print-level error --flush-to-disk no --error-exitcode 1 /usr/bin/bazel run //:cudnn_test --action_env=CUDNN_PATH=/home/swqa/.vulcan/install/cuda --action_env=CUDA_PATH=/home/swqa/.vulcan/install/cuda -- --gtest_filter=CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7

on a TITAN V, the following issue was reported:

    [ RUN ] FromFile/ConvolutionTest.CompareResults/CONVOLUTION_FWD_NCHW_TENSOR_OP_52x7x112x4_873x7x3x3_VALID_GetAlgo_v7
    F1023 04:04:30.495419 17575 cudautil.cc:92] Check failed: OkStatus() == GetStatus(cudaFree(ptr)) (ok vs. CUDA Runtime API error 'an illegal memory access was encountered')
    Check failure stack trace:
    @ 0x186dde0 google::LogMessage::Fail()
    @ 0x186dd24 google::LogMessage::SendToLog()
    @ 0x186d675 google::LogMessage::Flush()
    @ 0x1870aee google::LogMessageFatal::~LogMessageFatal()
    @ 0x46c42b nvidia_libs_test::DeviceMemory::~DeviceMemory()
    @ 0x40e9d9 _ZN16nvidia_libs_test12_GLOBAL__N_114RunConvolutionEddRKSt10unique_ptrI12cudnnContextNS_6detail18CudnnHandleDeleterEERKNS_11ConvolutionERKN4absl7variantIJ25cudnnConvolutionFwdAlgo_t29cudnnConvolutionBwdDataAlgo_t31cudnnConvolutionBwdFilterAlgo_tEEE
    @ 0x410b42 nvidia_libs_test::(anonymous namespace)::ConvolutionTest_CompareResults_Test::TestBody()
    @ 0x18bf017 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @ 0x18ba07f testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x189f35e testing::Test::Run()
    @ 0x189fc50 testing::TestInfo::Run()
    @ 0x18a02a5 testing::TestCase::Run()
    @ 0x18a72a1 testing::internal::UnitTestImpl::RunAllTests()
    @ 0x18bfd3f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @ 0x18bacb5 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x18a5f0f testing::UnitTest::Run()
    @ 0x451181 RUN_ALL_TESTS()
    @ 0x4509e8 main
    @ 0x7fb41c5ff830 __libc_start_main
    @ 0x40d639 _start
    @ (nil) (unknown)
    ========= CUDA-MEMCHECK
    ========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)

Thanks, Bo
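The log in this report carries two separate signals: a glog check failure raised at the `cudaFree` call, and a racecheck summary that lists zero hazards. For triage it can help to pull both out mechanically. The following is a minimal sketch in Python, assuming the line formats seen in this report. The `triage` helper and both regular expressions are hypothetical illustrations, not part of nvidia_libs_test or cuda-memcheck, and the sample text is abbreviated from the log above.

```python
import re

# Sample lines abbreviated from the log in this report.
LOG = """\
F1023 04:04:30.495419 17575 cudautil.cc:92] Check failed: OkStatus() == GetStatus(cudaFree(ptr)) (ok vs. CUDA Runtime API error 'an illegal memory access was encountered')
========= RACECHECK SUMMARY: 0 hazards displayed (0 errors, 0 warnings)
"""

# Assumed shape of a fatal glog line: F<mmdd> <time> <thread> <file>:<line>] Check failed: <msg>
CHECK_RE = re.compile(
    r"^F\d{4} \S+\s+\d+ (?P<file>[^:\s]+):(?P<line>\d+)\] Check failed: (?P<msg>.*)$",
    re.M,
)

# Assumed shape of the racecheck summary line, taken from this report only.
SUMMARY_RE = re.compile(
    r"RACECHECK SUMMARY: (?P<hazards>\d+) hazards? displayed "
    r"\((?P<errors>\d+) errors?, (?P<warnings>\d+) warnings?\)"
)


def triage(log):
    """Return the first check failure and the racecheck counts, or None for each if absent."""
    check = CHECK_RE.search(log)
    summary = SUMMARY_RE.search(log)
    return {
        "check_failure": None if check is None else {
            "file": check["file"],
            "line": int(check["line"]),
            "message": check["msg"],
        },
        "racecheck": None if summary is None else {
            key: int(value) for key, value in summary.groupdict().items()
        },
    }


result = triage(LOG)
print(result["check_failure"]["file"], result["check_failure"]["line"])
print(result["racecheck"])
```

A result with a check failure but zero racecheck hazards, as here, only says that racecheck did not flag a shared-memory hazard. As a general property of the CUDA runtime, `cudaFree` can also return an error left by an earlier asynchronous operation, so the failing line in `cudautil.cc` is not necessarily where the illegal access happened. That is a caveat for reading the log, not a diagnosis of this case.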

gawain102000 commented 5 years ago

I should first investigate whether this is a cuDNN issue, so I am closing this for now.