ingowald / optix7course

Apache License 2.0

Segmentation fault (core dumped) in example02_pipelineAndRayGen #30

Closed jqdsn closed 2 years ago

jqdsn commented 2 years ago

Hello Prof. Wald, when I try to run example02_pipelineAndRayGen, I get a segmentation fault (core dumped) in the optixDeviceContextCreate(cudaContext, 0, &optixContext) call in SampleRenderer.cpp.

Configuration: OS: Ubuntu 20.04; driver: 510.60; graphics card: GeForce RTX 3090; CUDA 11.2; OptiX 7.4

Output: `(base) zzq@zzq-P920:~/optix7course-master/build$ ./ex02_pipelineAndRayGen
osc: initializing optix...
osc: found 1 CUDA devices
osc: successfully initialized optix... yay!
osc: creating optix context ...
osc: running on device: NVIDIA GeForce RTX 3090
osc: running on device:
段错误 (核心已转储)`

(The last line is Chinese for "Segmentation fault (core dumped)".)

SampleRenderer.cpp code: `void SampleRenderer::createContext()
{
  // for this sample, do everything on one device
  const int deviceID = 0;
  CUDA_CHECK(SetDevice(deviceID));
  CUDA_CHECK(StreamCreate(&stream));

  cudaGetDeviceProperties(&deviceProps, deviceID);
  std::cout << "#osc: running on device: " << deviceProps.name << std::endl;

  CUresult cuRes = cuCtxGetCurrent(&cudaContext);
  if( cuRes != CUDA_SUCCESS )
    fprintf( stderr, "Error querying current context: error code %d\n", cuRes );
  // debug print added to locate the crash (not part of the if statement)
  std::cout << "#osc: running on device: " << std::endl;
  OPTIX_CHECK(optixDeviceContextCreate(cudaContext, 0, &optixContext));
  OPTIX_CHECK(optixDeviceContextSetLogCallback
              (optixContext,context_log_cb,nullptr,4));
}`

ingowald commented 2 years ago

Interesting. I've just tried on OptiX 7.4 (same version) with my 3080TI (almost same GPU), and also on ubuntu - no problem.

One thing that springs to mind is the Kanji in the output after "running on device:" - I'm wondering if that breaks something, and the problem is related to that rather than to context creation. Could you insert a few printfs right after the context create and setlogcallback, to see where it actually crashes?

jqdsn commented 2 years ago

Thank you for your reply. I inserted a few printfs after the context creation and the setlogcallback call, and I'm sure the error happens in the optixDeviceContextCreate(cudaContext, 0, &optixContext) call. I don't know whether the error is related to the Kanji in the output after "running on device:" or to context creation. I tried to fix the Kanji output but did not succeed. The Kanji simply mean "Segmentation fault (core dumped)". I also tried OptiX 7.2.0 and got the same error.

SampleRenderer.cpp code: `void SampleRenderer::createContext()
{
  // for this sample, do everything on one device
  const int deviceID = 0;
  CUDA_CHECK(SetDevice(deviceID));
  CUDA_CHECK(StreamCreate(&stream));

  cudaGetDeviceProperties(&deviceProps, deviceID);
  std::cout << "#osc: running on device: " << deviceProps.name << std::endl;

  CUresult cuRes = cuCtxGetCurrent(&cudaContext);
  if( cuRes != CUDA_SUCCESS )
    fprintf( stderr, "Error querying current context: error code %d\n", cuRes );
  // debug prints added to locate the crash (not part of the if statement)
  std::cout << "#osc: running on device: " << std::endl;
  std::cout << "#osc: running on device: " << std::endl;
  OPTIX_CHECK(optixDeviceContextCreate(cudaContext, 0, &optixContext));
  std::cout << "#osc: running on device: " << std::endl;
  OPTIX_CHECK(optixDeviceContextSetLogCallback
              (optixContext,context_log_cb,nullptr,4));
}`

Output: `(base) zzq@zzq-P920:~/optix7course-master/build$ ./ex02_pipelineAndRayGen
osc: initializing optix...
osc: found 1 CUDA devices
osc: successfully initialized optix... yay!
osc: creating optix context ...
osc: running on device: NVIDIA GeForce RTX 3090
osc: running on device:
osc: running on device:
段错误 (核心已转储)`

(The last line is Chinese for "Segmentation fault (core dumped)".)

jqdsn commented 2 years ago

Hello Prof. Wald, here is a more detailed account of the segmentation fault (core dumped), taken from the core file.

Core file output: `BFD: warning: /home/zzq/optix7course-master/build/core-ex02_pipelineAn-457163-1652088086 is truncated: expected core file size >= 51789824, found: 21143552
[New LWP 457163]
[New LWP 457166]
[New LWP 457165]
[New LWP 457164]
Cannot access memory at address 0x7f1e793ef168
Cannot access memory at address 0x7f1e793ef160
Failed to read a valid object file image from memory.
Core was generated by './ex02_pipelineAndRayGen'.
Program terminated with signal SIGSEGV, Segmentation fault.`

jqdsn commented 2 years ago

I have solved the problem: I switched the driver from version 510.60 to 470, and it works now.