vetter / shoc

The SHOC Benchmark Suite

Out Of Memory with BusSpeedReadback #49

Open Frenetique opened 8 years ago

Frenetique commented 8 years ago

Hi everyone!

I am using SHOC to characterize some aspects of a board based on a Tegra K1 SoC, and I ran into an issue with the CUDA version of the BusSpeedReadback benchmark.

When I run it, the kernel kills the process because of an out-of-memory (OOM) condition. I tried to pinpoint where it actually fails and found the following line (line 108 of the source file):

cudaMalloc((void**)&device, sizeof(float) * numMaxFloats);

The cudaMalloc call apparently triggers the OOM killer, which terminates the process. I am new to CUDA and GPGPU in general, but as I understand it, this line allocates a block of device (GPU) memory. The block is roughly 132MB and the GPU has over 1GB of memory, so I don't understand why the allocation fails.

I was able to run the same benchmark on a different system with about the same amount of GPU memory (slightly less, actually), but using OpenCL, and had no issues.

Thanks for any insight.

Best regards,

Marc

EDIT: BusSpeedDownload runs fine, even though it has an almost identical code structure. All of the other benchmarks also run fine.

EDIT2: I found that the OpenCL version checks the available memory before allocating anything and uses the result to cap the maximum block size. The CUDA version has no such check, so it can try to allocate blocks of up to 512MB. In some configurations this is too much and triggers an OOM condition.
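To illustrate the kind of check I mean, here is a minimal sketch in plain C++. It is not the code from either SHOC version: the helper name largestBlockThatFitsKB and the usableFraction parameter are made up for this example. In a CUDA build, freeBytes could come from cudaMemGetInfo(&freeBytes, &totalBytes) before the cudaMalloc call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SHOC: returns the largest candidate block
// size (in kB, 1 kB = 1024 bytes) that fits within usableFraction of the
// reported free memory, or 0 if no candidate fits.
std::size_t largestBlockThatFitsKB(const std::vector<std::size_t>& sizesKB,
                                   std::size_t freeBytes,
                                   double usableFraction) {
    assert(usableFraction > 0.0 && usableFraction <= 1.0);
    // Memory budget in bytes that the benchmark is allowed to use.
    const double budget = static_cast<double>(freeBytes) * usableFraction;
    std::size_t best = 0;
    for (std::size_t kb : sizesKB) {
        const double bytes = static_cast<double>(kb) * 1024.0;
        if (bytes <= budget && kb > best) {
            best = kb;
        }
    }
    return best;
}
```

The benchmark would then size its buffers from the returned value instead of always using the largest entry in its size table. The right fraction to leave free is an assumption that would need tuning per platform.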

I also noticed something odd about the values reported by the CUDA version of BusSpeedDownload. For a block size of 4096kB, the mean kernel time is 1.67431ms, so the bandwidth should be about 2.5GB/s. But SHOC reports a mean of 3GB/s for this data size.
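For reference, this is how I computed the expected figure, written as a small C++ sketch. The function name bandwidthGBps is mine, and the unit conventions (1 kB = 1024 bytes, 1 GB = 1e9 bytes) are my assumption; I have not checked which conventions SHOC uses internally.

```cpp
#include <cassert>

// Bandwidth in GB/s (assuming 1 GB = 1e9 bytes) for a transfer of sizeKB
// kilobytes (assuming 1 kB = 1024 bytes) that took timeMs milliseconds.
double bandwidthGBps(double sizeKB, double timeMs) {
    assert(timeMs > 0.0);
    const double bytes = sizeKB * 1024.0;
    const double seconds = timeMs / 1000.0;
    return bytes / seconds / 1e9;
}
```

Calling bandwidthGBps(4096.0, 1.67431) gives roughly 2.5, which is the value I expected SHOC to report.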

I observed the same thing with the CUDA version of the BusSpeedReadback benchmark. The OpenCL version gives consistent values.

Did I miss something?

Best,

Marc