jgbit / vuda

VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
MIT License

OS X xcode, EXC_BAD_ACCESS on cudaMalloc() #17

Closed Bambofy closed 4 years ago

Bambofy commented 4 years ago

Hi, I'm using VUDA on a MacBook Air, but when I run cudaMalloc() I get an EXC_BAD_ACCESS error. See the image below of the error.

Code:

#include <iostream>
#include <vector>
#include <memory>
#include <tuple>

#include <raylib.h>

#define VUDA_DEBUG_ENABLED

#include <vuda_runtime.hpp>

#include "GPU/GPUVector.h"
void query_device(void)
{
    //
    //  When querying the number of devices and their properties it is not necessary to call SetDevice in VUDA
    //

    int deviceCount;
    cudaGetDeviceCount(&deviceCount);
    std::cout << "number of vulkan capable devices: " << deviceCount << std::endl << std::endl;

    for(int dev = 0; dev < deviceCount; ++dev)
    {
        cudaDeviceProp deviceProp;
        cudaGetDeviceProperties(&deviceProp, dev);

        std::cout << "Device " << dev << ": " << deviceProp.name << std::endl;
        std::cout << "    Total amount of global memory:                    " << (uint32_t)(deviceProp.totalGlobalMem / 1048576.0) << "MBytes (" << deviceProp.totalGlobalMem << " bytes)" << std::endl;
        std::cout << "    Total amount of shared memory per block:          " << deviceProp.sharedMemPerBlock << " bytes" << std::endl;

        std::cout << "    Maximum number of threads per block:              " << deviceProp.maxThreadsPerBlock << std::endl;
        std::cout << "    Max dimension size of a thread block (x,y,z):    (" << deviceProp.maxThreadsDim[0] << ", " << deviceProp.maxThreadsDim[1] << ", " << deviceProp.maxThreadsDim[2] << ")" << std::endl;
        std::cout << "    Max dimension size of a grid size    (x,y,z):    (" << deviceProp.maxGridSize[0] << ", " << deviceProp.maxGridSize[1] << ", " << deviceProp.maxGridSize[2] << ")" << std::endl;
        std::cout << "    Integrated GPU sharing Host Memory:               " << (deviceProp.integrated ? "Yes" : "No") << std::endl;
        std::cout << "    Support host page-locked memory mapping:          " << (deviceProp.canMapHostMemory ? "Yes" : "No") << std::endl;

        std::cout << "    Maximum Texture Dimension Size (x,y,z):           1D=(" << deviceProp.maxTexture1D << "), 2D=(" << deviceProp.maxTexture2D[0] << ", " << deviceProp.maxTexture2D[1] << "), 3D=(" << deviceProp.maxTexture3D[0] << ", " << deviceProp.maxTexture3D[1] << ", " << deviceProp.maxTexture3D[2] << ")" << std::endl;
        std::cout << "    Maximum Layered 1D Texture Size, (num) layers:    1D=(" << deviceProp.maxTexture1DLayered[0] << "), " << deviceProp.maxTexture1DLayered[1] << " layers" << std::endl;
        std::cout << "    Maximum Layered 2D Texture Size, (num) layers:    2D=(" << deviceProp.maxTexture2DLayered[0] << ", " << deviceProp.maxTexture2DLayered[1] << "), " << deviceProp.maxTexture2DLayered[2] << " layers" << std::endl;

        std::cout << std::endl;
    }
}
int main()
{
    cudaSetDevice(0);
    query_device();

    int vec[100];
    for (int i = 0; i < 100; i++)
    {
        vec[i] = i;
    }

    int* deviceMemPtr;
    cudaMalloc((void**)(&deviceMemPtr), 100 * sizeof(int));
    cudaFree(deviceMemPtr);

    return 0;
}

Error image: (screenshot attached)

The query_device function provided in the examples reports this to console:

2020-05-30 15:37:51.430198+0100 fastdungeon[8013:148919] Metal API Validation Enabled
2020-05-30 15:37:51.462266+0100 fastdungeon[8013:148919] flock failed to lock maps file: errno = 35
number of vulkan capable devices: 1

Device 0: Intel(R) UHD Graphics 617
    Total amount of global memory:                    8192MBytes (8589934592 bytes)
    Total amount of shared memory per block:          65536 bytes
    Maximum number of threads per block:              1024
    Max dimension size of a thread block (x,y,z):    (1024, 1024, 1024)
    Max dimension size of a grid size    (x,y,z):    (1073741823, 1073741823, 1073741823)
    Integrated GPU sharing Host Memory:               Yes
    Support host page-locked memory mapping:          Yes
    Maximum Texture Dimension Size (x,y,z):           1D=(16384), 2D=(16384, 16384), 3D=(2048, 2048, 2048)
    Maximum Layered 1D Texture Size, (num) layers:    1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers:    2D=(16384, 16384), 2048 layers

Program ended with exit code: 9

jgbit commented 4 years ago

Hi, thanks for trying out vuda.

Unfortunately, I don't have a working macOS environment at the moment - so no real repro possibilities right now. Would you mind trying a couple of things out for me?

  1. Run vulkaninfoSDK and report back the output.
  2. Try to run the same code above, but replace the function `logical_device::malloc(void** devPtr, size_t size)` with `inline void logical_device::malloc(void** devPtr, size_t size) { device_buffer_node* node = new device_buffer_node(size, m_allocator); (*devPtr) = node->key(); push_mem_node(node); }`

I currently suspect that the Intel driver does not satisfy the Vulkan spec with respect to memory property flags.

Bambofy commented 4 years ago

> Hi, thanks for trying out vuda.
>
> Unfortunately, I don't have a working macOS environment at the moment - so no real repro possibilities right now. Would you mind trying a couple of things out for me?
>
>   1. Run vulkaninfoSDK and report back the output.
>   2. Try to run the same code above, but replace the function `logical_device::malloc(void** devPtr, size_t size)` with `inline void logical_device::malloc(void** devPtr, size_t size) { device_buffer_node* node = new device_buffer_node(size, m_allocator); (*devPtr) = node->key(); push_mem_node(node); }`
>
> I currently suspect that the Intel driver does not satisfy the Vulkan spec with respect to memory property flags.

Yes of course, no problem - I will try your tips tomorrow. Thanks for making this library, it's very interesting :)

Bambofy commented 4 years ago

Compiling and executing with g++ gives this error:

richardbamford@MacBook-Air build % ./Program
libc++abi.dylib: terminating with uncaught exception of type vk::IncompatibleDriverError: vk::createInstanceUnique: ErrorIncompatibleDriver
zsh: abort      ./Program

Changing malloc to the code you posted still results in the "IncompatibleDriverError".

Here is the information for a MacBook Air (Retina, 13-inch, 2019) running Intel UHD Graphics 617 1536 MB.

osx_gpu.txt

jgbit commented 4 years ago

Hi,

I got a repro case running on mac OS. The error is related to this issue #14.

I have changed the way memory types and fallbacks are handled in the memory allocator (see the change log). On macOS, memory with HOST_CACHED is not HOST_COHERENT, whereas on Windows it is for all three major GPU vendors, so the former implementation had a bug in its fallback procedure. The correct invalidates and flushes are now in place for this memory type, and you should be able to compile and run the samples.

Thank you.