jgbit / vuda

VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
MIT License

Assertion `diffkey >= 0` failed in Binary Search tree #12

Closed: antoniotorresm closed this issue 4 years ago

antoniotorresm commented 4 years ago

I'm getting the following error in my simple VUDA program (it only moves some arrays to the GPU, launches a kernel, and copies the arrays back):

vuda/state/binarysearchtree.hpp:298: NodeType* vuda::detail::bst<NodeType, KeyType>::search_range(NodeType*, KeyType) const [with NodeType = vuda::detail::default_storage_node; KeyType = void*]: Assertion `diffkey >= 0' failed.

The assertion does not fail on every execution: about 30% of the time, my project runs without any problem. Any leads on what could cause this?

jgbit commented 4 years ago

The assertion is there to ensure that the tree of pointers to virtual device memory performs as intended. If it fails, something has most likely gone very wrong. Would it be possible for you to narrow down your code and provide a small example that reproduces the error?

antoniotorresm commented 4 years ago

I made a minimal example that reproduces the error:

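#include "vuda/vuda.hpp"

int main() {
    const int n = 256;
    vuda::setDevice(0);

    // Host buffer, zeroed below
    float *arr = new float[n];
    float *dev_arr;
    float *dev_arr_out;

    // Two device buffers of n floats each
    vuda::malloc((void**) &dev_arr, n * sizeof(float));
    vuda::malloc((void**) &dev_arr_out, n * sizeof(float));

    for (int i = 0; i < n; i++) {
        arr[i] = 0.0f;
    }

    // Copy host -> device, then read back from the second (never written) device buffer
    vuda::memcpy(dev_arr, arr, n * sizeof(float), vuda::memcpyHostToDevice);
    vuda::memcpy(arr, dev_arr_out, n * sizeof(float), vuda::memcpyDeviceToHost);

    // Release device and host memory
    vuda::free(dev_arr);
    vuda::free(dev_arr_out);
    delete[] arr;
}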

Try running it a few times. Sometimes it will run without any problems, and sometimes it will throw the error reported.

jgbit commented 4 years ago

Unfortunately, I am unable to reproduce it. Which system and compiler are you using?

antoniotorresm commented 4 years ago

I am using Arch Linux with GCC 9.2.0. I get the error on two different systems: one with an AMD GPU using radv, and the other using NVIDIA's proprietary drivers.

jgbit commented 4 years ago

I finally managed to reproduce the error and have pushed a fix. I have long been troubled by the use of pointer subtraction in that function, since ptrdiff_t is really only meant to be used for pointers to elements of the same array. However, in this instance the error turned out to be caused by a lower-precision abs() function. The fix removes the use of abs and ptrdiff_t entirely, eliminating both of these pitfalls in search_range.

Thank you for your feedback. It was a good catch. Let me know if the fix works for you.

antoniotorresm commented 4 years ago

Looks like it's working now. Thank you so much!