prittt / YACCLAB

YACCLAB: Yet Another Connected Components Labeling Benchmark
BSD 3-Clause "New" or "Revised" License

Error running on ubuntu 20.04 #26

Closed patrickhwood closed 3 years ago

patrickhwood commented 3 years ago

Getting the following error during the GPU run:

    +----------------------------------------------------------------------------+
    | Checking Correctness of 'PerformLabelingWithSteps()'                       |
    +----------------------------------------------------------------------------+
    | check: | free(): double free detected in tcache 2 ] 0% |
    Aborted (core dumped)

config.yaml.txt log.txt CMakeCache.txt

prittt commented 3 years ago

Hi @patrickhwood,

sorry for the late reply but we are on vacation right now. You are using CUDA 11.2, right? We will come back to you with a solution as soon as possible. Meanwhile, you can try with an older version of CUDA.

prittt commented 3 years ago

Hi @patrickhwood,

I've updated CI tests and added an environment with Ubuntu 20.04 (CUDA 11.4, nvidia-driver-470, OpenCV 4.4). Your YACCLAB configuration file is a subset of the one used for this test (check it here) and everything seems to work fine. The output log is available here.

Does it fail only on "PerformWithSteps Check" tests? Have you tried disabling this experiment (`eight_connectivity_steps: false`)? Does it run without errors then?

Why are you checking the "with steps" implementation if you're not testing the algorithms with that kind of test?

patrickhwood commented 3 years ago

Thanks. I'll give this a try.

Your questions are good ones. I started by running with the original config file and it took many hours before the error occurred. So I stripped everything out of the file prior to the error and reran the test to verify that it wasn't due to some accumulation. I really didn't go any further than that, as it still took quite a while for the error to pop up.

patrickhwood commented 3 years ago

On a different note, I got unaligned error messages from cuda-memcheck from this line in labeling_CUDA_BKE.cu:

        *(reinterpret_cast<int*>(buffer)) = 0;

I fixed it like this:

    int tmpi = 0;
    char *buffer = reinterpret_cast<char *>(&tmpi);

This pattern also appears in labeling_CUDA_BKE_InlineCompression.cu, labeling_CUDA_BKE_NoInlineCompression.cu, labeling_CUDA_BUF.cu, and labeling_CUDA_BUF_NoInlineCompression.cu.

prittt commented 3 years ago

Ok, I will check it. Could you give us more info about your env? Which GPU are you using? Still CUDA 11.2 and Ubuntu 20.04? What is the driver version?

Thank you.

patrickhwood commented 3 years ago

    uname -a
    Linux pwood-desktop 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

    nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Feb_14_21:12:58_PST_2021
    Cuda compilation tools, release 11.2, V11.2.152
    Build cuda_11.2.r11.2/compiler.29618528_0

    dmesg | grep NVIDIA
    [ 0.845230] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 470.57.02 Tue Jul 13 16:14:05 UTC 2021
    [ 0.860482] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 470.57.02 Tue Jul 13 16:06:24 UTC 2021

GPUs: RTX 2080ti and RTX 3060

Note that C/C++ doesn't guarantee proper pointer alignment when casting a char pointer to an int pointer, i.e., this is always dangerous and only works on hardware that supports unaligned memory access or with compilers that guarantee word alignment of char array buffers (not a standard language feature). Some RISC platforms support unaligned access via hardware exception handlers that are quite slow and should be avoided most of the time.

You may want to read https://en.cppreference.com/w/cpp/language/object#Alignment and check out https://en.cppreference.com/w/cpp/language/alignas.
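To illustrate the alignment point, here is a minimal host-side C++ sketch of two common ways to avoid the misaligned `reinterpret_cast<int*>` write: requesting alignment with `alignas`, or copying the bytes with `std::memcpy`. The names `AlignedBuffer`, `is_int_aligned`, `store_int`, `load_int`, and `self_check` are hypothetical and do not come from YACCLAB; whether either approach suits the actual kernels has not been verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is suitably aligned for an int.
inline bool is_int_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// Option 1: a char buffer that explicitly requests int alignment.
struct AlignedBuffer {
    alignas(int) char bytes[sizeof(int)];
};
static_assert(alignof(AlignedBuffer) >= alignof(int),
              "buffer must be at least int-aligned");

// Option 2: copy the bytes instead of dereferencing a cast pointer.
// std::memcpy makes no alignment assumption about dst or src.
inline void store_int(char* dst, int value) {
    std::memcpy(dst, &value, sizeof value);
}

inline int load_int(const char* src) {
    int value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Small illustration of both options (assumed usage, not YACCLAB code).
inline void self_check() {
    AlignedBuffer aligned;
    assert(is_int_aligned(aligned.bytes));  // guaranteed by alignas(int)
    store_int(aligned.bytes, 0);
    assert(load_int(aligned.bytes) == 0);

    char raw[2 * sizeof(int)] = {};
    store_int(raw + 1, 42);                 // deliberately odd offset
    assert(load_int(raw + 1) == 42);
}
```

The `memcpy` route works for any offset into the buffer, while `alignas` only helps when the int is stored at an offset that is itself a multiple of `alignof(int)`.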

prittt commented 3 years ago

Hey @all-contributors please add @patrickhwood for bug report.

allcontributors[bot] commented 3 years ago

@prittt

I've put up a pull request to add @patrickhwood! :tada: