DIRT: a fast differentiable renderer for TensorFlow
none of 3 egl devices matches the active cuda device #93

callmeray commented 3 years ago

Thank you for open-sourcing this great work. I've built and installed dirt, but when I run python test/, I get:

2021-01-23 12:44:58.500213: F /home/rayu/Projects/HOnnotate/dirt/csrc/gl_common.h:65] none of 3 egl devices matches the active cuda device
Aborted (core dumped)

I'm using conda to manage the Python environment.

callmeray commented 3 years ago

Result of nvidia-smi -q

pmh47 commented 3 years ago

This error comes from 'linking' the CUDA and OpenGL contexts. It sometimes happens when a non-NVIDIA version of libEGL is found. However, your library paths look correct.

Could you paste the output of ls -l /usr/lib*/*/*GL*?
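If the ls output is awkward to capture, roughly the same listing can be produced from Python. This is only a minimal sketch and not part of DIRT: the function name list_gl_libraries is made up for illustration, and the default glob pattern simply mirrors the shell command above.

```python
import glob
import os


def list_gl_libraries(pattern="/usr/lib*/*/*GL*"):
    """Return (path, symlink target or None) for every path matching the glob."""
    entries = []
    for path in sorted(glob.glob(pattern)):
        # Report where symlinks point, similar to what `ls -l` shows
        target = os.readlink(path) if os.path.islink(path) else None
        entries.append((path, target))
    return entries


if __name__ == "__main__":
    for path, target in list_gl_libraries():
        print(path if target is None else f"{path} -> {target}")
```

The symlink targets are the useful part here: they show which vendor's library each generic libEGL or libGL name actually resolves to.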

Also, please patch the DIRT source to give a bit more information about the error: add LOG(INFO) << "eglQueryDeviceAttribEXT returns " << to the beginning of L60 of csrc/gl_common.h, and LOG(INFO) << "eglGetError returns " << eglGetError(); between L60 and L61. Then rebuild.

callmeray commented 3 years ago

The result of python tests/

2021-01-25 10:24:38.106058: I /home/rayu/Projects/HOnnotate/dirt/csrc/gl_common.h:60] eglQueryDeviceAttribEXT returns 0

pmh47 commented 3 years ago

It looks like there's a problem with your NVIDIA driver's installation of the GL libraries. There should be a /usr/lib/x86_64-linux-gnu/libEGL_nvidia alongside the _mesa version. Did you install the NVIDIA driver using the NVIDIA runfile, or Ubuntu's apt package? Uninstalling, then reinstalling with apt, is likely to fix it. Alternatively, it's possible that the NVIDIA version of libEGL has been placed somewhere unusual, so try searching the entire system for libEGL_nvidia*.
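One way to run that system-wide search is sketched below in Python. This is an illustration under assumptions rather than anything DIRT provides: the helper name find_libraries is hypothetical, and walking from / may take a while and will silently skip directories the current user cannot read.

```python
import fnmatch
import os


def find_libraries(root="/", pattern="libEGL_nvidia*"):
    """Walk `root` and return every file path whose name matches `pattern`."""
    matches = []
    # os.walk ignores directories it cannot list (its default onerror=None)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    found = find_libraries()
    print("\n".join(found) if found else "no libEGL_nvidia* found")
```

An empty result would suggest the driver install did not place the NVIDIA EGL library anywhere on the system, which points towards reinstalling the driver rather than adjusting library paths.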

callmeray commented 3 years ago

I installed the driver using the NVIDIA runfile, and there is no libEGL_nvidia* on my system. I'll try the apt install later. Thanks again for your quick reply.