svenevs commented 8 years ago

Recently I've been struggling with some hard to track compiler issues on Fedora 24, and am hesitant to issue a pull request because I'm still not 100% certain I've got everything configured correctly. However, from a high-level point of view, why is this taking place:

#if defined(NANOGUI_GLAD)
    #if defined(NANOGUI_SHARED) && !defined(GLAD_GLAPI_EXPORT)
        #define GLAD_GLAPI_EXPORT
    #endif

    #include <glad/glad.h>
#else
    #if defined(__APPLE__)
        #define GLFW_INCLUDE_GLCOREARB
    #else
        #define GL_GLEXT_PROTOTYPES
    #endif
#endif

1. Why is `GLFW_INCLUDE_GLCOREARB` only requested for `APPLE` and not the `#else` (aka linux)?

If I change things to use it for all unix:

#else
    #define GLFW_INCLUDE_GLCOREARB
    #if !defined(__APPLE__)
        #define GL_GLEXT_PROTOTYPES
    #endif
#endif

everything works out. It seems to make sense to me that it would not compile for linux without this, as the methods / defines in question are

‘GL_GEOMETRY_SHADER’ was not declared in this scope
‘glGetUniformBlockIndex’ was not declared in this scope
‘GL_INVALID_INDEX’ was not declared in this scope
‘glUniformBlockBinding’ was not declared in this scope
‘GL_UNIFORM_BUFFER’ was not declared in this scope

all of these are defined in glcorearb.h on my system, which is exactly what file gets included by GLFW_INCLUDE_GLCOREARB.

2. `GL_GLEXT_PROTOTYPES` is still required to compile, the question being whether or not the corresponding `GLFW_INCLUDE_GLEXT` should be included here.

Why no PR? Well to be honest, given that this has never been here and I've never had issues compiling nanogui on linux before, it seems like a me problem. At the same time, though, where else are these functions coming from and should they be coming from there, or from the local version of GLFW via the GLFW_INCLUDE_*.

I'm more than content to just have my own fork to suit my needs, but in my debugging efforts / realizing what the fix is, it made me question whether or not these should be in here...

Let me know what your thoughts are...

svenevs commented 8 years ago

Ok I just got home and built it on fedora 23 (in a build that has previously worked with all nanogui releases) and am not suffering from the same problems, leading me to believe this is indeed something weird I've done with my fedora 24 installation.

That is, both versions (not requesting ARB core with GLFW for linux, and requesting it) work and build all examples just fine. To me it feels more appropriate to always ask for GLFW_INCLUDE_GLCOREARB for all unix, but there is probably a reason why it was in an #if defined(__APPLE__) to begin with.

wjakob commented 8 years ago

@svenevs:

I think what is happening is that current graphics card drivers (e.g. nvidia) will overwrite glext.h with a much more complete version during installation, which eliminates the need for some of these defines.

Obviously it would be nice to have a codebase that unconditionally compiles everywhere. What would be interesting to me is to see whether you can make a change to the code so that it compiles on both versions of your OS.

svenevs commented 8 years ago

Ugh, you are correct...thanks NVidia. The issue is that there are many things they leave as GL_*_EXT that the original mesa-libGL headers define as both GL_* and GL_*_EXT for compatibility.

In discovering the cause, I don't know that it would be possible to reconcile the two without making significant changes to nanogui. I'll do some more investigating into the differences between the headers, but fc 23 and 24 both compile your current master without issue, assuming mesa headers.

If there isn't a way to get the NVidia gl.h and glext.h combo working easily, this should be abandoned. While the GLFW_INCLUDE_GLCOREARB was functional, I discovered soon after that this is generally a terrible idea on linux. For OSX, it will include OpenGL/gl3.h, and if you define the EXT flag it will grab OpenGL/gl3ext.h. However, on linux, if you ask for ARB you get GL/glcorearb.h. If a nanogui user is working with any library that directly or indirectly includes either GL/gl.h or GL/glext.h, things will explode. This is explicitly claimed to happen at the top of the glcorearb.h file (which begs the question...why does it really even exist?).

So now that I know what's going on here I'll do some more careful diffing between the versions and see if there is an easy patch. I tend to lean toward not supporting NVidia's headers though...

wjakob commented 8 years ago

Ok, that sounds good. Please let me know what you find out.

svenevs commented 7 years ago

Ok did a lot of digging here, I think the amount of shenanigans that would need to happen for this to work out is outrageous. It affects nanovg as well...and ultimately seems to be where OpenGL is lacking in standardization.

Small example of the unwieldy workaround (ugly)

An example of what would need to happen for glutil.cpp:

/* Depending on the operating system, and the version of the OpenGL
 * headers present, some conflicts may occur.  The values are all
 * the same (per operating system), but sometimes the conventions
 * between say the mesa-libGL and NVIDIA OpenGL headers differ.
 *
 * The following is not a comprehensive list of all potential OpenGL
 * conflicts, rather the specific subset NanoGUI needs.
 */
#if !defined(_WIN32) && !defined(__APPLE__)
    // If GL_UNIFORM_BUFFER is not defined, we have NVIDIA headers
    #if !defined(GL_UNIFORM_BUFFER)
        // Redefine a couple of constants
        #define GL_UNIFORM_BUFFER  GL_UNIFORM_BUFFER_EXT
        #define GL_GEOMETRY_SHADER GL_GEOMETRY_SHADER_EXT
        // These are defined in glext.h on mesa-libGL headers, but
        // are only provided in glcorearb.h in the NVIDIA headers.
        #define GL_INVALID_INDEX   0xFFFFFFFFu
        typedef GLuint (APIENTRYP PFNGLGETUNIFORMBLOCKINDEXPROC) (GLuint program, const GLchar *uniformBlockName);
        typedef void (APIENTRYP PFNGLUNIFORMBLOCKBINDINGPROC) (GLuint program, GLuint uniformBlockIndex, GLuint uniformBlockBinding);
        #ifdef GL_GLEXT_PROTOTYPES
            GLAPI GLuint APIENTRY glGetUniformBlockIndex (GLuint program, const GLchar *uniformBlockName);
            GLAPI void APIENTRY glUniformBlockBinding (GLuint program, GLuint uniformBlockIndex, GLuint uniformBlockBinding);
        #endif
    #endif
#endif

In this case, GL_UNIFORM_BUFFER and GL_GEOMETRY_SHADER have an EXT suffix with NVIDIA, and GL_INVALID_INDEX flat out isn't ever defined in gl.h or glext.h. GL_INVALID_INDEX, glGetUniformBlockIndex, and glUniformBlockBinding are only defined in glcorearb.h. There are various functions in nanovg, the Screen class, and probably a bunch of others that would be affected.

The root cause

Maybe this is a legitimate bug on NVIDIA's part, or maybe mesa-libGL is too liberal. The mesa-libGL headers, NVIDIA headers, and OSX headers (I don't have windows so cannot verify) all end up defining the same things, but they are all in different files.

OSX: gl3.h (which is currently included via GLFW)
mesa-libGL: glext.h
NVIDIA: glcorearb.h

In each of these files, the same pattern of

#ifndef GL_VERSION_2_1
#define GL_VERSION_2_1 1
    ... new constants and definitions ...
#endif
#ifndef GL_VERSION_3_0
#define GL_VERSION_3_0 1
    ... new constants and definitions ...
#endif

ends up taking place, but where is crucially important (glcorearb.h being the problem child).

Synopsis

Basically, it seems nobody can agree what the right way to define all of these is. Requesting the CORE_ARB on OSX works because they handle everything internally. Though I could be off base, I think technically the NVIDIA Linux headers are more correct since they are generally more similar to the OSX headers than not.

However, the likelihood of somebody having the NVIDIA headers is relatively small as far as linux users go. Unless they explicitly request for this happen, the NVIDIA driver will not install these (perhaps because they are aware of exactly this issue). As mentioned a couple posts above, though, this is not as easy as asking for CORE_ARB on linux either. Not even NVIDIA programs (CUDA) get it right, and directly ask for gl.h + glext.h. So if glcorearb.h is included directly by NanoGUI, there will be immediate conflicts.

Long way of saying I don't see an elegant way of fixing this, and I think very few users will end up being affected by it overall. I only installed the NVIDIA headers because I thought they would be more comprehensive...

Option A

Maybe it would be better to just do

#if !defined(_WIN32) && !defined(__APPLE__)
    #if !defined(GL_UNIFORM_BUFFER)
        #warning NanoGUI suspects you have NVIDIA OpenGL headers installed.  This will likely not compile, and if so you should (re)install the mesa-libGL headers.
    #endif
#endif

The warning gets fired, and then immediately after it won't compile because GL_UNIFORM_BUFFER is not defined, but then a lot more errors get thrown too. The user will be overwhelmed with red text, see the warning, and at least have a solution available.

Option B

The only other option I see would be to maybe allow a cmake argument to achieve something like

#else
    #if defined(__APPLE__)
        #define GLFW_INCLUDE_GLCOREARB
    #else
        #if defined(NANOGUI_FORCE_GL_COREARB) // <<
            #define GLFW_INCLUDE_GLCOREARB    // <<
        #endif                                // <<
        #define GL_GLEXT_PROTOTYPES
    #endif
#endif

The warning in Option A could then offer two suggestions, and maybe just add a link to the docs online that have a concise explanation of the options for Linux:

Use the mesa-libGL headers and be happy
Attempt to use NVIDIA core arb headers, at the expense of it not compiling if they have external dependencies that just straight include opengl headers in them that are not also core (very likely)

Not really sure what the best option is here, I don't see much room for clever preprocessor hacking, and am equally concerned that making definitions from glcorearb.h available not present in glext.h in the NVIDIA case could potentially lead to some catastrophic errors.

Let me know if you want a PR for option A/B, or if you have any other ideas. I think adding a warning would be nice so that a future user with NVIDIA headers will at least know why it isn't building, but beyond that it's too large in scope to solve in my opinion.

Update: I guess the central theme I'm trying to get at here is that the NanoGUI docs pretty clearly state that a core profile is being requested here. This is definitely true for Mac, most likely true for Windows, and untrue (?) for linux if you are going by the official spec. That said, the linux compatibility profiles seem to be the least error prone. I don't think it's really even worth qualifying it in the docs as it would probably just cause confusion, the only thing being changed in the docs would be explaining what option B above does and potential impacts.

wjakob commented 7 years ago

Thank you for the very comprehensive analysis! That definitely sounds like a tricky situation.

There is potentially one more way to get this to work. On Windows, GLAD is used to guarantee a sane definition of all of the different OpenGL definitions & extensions. I'm wondering if the code compiles okay on your machine without changes if -DNANOGUI_USE_GLAD=ON is specified when running cmake?

svenevs commented 7 years ago

Interesting, the use of GLAD definitely fixes it.

Would you like a PR with "Option A" (warning indicating the user to request GLAD, ultimately followed by compiler errors)?
- I don't think it is possible to save them after-the-fact.
I took a closer look at GLAD, is it time to update it? The current available GLAD online goes up to GL 4.5.

wjakob commented 7 years ago

Sorry for the big delay. Yes, 1. sounds good. Regarding 2: I guess it could be updated if there is a concrete need. I've always used the versions with just the core profile to keep the amount of cruft to a minimum.

svenevs commented 7 years ago

1 can be done. Yeah tabling 2 until somebody comes back with a specific need is probably best.

wjakob / nanogui

include/nanogui/opengl.h treatment of GLFW_INCLUDE_GLCOREARB #156

1. Why is `GLFW_INCLUDE_GLCOREARB` only requested for `APPLE` and not the `#else` (aka linux)?

2. `GL_GLEXT_PROTOTYPES` is still required to compile, the question being whether or not the corresponding `GLFW_INCLUDE_GLEXT` should be included here.

Small example of the unwieldy workaround (ugly)

The root cause

Synopsis

Option A

Option B

wjakob / nanogui

include/nanogui/opengl.h treatment of GLFW_INCLUDE_GLCOREARB #156

1. Why is GLFW_INCLUDE_GLCOREARB only requested for __APPLE__ and not the #else (aka linux)?

2. GL_GLEXT_PROTOTYPES is still required to compile, the question being whether or not the corresponding GLFW_INCLUDE_GLEXT should be included here.

Small example of the unwieldy workaround (ugly)

The root cause

Synopsis

Option A

Option B

1. Why is `GLFW_INCLUDE_GLCOREARB` only requested for `APPLE` and not the `#else` (aka linux)?

2. `GL_GLEXT_PROTOTYPES` is still required to compile, the question being whether or not the corresponding `GLFW_INCLUDE_GLEXT` should be included here.