mosra / magnum

Lightweight and modular C++11 graphics middleware for games and data visualization

https://magnum.graphics/

Other

4.75k stars 439 forks source link

Async GL shader compilation #576

Closed dranikpg closed 2 years ago

dranikpg commented 2 years ago

534

dranikpg commented 2 years ago

This is still a WIP, so no worries about incomplete doc comments, undefined constants for GLES targets, formatting, etc. 😄

While implementing the proposed changes for Shader&Program is no more than splitting code into two functions, handling compile states requires some planning.

Compilation states, I suppose, should be only data bags. All logic should be in FooGL::compile or the final constructor. This way is is much cleaner and easier to manage. However AbstractShaderProgram constrainedness stands in the way. A shader program has to be created and submitted for linkage in FooGL::compile or the compilation state. Besides that, typical FooGL::compile requires a program to exist early on to call bindAttributeLocation/bindFragmentDataLocation, which have to be arranged strictly before linkage. So it'd make sense fully to reuse AbstractShaderProgram, which supports all operations.

I came up with the following solution and implemented it for FlatGL:

I assumed that AbstractShaderProgram (ASP) should remain abstract with protected members. After all, that was its intention. But to reuse its functionality, we will make ASP::CompileState extend it.
FlatGL::ComplieState will extend ASP::CompileState as planned.
FlatGL::ComplieState will declare FlatGL::compile and FlatGL(CompileState&&) as friend, to allow access to ASP functions.
FlatGL(CompileState&&) will simply move the ASP out of the compile state, to initialize its base class (which is another ASP)
All common public faced helper functions (like isLinkFinished) will be in ASP::CS

                          -- submit/check/... 
                          |
                          v
                       +++++++++++++   
                       +    ASP    +  <-------- moves out of
                       +++++++++++++          | to init its ASP
                            ||                |
                       +++++++++++++          |
FooGL::compile         +  ASP::CS  +       FooGL(FooGL::CS&&)
       |               +++++++++++++          |
       |                    ||                |
       |               +++++++++++++          |
       ------------->  + FooGL::CS +  <-------- friend to 
creates & returns      +++++++++++++            use ASP functions 
friend

This solution is compact and doesn't change the way ASP works, however it feels a bit tricky to me...

Things to consider when generalizing for more than FlatGL:

Instead of using friend, we can reimplement all required functions for ASP::CS. More boilerplate, but less tricky?
We can make ASP::CS store vertex&fragment shaders and call attach in constructor. However submitLink still has to be called manually after all bind calls.
Instead of calling bind & submitLink manually, we can pass a function/templatize by a functor, which will be called after creation, but before linkage. This way FooGL::compile wont contain any ASP specific logic

What are your thoughts on this?

mosra commented 2 years ago

Hi, apologies for a delayed response here -- I'm stuck deep in a certain other feature and its taking way more time than expected.

Thanks for the code, and especially for the detailed description of your thought process. Very appreciated :+1:

I'll post a more detailed reply once I manage ship that other feature (hopefully later this week), but after a brief look at your initial changes it seems my original propsal wasn't exactly great in terms of the amount of boilerplate inflicted on subclasses. (My fault, sorry.) I have to think more about that. The ideal state would be that subclasses wouldn't need to implement that much extra apart from what they did before (because all duplicated state would also mean twice as much testing effort), only splitting the initial setup somewhat, perhaps at the cost of some safety in case the async compilation is used. Which is fine I guess, it's an advanced feature after all.

I have a vague idea that the construction could get split via a chain of delegating constructors, which would either directly call into each other in the straightforward case, or an object gets "partially constructed", waited upon, and then moved to a fully-constructed state once the compilation finishes. I'll outline that better once I get my hands free.

Until then, in 74f1778759d226fc3464f6adb045f370f4c14206 I added support for KHR_parallel_shader_compile at least, so you don't have to patch it in by hand. (It caused a conflict -- sorry.)

dranikpg commented 2 years ago

Just throwing this in as as suggestion before you get back to it. Instead of having separate compile states like described in the issue, we might be just fine with one generic one that immediately constructs the ASP, but guards it unitl its not compiled.

template<typename FooGL>
struct CompileState {
private: // no access to shader until compiled
  FooGL asp; // moved in from FooGL::compile
  Shader vert, frag;
  optional<Shader> geom; // or parameterize by N - number of shaders?
}

then

FooGL(CompileState<FooGL>&& cs)

blocks and finishes consturction. All common functions can be handled by the compile state directly. Further more, there will be no duplicated fields like in the previous case

mosra commented 2 years ago

Okay, that actually seems like the best middle ground :)

The delegating constructors from my comment above wouldn't work on its own because there would be nowhere to store the temporary GL::Shader instances, while OTOH putting them into the shader program itself would be wasteful. But combined with your suggestion it could work:

class FlatGL::CompileState: public FlatGL {
    private:
        friend FlatGL;
        using FlatGL::FlatGL; // low-effort way to get access to the NoCreate constructor to populate this
        GL::Shader frag, vert;
        GL::Version version; // + whatever else the constructor needs to preserve to avoid duplicated code
};

It's publicly derived, so the getter APIs like isLinkFinished() or flags() are easy to access if user code needs to. (Which means the dangerous APIs like draw() are accessible too, but that's fine, the same danger would be with a NoCreate'd instance.) I'm not 100% sure if it should be an inner class or defined externally as FlatGLCompileState, but having it as an inner class avoids further polluting docs of the Shaders namespace at least. It's also not a template because I think it's simpler to just type out all extra state you'd need (for compute shaders there would need to be Optional<GL::Shader> compute, etc, getting very ugly very fast).

Then, there would be FlatGL::compile() returning FlatGL::CompileState, and FlatGL(CompileState&&) that finishes construction, like you said, and like you already mostly have in your code.

Additionally:

I would make isLinkFinished() public in AbstractShaderProgram, to avoid a need for an AbstractShaderProgram::CompileState -- for a finalized instance it would always return true which is useless but not dangerous, so it doesn't need to be hidden from the public API.
I would make the submitCompile() / submitLink(), checkCompile() / checkLink() APIs operate just on a single shader/program, instead of a list. The list was there originally as a non-ideal parallelization attempt, and since you have a much better alternative now, calling submitCompile({a, b}) doesn't really have any advantage over a.submitCompile(), b.submitCompile() (besides avoiding some temporary allocations, but that's my problem to solve). The original list-taking compile() / link() would stay (and sequentially call into the submit/check APIs) for backwards compatibility for 3rd party code.

Does this make sense? I hope I didn't forget something critical again :) And apologies for the delay.

dranikpg commented 2 years ago

Makes sense and should work 😄 I'll try this out next week

dranikpg commented 2 years ago

I sketched the proposed changes. If it looks acceptable, I'll move on to tidying and implementing the same for other shaders.

FlatGL needs an additional constructor that will be called from its compile state. I suggest

protected:
/** @brief Construct without running shader compilation and linking */
FlatGL(NoInitT, Flags flags, etc.)

that will create the ASP and set variables, but not trigger compilation (or else we're stuck in a loop 😆). NoInit seems to be the closest from the available tags.

besides avoiding some temporary allocations, but that's my problem to solve

I saw the containers and it made me not go for the split initially. There is a todo to implement stack based allocation, I can try this out in this PR.

dranikpg commented 2 years ago

As it turns out, querying GL_LINK_STATUS might return success before the completion query does so 😕 I've discovered this when I was updating the tests

This means that the following is possible:

FlatGL2D s{{}, 1,1}; // calls checkLink internally
s.isLinkFinished(); // returns false

I've made a small example that works in this odd way on my pc:

Renderer: AMD RENOIR (DRM 3.41.0, 5.13.0-52-generic, LLVM 12.0.1) by AMD
OpenGL version: 4.6 (Core Profile) Mesa 21.2.6

I'm not sure if this is the norm, because I couldn't find anything in the khronos docs on whether it can't be used after linking. In my case, it even returns false when the shader is being used - it seems like the driver just needs some time to notify its completion query mechanism 🤔

Update: I've made a raw OpenGL example to make sure the result is not influenced by magnum in any way. Same behaviour: LINK_STATUS returns GL_TRUE, but querying completion does not

dranikpg commented 2 years ago

I'll give it another look when you get to the tests

@mosra just a friendly ping in case you didn't see my message above. No hurries if you're busy 😅 😓 I implemented the proposed test changes. If they look ok, I'll move to the other shaders. However, the is a small issue with the query correctness. I've described it above

mosra commented 2 years ago

I saw the notifications but didn't find time yet, sorry for the wait again. I'll try to check this later today or tomorrow.

dranikpg commented 2 years ago

Ah, nice, so the sleep was enough to fix that strange issue :)

Well, it does it for the tests 😅 The API still doesn't work "as expected", but i've left a warning in the doc comments

if there will be any minor issues left I'll happily do that post-merge

I'll have a pass at the end and try to fix as much as I can. It's just that every time I forget how something is supposed to be aligned, I open a random file and look how its done there 😄 I know there is Corrades styleguide, but it doesn't cover cases like line-breaking a function with 5 arguments and 3 ifdefs

The linux-nondeprecated build is failing

Just didn't get to it. Thanks for the build option, I have been fishing them out of the logs. I'm more worried about AppVeyour... It spits out like 2k lines of warning in other parts of magnum and then the linker fails. There seems to be something wrong with the templated shaders (because MeshVisualizerGL2D is not part of the error). Specifically, the FooGL(Flags) constructor seems to be unresolved (after being moved to the header).

I don't have a dev-ready windows pc right next to me to test on, so it'll be cool if you could tell me the cause if you now it right away. In case this requires a little investigation, you can leave it on me.

mosra commented 2 years ago

I'll have a pass at the end [...] line-breaking a function with 5 arguments and 3 ifdefs

:bow: thank you! For line breaking, it's comments on 79 cols (mainly just to make them easy to read, unless there's an excessively long identifier or expression that can't really be broken). For other stuff, having the whole signature as a single long line is fine. Preprocessor #ifdefs in signatures I indent with 4 extra spaces compared to surrounding code.

If I may, the only other thing I noticed is a space before :, I have it always without, be it a class base or a range-for. But again, don't worry too much, I can fix it up after. I can't stand the mess clang-format makes so there's no automated way and thus the formatting burden is on me.

then the linker fails

It complains about a vftable, which is a vtable, which is due to AbstractShaderProgram having a virtual destructor. My theory is that, IIRC, the vtable usually gets attached to the first or last method declaration in the class, or something like that. So if the definition is inline and it's combined with the fact that it's a template, then it could happen that the vtable isn't stored in the DLL. Or something, heh.

TL;DR: try to deinline the explicit FlatGL(NoInitT) {} and others, since these are the last methods in the class (i.e., make a = defaulted definition of them in the .cpp files), maybe that makes it go away?

I don't have a Windows machine around either, so I usually just abuse AppVeyor, pushing random tries until it starts cooperating. Feel free to do the same, haha.

dranikpg commented 2 years ago

I guess we don't need full checks in constructors

FooGL(CompileState&&) {
    CORRADE_INTERNAL_ASSERT_OUTPUT(checkLink());
    CORRADE_INTERNAL_ASSERT_OUTPUT(cs._vert.checkCompile());
    CORRADE_INTERNAL_ASSERT_OUTPUT(cs._frag.checkCompile());

because in case the first check fails, the other two fill be skipped. In case the first is successful, the others are guaranteed to be the same

mosra commented 2 years ago

Okay, finally -- merged this as 96f97d4e7a28700b595067820a60b9c3e93281e8 and eb17d775625439292eab029d71ce658b4d2ae143, and wrote some introductory docs in 4580c30d1e4fbe1cef744d980a137c8fe0923344 and 48326ac418590fa89a4d5ee21231bb61ad0c865e.

What I only realized when testing locally was that the linker error message alone doesn't contain anything useful if the compilation failed -- somehow I falsely remembered that it contains also the compilation errors. So to provide the full context in case of a failure, as of c9d7365937fa702c9a96f8a79e3d20b9f7e243cb the checkLink() now accepts also a list of the input shader instances, and it calls checkCompile() on those first. That felt like the easiest way to achieve this while keeping it as async as possible, compared to forcing the shader code to wait on / check for individual shader compilations. Finally, in eede67175546407ec1973773116fc52b68c0ddd0 I made a change to not need to include GL/Shader.h in all shader headers (which pulls in <vector> and other nasty STL stuff). I have to fix the root cause properly (#499), but until I find time to do that I didn't want to increase compile times for everybody.

But those are all just minor additions on top. Thanks again for this work, and sorry for the extreme delays! :+1: