openframeworks / openFrameworks

openFrameworks is a community-developed cross platform toolkit for creative coding in C++.
http://openframeworks.cc
Other
9.98k stars 2.55k forks source link

Bugfix for threads hanging on exit #8148

Closed ofTheo closed 4 weeks ago

ofTheo commented 1 month ago

Would love some eyes on this.

It's a very small change which seems to fix several issues I have noticed where threaded classes will hang on exit.

Moving the exit() function of ofMainLoop from the ofMainLoop destructor to the end of the loop() function which stops looping and returns when the app is closing seems to fix the issue.

I think having exit() in the ofMainLoop destructor was too late and was interfering with the timing of threads closing.

This is an issue I have been battling for years now, especially with cameras like Vimba and Orbbec / Kinect which run threads as part of their capture process.

artificiel commented 1 month ago

does this fix the problem if the thread is holding for longer than 1/60s? (in the sense that, is this catching by chance some problems, but not all?).

i have been toying around ofThread not calling waitForThread upon destruction: currently the destructor of ofThread does nothing, and the docs say:

Exiting Nicely: As a thread is running in parallel with your application main loop, it's important to remember to tell it to stop before exiting the app. If you don't, you'll get weird errors or hangs because you aren't being nice to your threads. Depending on how you started your thread (blocking or non-blocking mode), you will either stop it for wait for it to finish. See the stopThread() & waitForThread() functions.

So that’s a suggestion, but easy to overlook (yet «important to remember»), or not necessarily evident if subclassing ofThread. in #4809 Arturo wrote that (in the context of ThreadChannel but it’s a similar point):

also not sure about calling close on destruction, i think you usually want to have full control over the thread shutdown if you leave it to RAII it's can be problematic if you aren't very careful, for example some members in the class might have been already destroyed when the channel is closed and the thread might try to access them so it's better to explicitly close the channel and wait for the thread on the destructor of your class

Which seems to imply the idea is to put things under the user’s responsability, with a default (non-action) behaviour that’s broken.

The proposal would be that, calling waitForThread() in the destructor if the thread is still running is better than not calling it, as not calling it ends up in «weird errors or hangs», which is not dramatic in itself if the application is quitting anyway, but not clean either. To be clear: if the ofThread destructor is called, it means the object is about to be gone, and in the context of a subclass, it is being called after the subclass’ destructor, so at that point if things are dangling they certainly won’t get better.

So this proposal «leaves it to RAII» by default (actually SBRM), unless the thread is not running, which means the user took over management by calling stopThread or waitForThread (or never started the thread).

Would look like (maybe put a timeout in case of a real problem which would warrant crash)

    virtual ~ofThread() { 
        if (isThreadRunning()) waitForThread();
    }
artificiel commented 1 month ago

this class crashes at shutdown, but not if the ~ofThread is adjusted as above to self-close (the time interval in there is just so the "latency" of shutdown can be appreciated; with the ~ofThread change, the spinning ball of death shows up for a few seconds, then everything exits cleanly)

class Hanger: public ofThread {
    std::atomic<float> expire_ {};

public:
    Hanger() {
        push();
        startThread();
    }

    void push() {
        expire_ = ofGetElapsedTimef() + 5;
    }

    void threadedFunction() {
        while (isThreadRunning()) {
            while (ofGetElapsedTimef() < expire_) { } // will make the app spin at shutdown but still cleanly
            ofLogNotice("pulse");
            push();
        }
    }
};
artificiel commented 1 month ago

of course the above will not fix another layer of threading that might need deeper signalling, but should catch a lot of casual errors-by-omission.

ofTheo commented 1 month ago

Thanks @artificiel I can confirm even with my fix this hangs.

But doing this produces no hang:

//--------------------------------------------------------------
void ofApp::exit(){
    hanger->waitForThread();
}

The classes I have been testing with that this PR addresses probably involve multiple threads some of which are internal to the libraries I am using. So this PR definitely fixes those issues.

I'd be open to a PR with your proposal - weird that we currently don't have a destructor or copy constructor being handled even if they are marked as default.

The proposal would be that, calling waitForThread() in the destructor if the thread is still running is better than not calling it, as not calling it ends up in «weird errors or hangs», which is not dramatic in itself if the application is quitting anyway, but not clean either. To be clear: if the ofThread destructor is called, it means the object is about to be gone, and in the context of a subclass, it is being called after the subclass’ destructor, so at that point if things are dangling they certainly won’t get better.

100% agree!

roymacdonald commented 1 month ago

My 2 cents. Besides saying that thread management can be a real pain, I think that the purpose of ofThread's destructor being non-virtual was precisely that developers had to be in charge of dealing with it. I really cant tell where it might be but I recall this being discussed long ago and there was a good reasoning behind it. Testing against a single thread is kinda trivial but it gets really nasty when you deal with multiple threads, shared resources, mutexes, conditions, etc. all this leads into you having to be really careful about how you deal with the joining of the threads. so this leaves you in the case where you could have 2 different threads waiting on closing the same thread and each blocking the other one to achieve so or in some data race, which can happen really easily if the destructor is doing so. I cant give an exact example but I do recall the headache it was to deal with multithreaded stuff and getting it to close properly. At the end of the day, being super tidy and conscious about it was what going on and when to destroy.

Also, last but not least, there is the order of destruction of objects, in particular static ones, which might lead to get it to destroy too late (also, can't recall specifics but I do remember having to deal with it).

In short, I think it is the developer who should decide when to properly close a thread. @ofTheo Is it possible to get in touch with @arturoc and get his feedback about this issue?

roymacdonald commented 1 month ago

also, mainLoop's exit gets called here. So, not sure what gets triggered first.

danoli3 commented 1 month ago

Pretty sure I fixed this in the thread example when I added the transparency example for glfw see the code in that. Is that merged

On Sat, 19 Oct 2024 at 10:44 am, Roy Macdonald @.***> wrote:

also, mainLoop's exit gets called here https://github.com/openframeworks/openFrameworks/blob/master/libs/openFrameworks/app/ofAppRunner.cpp#L235. So, not sure what gets triggered first.

— Reply to this email directly, view it on GitHub https://github.com/openframeworks/openFrameworks/pull/8148#issuecomment-2423377958, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGK2HEM7Y6GMVFHN7SCBU3Z4GMOXAVCNFSM6AAAAABQCLS3L2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRTGM3TOOJVHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ofTheo commented 4 weeks ago

@roymacdonald yeah I believe the one in the loop gets called first I left that one in as it basically does nothing if exit() has been called already so figured just incase it catches an unusual use case, I would leave it as is. Before we had one in the destructor and that one there, now we have one when the main loop stops and that one there.