jrmadsen / PTL

Parallel Tasking Library (PTL) - Lightweight C++11 mutilthreading tasking system featuring thread-pool, task-groups, and lock-free task queue
MIT License
40 stars 13 forks source link

Non-use of perfect forwarding in TaskGroup run/exec? #49

Closed drbenmorgan closed 1 month ago

drbenmorgan commented 1 month ago

In preparing some guides for Geant4 developers/users on the use of PTL in that code, I ran into a slightly un-intuitive (but not necessarily wrong...) gotcha in the use of TaskGroup. Take the rather contrived case of:

#include "PTL/AutoLock.hh"
#include "PTL/TaskGroup.hh"
#include "PTL/Threading.hh"
#include "PTL/Types.hh"

#include <iostream>
#include <vector>

class ExpensiveToCopy
{
public:
    ExpensiveToCopy() { log("constructor"); }
    ~ExpensiveToCopy() { log("destructor"); }
    ExpensiveToCopy(const ExpensiveToCopy&) { log("copy-constructor"); }
    ExpensiveToCopy(ExpensiveToCopy&&) { log("move-constructor"); }
    ExpensiveToCopy& operator=(const ExpensiveToCopy&)
    {
        log("copy-assignment");
        return *this;
    }
    ExpensiveToCopy& operator=(ExpensiveToCopy&&)
    {
        log("move-assignment");
        return *this;
    }

    void log(std::string_view msg) const
    {
        std::cout << this << ": " << msg << std::endl;
    }
};

void
pass_by_constref(const ExpensiveToCopy& val)
{
    val.log("pass_by_constref called!");
    return;
}

int
main()
{
    ExpensiveToCopy y;
    PTL::TaskGroup<void> group;
    group.exec(pass_by_constref, y); // passes y by value...
    // this is pass by ref: group.exec(pass_by_constref, std::cref(y));
    group.join();
}

As in the comment, whilst pass_by_constref takes a constant reference, exec doesn't deduce this automatically as the parameter pack is passed by value (and consequently there are several copies/moves before the final call). std::ref/cref or a lambda with capture-by-reference do result in the expected behaviour.

The implementations of exec are obviously not so trivial, so I wondered if this prevented use of perfect forwarding of arguments (Ala make_unique) or whether this could be supported in future developments (I'm happy to have a go at it if so!)?

jrmadsen commented 1 month ago

IIRC, I initially did do perfect forwarding here but then, during my own development using the API, I found it to be way too easy to make the relatively simple mistake of allowing the const ref/ref to go out of scope before the task completed, e.g. using your example:

int
main()
{
    PTL::TaskGroup<void> group;
    {
        ExpensiveToCopy y;
        group.exec(pass_by_constref, y); 
    }
    group.join();
}

This led to numerous heisenbugs since pass_by_constref(y) will almost certainly (in this example in particular) get executed while the main thread is at group.join(). Sometimes, the stack memory used by y might not be overwritten so it still "works", but other times, part or all of the stack memory gets overwritten... leading to lots of undefined behavior. In more complex apps, realizing this was the issue was very unintuitive.

To avoid the expense of the copies, the solution is actually quite simple:

int
main()
{
    ExpensiveToCopy y;
    PTL::TaskGroup<void> group;
    group.exec([&y]() { pass_by_constref(y); });
    group.join();
}
jrmadsen commented 1 month ago

Ah, sorry. I missed that you noted the solution in your original message:

lambda with capture-by-reference do result in the expected behaviour.

jrmadsen commented 1 month ago

IIRC, this particular issue is why std::thread has the similar restrictions (capture-by-ref lambda and/or std::ref/std::cref wrappers)

drbenmorgan commented 1 month ago

Thanks for the details @jrmadsen ! After reading a bit more on the std::thread restrictions it does make more sense. Given PTL's interface matches these, that makes things simpler, and in fact having to explicitly state the pass behaviour in a lambda capture or wrapper likely leads to cleaner code (even if developers complain)!