MonsieurNicolas closed this 5 years ago
I think that some of the implications of this are to ensure that advance is the place that centralizes decisions on state transitions, and that "run" would create more work if needed.
I did a bit of investigation into this, looking for patterns in how the code is trying to use Work, so here are some thoughts:
First of all, just for clarity, I'm adding Work state machine represented as a diagram. Below are allowed state transitions we currently have:
Currently, the decision of state transition is in two places: in advance and in onSuccess. It seems like onSuccess was designed to manipulate state, and implementors heavily utilize that: for example, ApplyBucketsWork goes back and forth between "RUNNING" and "PENDING" states (depending on whether it's done applying a level or not). So I don't know if it makes sense to centralize decisions in advance only.
The WorkParent-Work relationship is not very intuitive to me; specifically, its separation of responsibilities. I feel like there are two different notions here: deciding on state transitions (in onSuccess, or advance; with no dependencies, advance is trivial), and managing children (where advance has some rules).
To illustrate this point, I'll go back to what Work is trying to accomplish, from the code:
While 1 and 2 look to me like direct responsibilities of `WorkParent`, 3 and 4 are managed by `Work`. Note that some works in our code don't even need all of this. [ApplyLedgerChainWork](https://github.com/stellar/stellar-core/blob/06c5bac64f9865a707b79038141db52ad1db4f33/src/catchup/ApplyLedgerChainWork.cpp) is a good example: all it needs is a guarantee of 3 and 4.
Our codebase also has some patterns:
- Actual work is done in onStart, while onRun is overridden to be empty, e.g. VerifyBucketWork or RunCommandWork. Is there a particular reason to do it like that?
- onSuccess enforces the order of work executed. PublishWork illustrates this: it first adds a child ResolveSnapshotWork and goes into PENDING mode until that Work is done. In a similar fashion, it does WriteSnapshotWork and PutSnapshotFilesWork. This same pattern can also be found in CatchupWork and FetchRecentQsetsWork.
- addWork is used liberally pretty much everywhere. Work is added in onReset (BatchDownloadWork), in onSuccess (described above), or in both (GetAndUnzipRemoteFileWork).
Because of the resulting flexibility, it became really difficult to track each implementer. Additionally, changing anything in the Work interface becomes dangerous, because it's really hard to assess how the change affects each variation of work. So some questions that come to mind: Are we over-using the Work interface? Is the Work interface trying to achieve too much? Should we define a better "work-work parent" relationship?

Let's use ApplyBucketsWork as a way to discuss what is going on here for the RUNNING state, as it's actually a fairly simple Work that doesn't spawn children.
What this does:
- onReset resets the overall Work (this seems to be the right thing to do)
- onStart initializes a few members, preparing to apply the "current level"
- onRun is used as a way to execute a unit of work, scheduling "success" at the end
- onSuccess schedules run (if the current level is not done), start (if the current bucket is done and it's not done with all levels), or nothing (i.e. it's done)

The problem I see here is that it's actually abusing the way Work deals with state transitions: there is no reason to have the outer loop (go over levels) and the inner loop (apply buckets for a level) be implemented in different ways; those should be implementation details of how ApplyBucketsWork manages what to do while it's "running".
We need to clearly define what each state is supposed to be. Right now I don't know what WORK_PENDING really means vs WORK_RUNNING, and what an implementor is supposed to really do in the callbacks (which should allow us to have proper separation of concerns and centralize different aspects of scheduling/processing).
Per my comment for a simple work, I think that the only difference between WORK_PENDING and WORK_RUNNING is that onStart gets called (but right now it's used for other purposes in advance).
An example of how we can force people to implement things properly: for onReset, we can call it before any important transition, including when we tear down Work. If we do this right, I think we can stop calling clearChildren in all destructors (and instead assert in Work::~Work if there are children still present).
I think that we can make working with children a lot cleaner with some changes to the state definitions.
I think that we could make everything work fine by just using WORK_RUNNING even when we're waiting for all children to complete (notify would not be used), but this would be inefficient (a busy loop), as we would be constantly scheduling work to check if children are done.
So, instead, I think there should be a way for onRun to tell the executor that it wants to wait for a child to complete before being scheduled again (maybe introduce a new state WORK_WAITING for this? it would also integrate neatly with event-based work like RunCommand).
I don't know if we need anything else than that (like we don't need a "wait for all children to complete" as done in the current implementation of advance): it's very possible that it instead gets replaced by a helper function that gets called at the beginning of onRun by implementors if they really want to wait for all children (but to your point, some people want to run things in parallel, some want to sequence, etc.). There might be a need for another helper function to help fail if any child failed fatally, for example.
The pattern would be something like:
ExecutionStatus MyWork::onRun()
{
    // helper function returns:
    // DONE: all children done, no errors
    // RUNNING: some children are still running
    // FAILURE_*: some children failed
    auto status = checkAllChildrenDone();
    if (status == DONE)
    {
        // do something else, then return the resulting status
    }
    else
    {
        return status;
    }
}
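To make the aggregation concrete, here is a minimal, self-contained sketch of what a checkAllChildrenDone-style helper could do. The ExecutionStatus values and the vector-of-states representation are simplifications for illustration, not the actual stellar-core types: any failed child fails the aggregate, any running child keeps it RUNNING, otherwise everything is DONE.

```cpp
#include <cassert>
#include <vector>

// Hypothetical status values, mirroring the states discussed in the thread.
enum class ExecutionStatus
{
    DONE,
    RUNNING,
    FAILURE_RAISE
};

// Sketch of the helper from the pattern above: aggregate child statuses.
ExecutionStatus
checkAllChildrenDone(std::vector<ExecutionStatus> const& children)
{
    bool anyRunning = false;
    for (auto s : children)
    {
        if (s == ExecutionStatus::FAILURE_RAISE)
        {
            // any failed child fails the aggregate immediately
            return ExecutionStatus::FAILURE_RAISE;
        }
        if (s == ExecutionStatus::RUNNING)
        {
            anyRunning = true;
        }
    }
    return anyRunning ? ExecutionStatus::RUNNING : ExecutionStatus::DONE;
}
```

A real helper would walk the actual child Work objects rather than a vector of statuses, but the decision table is the same.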
With this, I think notify is just doing something like scheduleRun of the parent (forcing decisions to be made by run).
I agree the separation of concerns is not great on this one.
I am not sure what the original intent of WorkParent was: it doesn't perform any work itself (unlike Work), yet it seems to be the class that deals with a "hierarchy" of work.

> The problem I see here is that it's actually abusing the way Work deals with state transition: there is no reason to have the outer loop (go over levels) and the inner loop (apply buckets for a level) be implemented in different ways: those should be implementation details on how ApplyBucketsWork manages what to do while it's "running".
This makes me wonder: why do we even allow work to put itself back into the PENDING state (on the graph, that would be the running->pending transition)? Sure, we could change the current code to not abuse this transition, but it'll be hard to force future implementors to follow suit. It also seems like this state transition allowed for all kinds of different hacks in the code. Maybe Work shouldn't be allowed to re-start itself? Sounds like the rule of thumb should be "once you start work, you either stay in the running state, or transition into a completed {success or failure} state", so there's no way back.
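The proposed rule of thumb can be captured in a small table of allowed transitions. This is a hypothetical sketch with simplified state names, not the actual stellar-core enum; its only point is that RUNNING -> PENDING is banned while RUNNING -> {RUNNING, SUCCESS, FAILURE} remains allowed.

```cpp
#include <cassert>

// Simplified states; PENDING kept here only to show the transition being banned.
enum class WorkState
{
    PENDING,
    RUNNING,
    SUCCESS,
    FAILURE
};

// Sketch of the proposed rule: once RUNNING, a work may only stay RUNNING or
// move to a completed state; there is no way back to PENDING.
bool
isAllowedTransition(WorkState from, WorkState to)
{
    switch (from)
    {
    case WorkState::PENDING:
        return to == WorkState::RUNNING;
    case WorkState::RUNNING:
        return to == WorkState::RUNNING || to == WorkState::SUCCESS ||
               to == WorkState::FAILURE;
    default:
        // completed states are terminal (retries not modeled in this sketch)
        return false;
    }
}
```

Such a predicate could live in a single setState and turn every hacky transition into an assertion failure during development.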
> Per my comment for a simple work, I think that the only difference between WORK_PENDING and WORK_RUNNING is that onStart gets called (but right now it's used for other purpose in advance).
Yes, that is my understanding as well. Though onStart itself is confusing, because implementors do all kinds of things in onStart (including actual work -- I'm still not sure what the rationale behind that is).
> for onReset, we can call it before any important transition, including when we tear down Work, if we do this right, I think we can stop calling clearChildren in all destructors (and instead assert in Work::~Work if there are children still present).
Are you assuming here the guarantee we discussed before, that core won't shut down until all work is in a completed state? Can we also define the "important transitions" you are talking about? Transitioning into a completion state and retrying might be good candidates. Not sure if calling onReset would still make sense when we add new work.
Let's try to define clear guidelines on which callbacks should be used for what:
- onStart - initialize necessary variables, prepare Work for execution.
- onRun - do actual work, which might include adding children and starting to wait for them.
- onSuccess - transition into a failure or success state, or schedule the next chunk of work to run. Is onSuccess the right place to return the new state? Perhaps it could be onRun?
- onReset - reset Work to the initial state (should be the same as when we first added the work)
> So, instead, I think there should be a way for onRun to tell the executor that they want to wait for a child to complete before being scheduled again (maybe introduce a new state WORK_WAITING for this?
I don't know if I agree on this one. This sounds to me like:
- mChildren in WorkParent are of type Work and not WorkParent (tree structure)

So maybe Work should be the base class, and not know about children. Instead, it would focus on proper state transitions. Meanwhile, WorkParent manages children while also utilizing Work's functionality.
I agree that it may make sense to ditch the onStart callback.
I also think that nobody should be calling setState outside of the Work class.
> Maybe Work shouldn't be allowed to re-start itself?

That's a good point: I think the only reason one needs to "restart itself" is to wait for some outside event (and for that, a callback with, for example, a timer should be used); any other restart should be managed as a failure (which kicks off a restart).
> Not sure if calling onReset would still make sense when we add new work.

onReset would only be called when starting a new work or when tearing down work (failure/abort or shutdown).
> onSuccess - transition into failure or success state, or schedule next chunk of work to run. Is onSuccess the right place to return new state? Perhaps it could be onRun?

Mmm, yeah, onSuccess should not be called when running. I think the problem right now is that it's called when children complete. I think it's much easier to turn onRun into the central place where implementors decide what to do next during execution.
I think onSuccess, to your point, should only be called when Work transitions to SUCCESS (to unblock other work, for example) and should not be allowed to change state. The reason it's nice to have it as its own callback is that it clearly separates code that should happen when done (same with onFailure).
Also, if we do this, then onRun is the only one returning the desired new state.
> Executor becomes a centralized place that makes decisions on what is going to be run based on circumstances (e.g. waiting for child work to finish)

No, the executor does not care about what should be run: it always calls "run".
The only reason we need this WAITING state is to avoid a busy loop: when in the "RUNNING" state, the executor has to schedule "run".
The simplest way to think about this is that we add first-class support for waiting on a callback (triggered by a timer, for example). As we make callbacks first class, we can provide a wrapper that is the only way to invoke a Work instance, which would basically do: if state == WAITING, do nothing; if RUNNING, call run.
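A minimal sketch of such a wrapper, with a stand-in FakeWork type (the real Work class and executor are obviously more involved; the names here are illustrative only):

```cpp
#include <cassert>

enum class WorkState
{
    RUNNING,
    WAITING
};

// Minimal stand-in for a Work instance, just enough to show the wrapper.
struct FakeWork
{
    WorkState state{WorkState::RUNNING};
    int runs{0};
    void run() { ++runs; }
};

// Sketch of the wrapper described above: the only way to invoke a Work.
// WAITING work is skipped entirely (an external event will wake it up);
// RUNNING work gets its run() called.
void
invoke(FakeWork& w)
{
    if (w.state == WorkState::WAITING)
    {
        return; // do not busy-loop; a callback/timer will flip it back to RUNNING
    }
    w.run();
}
```

The key property is that the executor never needs to understand why a work is waiting; it only needs the state check to avoid scheduling it.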
btw, I noticed something strange in the code base: I see that we have a few places that do something like
mApp.getWorkManager().advanceChildren();
That seems fishy to me: why would a child have to "kick" the work manager? It's probably a workaround for a hack of sorts...
> So maybe Work should be the base class, and not know about children. Instead, it would focus on proper state transitions. Meanwhile, WorkParent manages children while also utilizing Work's functionality.
Yes, maybe with some points of clarification:
- children would be managed by WorkParent, adding the notion of hierarchy (ie, it's the one with addWork)

If someone needs to add child work, they can simply derive from WorkParent, or whatever we want to call it.
With all this: is there a reason to keep the notify functionality? The problem I see with notify as it is now is that it's completely separate from run (which then requires people to perform all sorts of dances to schedule things before or after children complete).
Maybe a way to fix this is that, when deriving from WorkParent, implementors get to implement an onRun callback that takes an optional Work (maybe even a vector?). They don't get to implement onRun itself, as we would mark it "final" in WorkParent.
Ok, from the offline discussion today, looks like I needed to make some changes to the design doc I've been working on that I'll post here -- it's almost done. Meanwhile, some clarifications:
With the round-robin approach to scheduling (specifically, where we schedule one work at a time), I wonder how this plays with batching. I guess it shouldn't be any different, except there might be other tasks scheduled in between batching works.
If onRun becomes the only place that sets state, it should also probably be the only place where addChild is allowed (more specifically, an implementer gets to implement a function inside onRun that does the actual work, and might add children). We probably shouldn't add children in places like onStart, onSuccess, etc.
> btw, I noticed something strange in the code base: I see that we have a few places that do something like
> mApp.getWorkManager().advanceChildren();
I think this comes from the fact that when work is added, nothing kicks it off; it's just kind of sitting there until advance is called. So if something is added to the work manager (top level), it needs a manual kick. This sort of makes sense, because it allows for construction of a work tree before scheduling anything.
I suppose we could change that, since in the new design added work goes straight into RUNNING mode. Thus it would be guaranteed to start running right away, so that no manual maintenance is needed.
doActualWork)
Disclaimer: we need better names for these 3 components: WorkFSM, Work (think WorkFSM that manages children), WorkManager
WorkFSM & Work (child of WorkFSM) state = {RUNNING, WAITING, SUCCESS, FAILURE_RETRY, FAILURE_RAISE, FAILURE_FATAL}
Note that there's no PENDING state - any added work goes directly into RUNNING mode, which means it's either waiting for its children or actually running the work.
The new WAITING state is there to indicate that work shouldn't be scheduled yet. It would integrate with RunCommandWork: currently it schedules its completion itself; with the new logic, we would provide a waiting callback that implementers could use to set work back into the RUNNING state so that the work manager can schedule it again.
I think we can deprecate the onStart method - it's no different from onReset with the new logic, so onReset could just handle variable reset/assignment.
Here are some descriptions of the methods we'd use:
virtual void onReset()
void Work::State onRun()
{
    // see explanations below for new methods
    auto nextChild = yieldNextRunningChild();
    if (nextChild)
    {
        nextChild->run();
    }
    auto nextState = doActualWork();
    if (nextState == DONE)
    {
        complete(nextState);
    }
    else
    {
        setState(nextState);
        notifyWorkManager();
    }
}
std::shared_ptr<Work> yieldNextRunningChild() - this method would be used to round-robin child scheduling (in onRun: basically, if children are not finished yet, we would get the next child and tell it to run again). For this we can use an iterator that is re-initialized in onReset.
Work::State doActualWork(); - performs a unit of actual work; implementers can add children and return the RUNNING state to wait for them. notifyWorkManager is basically scheduling a run of the top-level work, which would then decide which work to schedule next according to the RR logic.
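Assuming a cursor index in place of the iterator mentioned above, yieldNextRunningChild could be sketched as below. FakeChild and RoundRobin are illustrative stand-ins, not stellar-core types; the cursor (mNextChild) is what onReset would re-initialize to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum class WorkState
{
    RUNNING,
    WAITING,
    SUCCESS
};

// Minimal stand-in for a child work.
struct FakeChild
{
    WorkState state;
};

// Sketch of round-robin child selection: scan forward from the cursor,
// skipping children that are not RUNNING; return nullptr when no child is
// runnable (the parent would then run doActualWork itself).
struct RoundRobin
{
    std::vector<std::shared_ptr<FakeChild>> mChildren;
    std::size_t mNextChild{0}; // re-initialized to 0 in onReset

    std::shared_ptr<FakeChild>
    yieldNextRunningChild()
    {
        for (std::size_t i = 0; i < mChildren.size(); ++i)
        {
            auto idx = (mNextChild + i) % mChildren.size();
            if (mChildren[idx]->state == WorkState::RUNNING)
            {
                mNextChild = (idx + 1) % mChildren.size();
                return mChildren[idx];
            }
        }
        return nullptr;
    }
};
```

The cursor advances past the returned child, so repeated calls cycle fairly through all RUNNING children instead of starving the later ones.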
Below are methods for scheduling the running and completion of work:
void
run()
{
    if (state != WAITING)
    {
        // schedule yourself to run by posting a callback to the IO service
        // There might be additional stuff, but the callback would set state to `RUNNING` and call `onRun`
    }
}

void
complete(State state)
{
    assert(state == DONE); // shouldn't be called when state is RUNNING/WAITING
    if (state == SUCCESS)
    {
        // similarly to `run`, post to the IO service a callback that would set state to SUCCESS then call `onSuccess`
    }
    else if (state == FAILURE)
    {
        // similarly to `run`, post to the IO service a callback that would set state to FAILURE then call `onFailure`
    }
}
virtual void onSuccess()/onFailureRaise()/onFailureFatal() - these would call notifyWorkManager.
Additionally, as mentioned above, we introduce a new state WAITING that would prevent work from being re-scheduled. Specifically, we can introduce a "wait" callback that would be triggered by a timer and actually schedule the running of work; so a work can have a timer and use async_wait with mWaitingCallback. In this case the code would look something like:
void
Work::wakeUp()
{
    assert(getState() == WAITING);
    setState(RUNNING);
    notifyWorkManager(); // let WorkManager know it's time to schedule a run again
}

std::function<void(asio::error_code ec)> mWaitingCallback
{
    mDone = true;
    mErrorCode = ec;
    wakeUp(); // wake up and schedule completion
}

Work::State doActualWork()
{
    if (!mStarted)
    {
        mStarted = true;
        mTimer.async_wait(mWaitingCallback);
        return WAITING;
    }
    else if (mDone)
    {
        return mErrorCode ? FAILURE : SUCCESS;
    }
}
WorkParent was keeping track of Work in mChildren.
WorkManager : public Work (potentially WorkScheduler?) - same onRun method as in WorkFSM, except doActualWork would be just a no-op.

In general, I would prefer that you first describe interfaces without any implementation details (ie public methods only). Then, you can use those interfaces in RunCommand (event based), ApplyBucketsWork (multi-part work) and GetHistoryArchiveStateWork (work with children).
I think they should be:
- WorkFSM -> BasicWork (or WorkBase)
- Work stays the same
- WorkManager, yes, WorkScheduler is probably more accurate
BasicWork
You should start by describing the BasicWork interface.
I would recommend exposing a crank function that internally invokes doRun (if state is RUNNING). That way, we can expand to other state transitions in the future (like doAbort).
I would expect wakeUp to be implemented at this level and not have a dependency on WorkScheduler. Also, you need to define when people are supposed to call wakeUp.
Work
You didn't actually describe the Work interface (well, it's all mixed up with implementation detail); I suspect that once you have BasicWork, it will become easier for you to do so.
Also, I imagine that there are Work-specific methods that would be declared as part of the interface, things like addWork?
Also, the void Work::State onRun() method doesn't make sense to me (presumably the state type should come from BasicWork).
> std::shared_ptr yieldNextRunningChild() This method would be used to round-robin child scheduling (in onRun, basically if children are not finished yet, we would get the next child and tell it to run again)

I would prefer the round-robin set to be { child_1, ... child_n, self }; that way we only perform one unit of work at a time (with what you wrote, it's possible that we perform work in both a child and through doActualWork).

> For this we can use an iterator that is re-initialized in onReset

This doesn't make sense to me: why would we call onReset when work is continuously running?
> Note that WAITING children would not be scheduled

Not sure I understand "scheduled" here; I think you just mean we don't call crank on work in the WAITING state.
> Should be able to remove finished work tree from its children upon its success. For this we can have a slightly modified method that checks state of children - if it's SUCCESS, that work tree can be removed.

I think we should have a clear separation of concerns between the 3 classes.
The only thing special about WorkScheduler is that it schedules crank calls: removing children sounds like a responsibility of Work (if people want to keep completed children around, they can just keep references in local member variables).
> One thing I'm realizing: looks like in this design we would have the top work constantly running (even if the work manager has no children); I guess it's an unlikely scenario that WorkManager has nothing to do, but something to keep in mind.

I am not sure I am following; this is the purpose of the WAITING state.
Okay, will update based on the feedback.

> what we discussed earlier was to never set the state directly and return the desired new state instead (this is the responsibility of BasicWork)

Ok, I think it should be clearer once I separate BasicWork and Work.

> it's unclear to me what implementors are supposed to do when they derive from Work

Like I mentioned earlier, doActualWork would provide the work logic (e.g. applying buckets, publishing snapshots) as well as hint at the next desired state (returned in doActualWork).
> This doesn't make sense to me, why would we call onReset when work is continuously running?

What I meant is that when we transition into important states like retry or failure, the child iterator would be reset.

> Not sure I understand "scheduled" here, I think you just mean we don't call crank on WAITING state.

Yes, essentially a child in the WAITING state gets skipped.

> I am not sure I am following, this is the purpose of the WAITING state

I was talking about WorkManager in particular. With no children, it would keep cranking (not the case currently; hence the manual kick as soon as we add a new work tree to the work manager). Based on the definitions we discussed, I wasn't sure if WAITING is what we want, as WorkManager isn't really blocked on anything.
public:
enum State
{
WORK_RUNNING,
WORK_SUCCESS,
WORK_WAITING,
WORK_FAILURE_RETRY,
WORK_FAILURE_RAISE,
WORK_FAILURE_FATAL
};
void reset(); // reset work to its initial state
private:
// only BasicWork deals with state transitions; it also controls IO service
void setState(State newState);
void run(); // see tentative implementation below
void complete(State state); // set appropriate completion state and notify scheduler
void scheduleRun(); // post callback to `run` to IO service
void scheduleComplete(); // post callback to `complete` to IO service
// ...Retries helper functions would also go here; they are very similar to what we have now... //
protected:
State mState{RUNNING};
// Note that these callbacks are protected
// I don't think there's a good reason for letting outside classes access them
virtual void onReset();
virtual void onFailureRetry();
virtual void onFailureRaise();
virtual void onSuccess();
// Implementers provide work logic used in `onRun`, that hints BasicWork what state to transition to.
virtual State doActualWork() = 0;
virtual State onRun() { return doActualWork(); }
std::function<void(asio::error_code ec)> mWaitingCallback {...};
// whenever implementers decide to return the WAITING state, they need to make sure
// to define an event that re-triggers running (calls `wakeUp`). For example, in the case of a timer,
// implementers can use the generic waiting callback defined above.
void wakeUp();
void crank(); // schedule proper action based on state (`run`, `retry`, `abort` etc...)
// at this level BasicWork would just crank itself, since there is no notion of work
// manager/children yet, but this kind of weird so we'll let `Work` implement it.
virtual void notifyScheduler() = 0;
BasicWork::run would be the method that decides state transitions and could look something like:
void BasicWork::run()
{
    auto nextState = onRun();
    if (nextState == DONE)
    {
        scheduleComplete(nextState); // complete will notify work manager
    }
    else
    {
        setState(nextState);
        notifyScheduler(); // still running or waiting, tell scheduler to crank again
    }
}
public:
    template <typename T, typename... Args>
    std::shared_ptr<T>
    addWork(Args&&... args) { ...same as current code... };

protected:
    std::vector<std::shared_ptr<Work>> mChildren;
    std::weak_ptr<Work> mScheduler; // main scheduler that schedules cranks

    // remove completed children; additionally, implementers can add new children if needed (e.g. BatchWork)
    virtual void manageChildren();
    virtual std::shared_ptr<Work> yieldNextChild(); // if a child is returned, call crank on it, otherwise crank self

    State onRun() override final; // do slightly more in `onRun`, as now we manage children
    State doActualWork() override; // may additionally check if children are done

    void notifyScheduler() override
    {
        // get shared_ptr for mScheduler
        mScheduler->crank();
    }
    // ..various helper methods to check state of children.. //

With this, onRun could tentatively look something like:
State onRun()
{
    manageChildren(); // Add more work if needed
    auto child = yieldNextChild();
    if (child == nullptr) // run self
    {
        return doActualWork();
    }
    else
    {
        if (child->getState() != WAITING)
        {
            child->crank();
        }
        return RUNNING;
    }
}
I think that with all the functionality described above (how WS gets notified and children get removed), there isn't much for WorkScheduler to do except ensure it's not busy-waiting for an event. My main concern is that we might end up with a lot of redundant work being performed. In a tree-like work structure it works okay, because if a leaf work is WAITING, some other work just gets scheduled instead. But if we have a linked-list structure, it gets inefficient. So perhaps WorkScheduler could return the WAITING state in doActualWork, which then gets unblocked by either addWork or mWaitingCallback. This would work both when the scheduler has no children and when all of its children are in the WAITING state.
With this, some of the example works would tentatively look like this:
- RunCommandWork: similarly to what I mentioned before, use a timer and waiting callback in doActualWork; the scheduler does not schedule this work while it's in the WAITING state.
- GetHistoryArchiveStateWork: the child work GetRemoteFileWork is checked and added in manageChildren, then in doActualWork mState gets updated given that the added child completed successfully.
- ApplyBucketsWork: manageChildren is a no-op; doActualWork is where all the fun happens; the derived class controls the progress of bucket application with some internal variables and returns the proper state in doActualWork.

So, implementers have to:
Some other questions/concerns: it's a bit awkward for BasicWork to just work on its own. It's a little weird with notifyScheduler, since BasicWork is its own scheduler. I previously had notifyScheduler just do crank, but I think it's confusing. Instead, we can have Work implement the method, and have all classes derive from Work.

BasicWork
- can you expand comments on what the various methods are?
- I would like the split between public and protected to be: public represents how consumers interact with BasicWork functionality. As BasicWork is supposed to be used to perform some work, I would expect the main crank method to fall into that bucket, for example. Probably a virtual destructor as well.
- why is reset public? Who calls it? (and why?)
- should State be public? I think it should, but right now you don't expose the current state...
- when are the onXYZ methods called? Write those down so that implementors know what they are supposed to do
- what are the default behaviors of the onXYZ methods? (as they are not pure virtual)
- why doActualWork? It doesn't seem to make sense at this layer
- why mWaitingCallback? I don't think the WAITING state requires such a callback.
- notifyScheduler seems to be the wrong construct. Consumers of work are not going to derive from it (this goes back to properly splitting public from protected members). At this layer, the API exposed should only be to perform some work (and other tasks like managing lifetimes and state); scheduling is delegated to the consumer (caller).

Work
Same thing: split public vs protected, as it's unclear what a user of Work would do vs an implementor.
More questions:
- mChildren should not be exposed (ie, should be private); I already wrote earlier that if people really want to keep track of their children, they should do so in a separate variable. In order to have deterministic behavior, mChildren here should be managed only by the Work class.
- mScheduler is an abstraction-layer break; this needs to be solved by properly defining the BasicWork interface
- manageChildren: I don't understand why we need another mechanism to perform work. Spawning children is just something that people can do as part of their custom "run" implementation.
- yieldNextChild: there is no reason to allow people to override this (the only implementation we're going to have is round robin)

public:
enum State
{
WORK_RUNNING,
WORK_SUCCESS,
WORK_WAITING,
WORK_FAILURE_RETRY,
WORK_FAILURE_RAISE,
WORK_FAILURE_FATAL
};
BasicWork(Application& app, std::string uniqueName, size_t maxRetries, std::function<void(State state)> notifyCallback);
virtual ~BasicWork();
State getState();
// to avoid confusion with VirtualClock::crank, let’s call it `crankWork`
void crankWork();
protected:
// Note that these are empty by default, I think
// implementers should not be forced to implement them
virtual void onReset() {};
virtual void onFailureRetry() {};
virtual void onFailureRaise() {};
virtual void onSuccess() {};
virtual State onRun() = 0;
// Reset is exposed at this level, as `Work` should also be able to reset when adding children
void reset()
{
// .. Work might reset some internal variables here.. //
onReset(); // whatever else implementers need to do (e.g. clean up children)
}
void wakeUp()
{
assert(mState == WAITING);
setState(RUNNING);
mNotifyCallback(mState);
}
// This is a callback that implementers provide that will do whatever needed
// internally then notify its parent
std::function<void(State stateOfChild)> mNotifyCallback;
private:
State mState{RUNNING};
// Note that there are only 2 places where setState is called: crankWork and wakeUp
void setState(State newState);
// Retries helper functions would also go here; they are very similar to what we have now,
// except I think we could incorporate WAITING state for the retry delay,
// and wakeUp when it's time to retry
// Additionally, retry mechanism should be internal to BasicWork
crankWork could look something like this:
void crankWork()
{
    // We shouldn't crank work that is finished or waiting
    assert(getState() != DONE && getState() != WAITING);
    // For now, this is only for the running state; we can add a condition for aborting
    auto state = onRun();
    setState(state);
    switch (state)
    {
    case WORK_FAILURE_RETRY:
        onFailureRetry();
        reset(); // note work is reset on any failure
        break;
    case WORK_FAILURE_RAISE:
    case WORK_FAILURE_FATAL:
        onFailureRaise();
        reset();
        break;
    case WORK_SUCCESS:
        onSuccess();
        break;
    default:
        break;
    }
    // completed a unit of work
    mNotifyCallback(state);
}
What everybody can do: start running work and check on its state.
What implementers can do: implement the onXYZ callbacks to do whatever they like.
What only BasicWork can do: transition states.
public:
    Work(Application& app, std::string uniqueName, size_t retries, std::function<void(State state)> callback)
        : BasicWork(app, uniqueName, retries, callback), ...
    virtual ~Work(); // ...ensure children are properly cleaned up...

    template <typename T, typename... Args>
    std::shared_ptr<T>
    addWork(Args&&... args)
    {
        auto child = std::make_shared<T>(...);
        addChild(child);
        mNotifyCallback(getState());
        return child;
    }
    // I imagine we could also have an `addBatchOfWork` (with a better name),
    // so that we notify the parent once.
    // ..Various methods to check the state of children..

protected:
    State onRun() override final
    {
        auto child = cleanUpAndYieldNextChild();
        if (child)
        {
            child->crankWork();
            return RUNNING;
        }
        else
        {
            return doWork();
        }
    }
    // Implementers decide what they want to do: spawn more children,
    // wait for all children to finish, or perform work
    virtual State doWork() = 0;

    void onReset() override
    {
        clearChildren();
    }

private:
    // remove completed children, round-robin to the next running child (skip `WAITING` children)
    std::shared_ptr<Work> cleanUpAndYieldNextChild();
    std::vector<std::shared_ptr<Work>> mChildren;
    // Update `mChildren`, reset child work
    void addChild(std::shared_ptr<Work> child);
What everybody can do: add children, check overall children progress.
What implementers can do: refer to the comment for doWork.
What only Work can do: manage mChildren, round-robin over children.
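A sketch of what the private cleanup-and-yield helper declared above might do, combining the pruning of successful children with the round-robin skip of WAITING children. FakeWork and FakeParent are illustrative stand-ins for the real classes, and the cursor member is an assumption about how the round robin could be tracked:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum class State
{
    WORK_RUNNING,
    WORK_WAITING,
    WORK_SUCCESS
};

// Minimal stand-in for a child work.
struct FakeWork
{
    State state;
};

struct FakeParent
{
    std::vector<std::shared_ptr<FakeWork>> mChildren;
    std::size_t mNextChild{0}; // round-robin cursor

    // Drop children that finished successfully, then round-robin to the next
    // child that is actually runnable (WAITING children are skipped).
    // Returns nullptr when the parent should run its own doWork instead.
    std::shared_ptr<FakeWork>
    cleanUpAndYieldNextChild()
    {
        mChildren.erase(
            std::remove_if(mChildren.begin(), mChildren.end(),
                           [](std::shared_ptr<FakeWork> const& c) {
                               return c->state == State::WORK_SUCCESS;
                           }),
            mChildren.end());
        for (std::size_t i = 0; i < mChildren.size(); ++i)
        {
            auto idx = (mNextChild + i) % mChildren.size();
            if (mChildren[idx]->state == State::WORK_RUNNING)
            {
                mNextChild = (idx + 1) % mChildren.size();
                return mChildren[idx];
            }
        }
        return nullptr;
    }
};
```

When every child is WAITING (or there are none), the helper returns nullptr and the parent falls through to its own doWork, which is exactly the hook WorkScheduler uses to return WAITING and avoid the busy loop.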
WorkScheduler : public Work
{
  public:
    WorkScheduler(Application& app);
    virtual ~WorkScheduler();

  protected:
    // When there are no children, or they are all in the WAITING state,
    // go into the WAITING state as well, until an event notification or new children arrive
    State doWork() override
    {
        return WAITING;
    }
}
A callback for the scheduler would be slightly different, something like:
auto maybeScheduleCrank = [self](State state)
{
// children could all be in WAITING state, avoid busy loop
if (self->anyChildrenRunning())
{
if (self->getState() == WAITING)
{
self->wakeUp();
}
else
{
IO_service.post([self]() { self->crankWork(); });
}
}
}
Additionally, I think we might need a static create method for WorkScheduler. We first need to instantiate the scheduler class (I guess with nullptr for mNotifyCallback), then assign a callback to schedule a crank (this is not the case for regular child Work, because at class instantiation the parent is known, hence mNotifyCallback can be supplied). I also think that once WorkScheduler is properly set up, it should start cranking right away, so that no manual trigger is required (this should work fine even when there's no work to do, since it will go straight into the WAITING state).
ApplyBucketsWork
State doWork() override
{
// customized behavior for this work
auto finishStatus = applyBuckets();
if (stillApplyingBuckets())
{
return RUNNING;
}
else
{
return finishStatus == success ? SUCCESS : FAILURE;
}
}
GetHistoryArchiveWork
State doWork()
{
    if (!mGetRemoteFile)
    {
        mGetRemoteFile = addWork<GetRemoteFileWork>...;
        return RUNNING;
    }
    else if (mGetRemoteFile->getState() == SUCCESS)
    {
        loadFile();
        return SUCCESS;
    }
    else if (mGetRemoteFile->getState() == FAILURE)
    {
        // decide how to proceed if the child failed; in this particular case, fail too
        return FAILURE;
    }
    // child is still running
    return RUNNING;
}
// A callback defined for children could be like this (passed to `GetRemoteFileWork` in this case)
std::weak_ptr<Work> weak = ...get weak ptr…;
auto parentNotify = [weak](State state)
{
    // Do whatever the derived class wants to do internally
    // to react to `state`
    auto self = weak.lock(); // get shared ptr for self
    if (self)
    {
        self->mNotifyCallback(state); // propagate up
    }
};
Overall, looks reasonable now. Some questions and objections:

- Why does `crankWork` call `mNotifyCallback`? `crankWork` is always called by the parent (except in the case of the `WorkScheduler`), so the parent must be `RUNNING` if the child is `RUNNING`. So there is no reason to notify the parent in this case, since it can and should take any relevant action in response to the state change the next time it is scheduled for `crankWork`. This question is a consequence of the fact that notification should only be used to wake up a parent.
- Should `wakeUp` assert that the state is `WAITING`? As a counter example, suppose that new child work is added to an already running parent. The child doesn't necessarily know that its parent is running (because `BasicWork` has no access to the parent) so it must attempt to wake its parent.
- I don't think `Work::onRun` captures the right semantics. `Work::onRun` should round-robin over its children and itself, since the work may have to respond to the changing state of its children.

I think it would be helpful if you added more comments to the interface which define in what situations any given function can/should be called.
Several modifications based on the feedback (pseudocode, might need to be simplified):

`mNotifyCallback` could be moved out of `crankWork` and appear at the scheduler level to schedule the next crank. With that, a callback for the scheduler could look something like:
auto maybeScheduleCrank = [self](State state)
{
// children could all be in WAITING state, avoid busy loop
if (self->anyChildrenRunning())
{
// This could happen if `addWork` is called while scheduler is waiting
if (self->getState() == WAITING)
{
self->wakeUp();
}
else
{
IO_service.post([self]()
{
    self->crankWork();
    self->mNotifyCallback();
});
}
}
}
`addWork` should rather look like this:
template <typename T, typename... Args>
std::shared_ptr<T>
addWork(Args&&... args)
{
    auto child = std::make_shared<T>(std::forward<Args>(args)...);
    addChild(child);
    wakeUp(); // ensure state is set to RUNNING
    return child;
}
`onRun` would roughly be:

// Assuming `mScheduleSelf` is initialized to "false"
State onRun() override final
{
    if (mScheduleSelf || !hasRunningChildren())
    {
        mScheduleSelf = false;
        return doWork();
    }
    else
    {
        mScheduleSelf = true;
        auto child = cleanUpAndYieldNextChild();
        child->crankWork();
        return RUNNING;
    }
}
I'm working on adding more comments as well as an initial implementation.
@MonsieurNicolas @jonjove
Additional clarifications, and some learnings from the initial implementation of the work interface:

- `mNotifyCallback` is a way for any `Work` to alert its parent of an event. It appears that in most cases so far (in fact, in all but one, which I will cover below) invoking `wakeUp` on a parent is sufficient. Now, I think we should clearly define whether we expect the parent to exist and not be destroyed prior to its children being destroyed. I recall we discussed that aborting logic should give us this guarantee. The reason I'm bringing this up is that it will influence some implementation choices (e.g. using a weak pointer to the parent in a callback passed to a child, and maybe throwing if the parent doesn't exist; right now we just silently proceed if the parent doesn't exist).
- `WorkScheduler`: for the scheduler, "waking up" means checking its status and potentially posting more work to the IO queue, then "notifying" itself. I'm assuming that the work scheduler can only be in two possible states while the application is running: running or waiting (later, aborting should be added too). The scheduler depends on the state of its children, so when we create the work scheduler via a static `create` method, we assign it a custom `mNotifyCallback` that checks conditions and takes the appropriate action as follows:

| Scheduler state | Children state | Scheduler action |
|---|---|---|
| RUNNING | `anyChildrenRunning()` | `crankWork()`, will run more children work |
| RUNNING | `!anyChildrenRunning()` | `crankWork()`, will go into WAITING |
| WAITING | `anyChildrenRunning()` | `wakeUp()` |
| WAITING | `!anyChildrenRunning()` | no-op |
Actual callback:
mNotifyCallback = [weak]() {
auto self = /* ...get shared_ptr for weak... */
if (self->getState() == BasicWork::WORK_RUNNING)
{
if (self->mScheduled)
{
return;
}
self->mScheduled = true;
self->mApp.getClock().getIOService().post([weak]() {
auto innerSelf = /* ...get shared_ptr for weak... */
innerSelf->mScheduled = false;
innerSelf->crankWork();
innerSelf->mNotifyCallback();
});
}
else if (self->getState() == BasicWork::WORK_WAITING &&
self->anyChildRunning())
{
self->wakeUp();
}
};
Since the scheduler depends on the children, WorkScheduler::doWork
would look like this:
State doWork() override
{
if (anyChildRunning())
{
return BasicWork::WORK_RUNNING;
}
return BasicWork::WORK_WAITING;
}
The above aligns with expected action in mNotifyCallback
.
Also, it might be worth noting that we use `shared_from_this()` frequently in the current Work interface with the assumption that there is always an existing shared_ptr (i.e. the parent creates and manages the child). This makes me wonder: does it ever make sense to create new `Work` NOT via `addWork`? I don't think we use anything other than `addWork` when creating work trees in the code; even in the tests all work is created with the application's `WorkManager`. So maybe it makes sense to restrict work creation to `addWork`.
Retries: I briefly mentioned retries before; I think they would work as follows: the next desired state is obtained in `crankWork`; if it's a retry, we can go into the WAITING state and employ a mechanism similar to the one described for `RunCommandWork`. Specifically, using a timer, the work waits some time, then the expiration of the timer triggers `wakeUp`, which transitions the work into the RUNNING state and propagates the notification, so the work can be retried. We should also probably have some consumer-facing states for clarity, e.g. `RETRYING` or something.
Also, a few more thoughts on how `abort` logic fits into the design. It will be added once the initial design implementation is in place.
Like I already mentioned before, there are really two abort triggers: external and internal. An example of an external trigger is system shutdown, where an abort signal is issued to the work scheduler. An internal abort would be the following scenario: parent work A has some children. Consider one of the children goes into FAILURE_RAISE mode. Next time A is scheduled, it needs to take proper action to ensure the rest of the children are aborted. In order for this to happen, a few things to consider:
- `Work` children rely on a signal they get from the parent (it could be a bool member variable), and act accordingly to abort. This should live at the `Work` level, to avoid having implementers deal with the abort mechanism (though they'd need to implement `onAbort`). I think most of the time `onAbort` would just do nothing; the work interface resets any work in a terminal failure state anyway, so as long as implementers provide a proper `onReset`, it should work fine.
- For classes like `RunCommandWork`, things would be slightly more complicated, since such work might need to issue a `kill` to whatever process it is waiting on. I also suspect that actually waiting for the `kill` to complete might take quite a bit of time, so a timeout/force kill/etc. should probably be added to `onAbort`.
It's quite unclear for implementors what the various methods (`onReset`, `onStart`, etc.) are really supposed to do, especially when combining Work (what `WorkParent` does, for example). For example, I would expect implementations to follow:

- `onReset` initializes the Work to a pristine state (does NOT create work)
- `onStart` should be the one creating the initial batch of work (also called on retry, as a retry triggers in sequence reset, advance, start)

This is illustrated by `WorkParent`, which doesn't manage its children at all: it's up to all implementors to remember to call `clearChildren()` on reset, for example (or face undefined behavior on retry).