Expand decision capability - allow transition from a decision to another decision, beginning job with decision

scottkurz commented 4 years ago

The spec language contradicts itself and confuses around whether a decision can follow a decision

From spec/src/main/asciidoc/job_specification_language.adoc

... The decision element may follow a step, flow, or split. A job may contain any number of decision elements. A decision element is the target of the "next" attribute from a job-level step, flow, split, or another decision.

Something like:

<decision id="decider1" ref="myDecider">
        <next on="EXECUTE" to="step1"/>
        <next on="SKIP" to="decider2"/>
    </decision>

seems useful, and I don't see why we'd disallow it.

We need to decide if the Decider in this case would be passed:

the StepExecution[] passed to the previous decision's Decider OR
- an empty array

Reference: https://github.com/OpenLiberty/open-liberty/issues/10102, where this was opened against Open Liberty.

rmannibucau commented 4 years ago

@scottkurz would all decisions be reevaluated or only the ones after current one - giving an impacting order in the execution and not just the graph? Should it be "loop" protected (max 10 iterations for ex)?

scottkurz commented 4 years ago

@scottkurz would all decisions be reevaluated or only the ones after current one - giving an impacting order in the execution and not just the graph? Should it be "loop" protected (max 10 iterations for ex)?

@rmannibucau "loops" are already prohibited (see Loop definition ) even across executions.

I'm not sure I know what you mean by decisions being "reevaluated" if that doesn't cover your concern though.

rmannibucau commented 4 years ago

@scottkurz loops are not really prohibited since this case is not allowed (or you can see this case as forbidden). So if we enable this case, how do we handle loops. Point being that when you enable a decision you can desire to go back to reevaluate previous decisions since it is a state change and being on the same node you will likely hope all decisions are reevaluated. Therefore loop must be enables until some point if we enable this feature.

scottkurz commented 4 years ago

@rmannibucau - It sounds like you're proposing that, together with the change I proposed, we also revisit the loop prohibition. Well, I wasn't proposing we revisit loop prohibition... I was viewing the transition as "forward-looking", with an order inherent in the job... the decision lets you skip steps but not introduce loops.

So I don't see why we'd need to necessarily revisit that. It seems like a separate issue.

rmannibucau commented 4 years ago

@scottkurz think it is a consequence. Your proposal assumes you know which state you can get in which order which is rarely the case because you generally try to have a deterministic batch so if you want to reevaluate a decision you should reevaluate them all or none - can be why it was not originally allowed. I think - correct me if not - you are thinking to error recovery case but if you have FAILED, ERROR and SUCCESS handling and you get ERROR after the first decision and it still fails shouldn't you use this decision branch? To rephrase it: I think it opens some new cases but it creates at least as much problems IMO + it can break some existing jobs which had a well defined behavior and will get a new behavior now. So I wonder if it wouldn't need a state addition like:

<next on="EXECUTE" to="step1" max-reevaluation="2"/>

(name is very bad but it shares the idea). Indeed default would be 0 (as of today).

scottkurz commented 4 years ago

I'm thinking of a very linear job, with the decision dynamically determining whether to execute certain steps or skip them, e.g.:

    <decision id="decision1" ref="myDecider">
         <next on="EXECUTE" to="step1"/>
         <next on="SKIP" to="decider2"/>
    </decision>
    <step1 next="decision2"/>
    <decision id="decision2" ref="myDecider">
         <next on="EXECUTE" to="step2"/>
         <end on="SKIP" to="decider2"/>
    </decision>
    <step2/>

So after a decider returns SKIP in "decision1", we don't know next what step to actually execute .. we have to run the logic in "decision2".

I can see the value in that.. and there's no way it can break existing jobs because of looping, since loops are already prohibited .

But @rmannibucau, maybe you could elaborate on what you think also needs to be added?

rmannibucau commented 4 years ago

@scottkurz it wouldn't break existing job due to loops but in the sense the behavior is different so all alerts and monitoring can be broken - code itself and execution itself is acceptable but state changes so it breaks. This is where I think a toggle can help - we can discuss if we should add a fallback toggle (ie change default and enable to get the old behavior - was not allowed in javaEE land but maybe JakartaEE enables it?) or the opposite IMHO but I don't think we should do it without a fallback until we rework the DSL completely (we have that opportunity with Java DSL which hopefully does not map 1-1 the XML and brings more java into the game).

In your example, if decision1#myDecider returns "SKIP" then whatever returns decider2 it will not follow decisions since we are already at the end until you reevaluate all transitions of the decision1 and in such a case a loop can happen. This is why I think you can't add this feature without enabling some loop controls (proposed #iterations but any equivalent solution/protection works).

Also semantically, we move from a job node id to a lambda (function or whatever name you want ;)) which makes the XSD harder to follow IMO.

Now with your example, if I got it, I'm not sure what we gain, looks like both decision blocks would be easily mergeable and it would benefit the job readability.

I don't think the feature is bad - it is actually quite good when you have several output states (personally I consider N > 5) - but I think the DSL becomes too complex and a bit more fragile for what it serves.

It also does not help the use case about reusable decision blocks which acts as a kind of template:

(invalid but to share the idea):

<decision-template id="errorHandler" ref="myDecider">
     <next on="ERROR" to="endStep"/>
     <next on="SUCCESS" to="#{nextStep}"/>
</decision-template>

<step1 next="decision1"/>
<decision id="decision1" parent="errorHandler">
     <property name="nextStep" value="step2" />
</decision>
<step2/>

So if we step back, decisions limitations today is mainly that we can't reuse a decision block. If we enable that it seems that transitioning from a state to a decision is less needed and it enables to reduce the verbosity of the DSL, no?

Now the open question: in Java form we don't recally care since we would have loops, streams and all fancy programming features so is it a feature conditionned by Java DSL (I assume XML DSL will die with Java DSL creation).

scottkurz commented 2 years ago

I'm going to tack on #45 (beginning a job w/ decision) to this issue, to resolve together, and close that one.

jakartaee / batch

Expand decision capability - allow transition from a decision to another decision, beginning job with decision #111