jakartaee / batch

The Jakarta Batch project produces the Batch Specification and API.
https://projects.eclipse.org/projects/ee4j.batch
Apache License 2.0

Step failed with <fail> element not restartable #74

Open follis opened 4 years ago

follis commented 4 years ago

Originally opened as bug 6825 by cf126330

--------------Original Comment History----------------------------

Comment from = cf126330 on 2015-03-19 16:51:04 +0000

This is an issue raised by Takashi, who described it very well in WildFly issue WFLY-4427. I'd like to discuss it here and see if the spec can be improved to better handle this case.

According to the current spec, a <fail> element can appear inside a <step>, and when <fail> is triggered, the job fails while this step's batch status is COMPLETED. During restart, that step will not re-execute because its batch status is COMPLETED, unless allow-start-if-complete is explicitly set to true. But users often omit the allow-start-if-complete attribute, assuming the default will work fine.

So we now have a failed job execution that cannot be restarted. The TCK contains some tests to enforce that the step containing the <fail> element has COMPLETED batch status, and that the step will not re-execute during a subsequent restart.
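
To make the rule above concrete, here is a minimal sketch of the restart decision as described in this comment. It is a hypothetical model, not the Batch API: the class `RestartRuleSketch`, its `Status` enum, and `shouldReExecute` are names invented for illustration.

```java
// Hypothetical model of the restart rule described above; not the real Batch API.
final class RestartRuleSketch {
    enum Status { COMPLETED, FAILED, STOPPED }

    /**
     * Decides whether a step reached during restart runs again.
     * A COMPLETED step is skipped unless allow-start-if-complete is true.
     */
    static boolean shouldReExecute(Status previousStepStatus, boolean allowStartIfComplete) {
        if (previousStepStatus == Status.COMPLETED) {
            return allowStartIfComplete;
        }
        return true;
    }

    public static void main(String[] args) {
        // <fail> was triggered in the step: the job FAILED, but the step is COMPLETED.
        Status stepStatus = Status.COMPLETED;
        System.out.println(shouldReExecute(stepStatus, false)); // prints "false"
        System.out.println(shouldReExecute(stepStatus, true));  // prints "true"
    }
}
```

With the default (attribute omitted, treated as false here), the restart skips the very step whose <fail> transition failed the job.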

So the main question is: do we want to change the spec to allow such a step to re-execute during restart, even without specifying allow-start-if-complete?

See spec group discussion thread: https://java.net/projects/jbatch/lists/public/archive/2015-03/message/23


Comment from = ScottKurz on 2016-02-10 17:01:35 +0000

Picking this thread up again,

I think one of the proposals in the ML thread linked was to add a @restart to <fail> for something like:

(and I think similarly for though I didn't follow that).


I think the downside of this is: what if you really need step 1 to run every time? That's part of the rationale for 'allow-start-if-complete' to begin with. So we've lost that, and I hate to make someone restart on a decider to sort through all this.

Also, I'm not sure the other proposals addressed the use case I myself mentioned, where we throw an exception in step 2, thereby ending in FAILED, yet transition past it to step 3 with a <next> element, and wish to restart on step 3 WITHOUT re-executing step 2 (i.e. pick up where we left off).

To deal with both cases I can see conceptualizing something like an "execute on restart" policy associated with a given step.

So in 1.0, we could say we have two policies which I'll call:

  1. IF NOT COMPLETED (this is the default)
  2. ALWAYS (this is allow-start-if-complete="true")

(Of course, we also only execute if we transition to this step upon restart to decide whether to rerun, but I'll take that for granted.)

So I think we could solve both use cases by adding two other "policy" choices:

  1. NEXT_TIME
  2. NEVER

E.g. for Takashi's original use case, we could have

Once this step is re-executed, the "flag is removed" you could say, and we revert to the 1.0 policies (IF NOT COMPLETED or ALWAYS). The allow-start-if-complete="true" setting would take precedence. You could set this on and probably too I'd think.

Then for my use case you could have:

There is no way to undo never... no way we will ever re-execute this step.
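
As a rough illustration of how the four policies might combine, here is a minimal sketch. Everything in it is an assumption made for illustration: the names `RestartPolicySketch`, `StepRecord`, `Flag` and `onRestart` are hypothetical, and the ordering of the checks (NEVER first, then the one-shot NEXT_TIME, then the two 1.0 policies) is one reading of the description above, not spec text.

```java
// Hypothetical model of the proposed "execute on restart" policies; not a real API.
final class RestartPolicySketch {
    enum Status { COMPLETED, FAILED, STOPPED }

    // Flag left on a step by a transition element in the previous execution (assumed).
    enum Flag { NONE, NEXT_TIME, NEVER }

    static final class StepRecord {
        Status lastStatus;
        final boolean allowStartIfComplete; // true corresponds to the ALWAYS policy
        Flag flag;

        StepRecord(Status lastStatus, boolean allowStartIfComplete, Flag flag) {
            this.lastStatus = lastStatus;
            this.allowStartIfComplete = allowStartIfComplete;
            this.flag = flag;
        }

        /** Called when a restart transitions to this step; true means "run it". */
        boolean onRestart() {
            if (flag == Flag.NEVER) {
                return false;              // cannot be undone
            }
            if (flag == Flag.NEXT_TIME) {
                flag = Flag.NONE;          // one-shot: revert to the 1.0 policies afterwards
                return true;
            }
            if (allowStartIfComplete) {
                return true;               // ALWAYS
            }
            return lastStatus != Status.COMPLETED; // IF NOT COMPLETED (default)
        }
    }

    public static void main(String[] args) {
        // Takashi's case: step COMPLETED, job failed via a transition flagged NEXT_TIME.
        StepRecord step1 = new StepRecord(Status.COMPLETED, false, Flag.NEXT_TIME);
        System.out.println(step1.onRestart()); // prints "true"
        step1.lastStatus = Status.COMPLETED;   // assume the re-run completes normally
        System.out.println(step1.onRestart()); // prints "false"

        // Second case: step FAILED, transitioned past it, flagged NEVER.
        StepRecord step2 = new StepRecord(Status.FAILED, false, Flag.NEVER);
        System.out.println(step2.onRestart()); // prints "false"
    }
}
```

In this sketch NEXT_TIME only affects the first restart that reaches the step, while NEVER overrides even a FAILED status.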

I'm not sure the need for this has been convincingly made, but again, it seemed worth considering both cases together.

It is maybe a bit odd to have the one policy choice triggered by a <step> attribute (allow-start-if-complete) and these other choices by transition elements. But there is a precedence here and they're not symmetric.

...

Another idea I'd raised is conceptualizing this as "transition within" vs. "transition after". The downside of this is that in my use case, the step has already FAILED. Marking it as COMPLETED seems like erasing history. Perhaps the analogous approach in Takashi's case wouldn't be so bad (even though the full step logic, including afterStep(), saw a normally-completing step, maybe we could still get our heads around the step ending in a FAILED state because of the transition failure). But it would bother me to not see the two cases behave somewhat symmetrically.