jakartaee / batch

The Jakarta Batch project produces the Batch Specification and API.
https://projects.eclipse.org/projects/ee4j.batch
Apache License 2.0

improve wording on 'skippable' and 'retryable' exceptions #66

Status: Open. Opened by follis 4 years ago.

follis commented 4 years ago

Originally opened as bug 6511 by BrentDouglas

--------------Original Comment History----------------------------

Comment from = BrentDouglas on 2014-11-10 17:38:07 +0000


Comment from = BrentDouglas on 2014-11-10 17:41:58 +0000

See discussion at https://java.net/projects/jbatch/lists/public/archive/2014-11/message/4

Note that the link above is dead, but we had the discussion archived, so I'm pasting it here.

From Brent Douglas 7 Nov 2014

Hi,

I have been reading the spec a bit today and I have some questions relating to section 8.2.1.4.

* Skipping/Retrying an exception

I think the terminology used in this section could be improved. There are a lot of phrases that could be better worded to make their intent clearer, e.g.

These changes would make it clearer that the entire chunk is being retried. As for 'skipping an exception', I can't find a clear interpretation in the spec of what this means, but my understanding is that it is something like 'ignore that the method invocation threw and continue to the next thing in the chunk table in 11.8'. Whether or not this is correct, I think the spec would be easier to understand if it were updated to better explain the scope of what is being skipped.
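Brent's reading of 'skipping an exception' can be illustrated with a simplified read loop. This is a minimal Java sketch under stated assumptions: the `Reader` interface, `readChunk`, and the static `readSkipCount` counter are hypothetical stand-ins, not the Jakarta Batch API or any runtime's code, and skip limits and `exclude` classes are ignored. A skippable exception from `read()` is counted and the loop moves on to the next read; any other exception fails the chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model of "skipping an exception" in the read phase of a chunk:
// the failed read() call is ignored and the loop continues with the next read.
// Not the Jakarta Batch API; skip-limit and exclude classes are not modelled.
public class SkipSketch {

    /** Hypothetical reader: returns the next item, null at end of input, or throws. */
    interface Reader {
        Object read() throws Exception;
    }

    static int readSkipCount = 0;

    /** Reads up to itemCount items, skipping exceptions whose class is configured as skippable. */
    static List<Object> readChunk(Reader reader, int itemCount,
                                  Set<Class<? extends Exception>> skippable) throws Exception {
        List<Object> items = new ArrayList<>();
        while (items.size() < itemCount) {
            Object item;
            try {
                item = reader.read();
            } catch (Exception e) {
                if (isSkippable(e, skippable)) {
                    readSkipCount++; // count the skip...
                    continue;        // ...and go on to the next read
                }
                throw e;             // not skippable: the chunk fails
            }
            if (item == null) {
                break;               // end of input
            }
            items.add(item);
        }
        return items;
    }

    static boolean isSkippable(Exception e, Set<Class<? extends Exception>> skippable) {
        for (Class<? extends Exception> c : skippable) {
            if (c.isInstance(e)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Input: "a", "b", a bad record, "c", then end of input.
        Iterator<Object> source = List.<Object>of("a", "b", "BAD", "c").iterator();
        Reader reader = () -> {
            if (!source.hasNext()) return null;
            Object next = source.next();
            if ("BAD".equals(next)) throw new IllegalArgumentException("bad record");
            return next;
        };
        List<Object> chunk = readChunk(reader, 4, Set.of(IllegalArgumentException.class));
        System.out.println(chunk + " readSkipCount=" + readSkipCount); // [a, b, c] readSkipCount=1
    }
}
```

Note that in this model the skipped read does not count toward `itemCount`, so the chunk still tries to fill up with good items; whether a real runtime behaves that way is exactly the kind of scope question the thread raises.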

While I was a little unclear on 'skipping', I found the explanation for 'retrying' even less clear. The explanation of what happens when a retryable-exception-class matches an exception is summed up in this one sentence:

When a retryable exception occurs, the default behavior is for the batch runtime to rollback the current chunk and re-process it with an item-count of 1 and a checkpoint policy of item.

The first time I read this, I understood it as: abandon the running chunk, run a chunk with an item-count of 1, and, if that works, proceed as before the rollback with the configured checkpoint policy. After looking at the RI, BatchEE and JBeret, I saw that this is not the common interpretation. From what I can see (I'm going to ignore BatchEE, as its implementation of retry looks unimplemented/broken), both of the other implementations reprocess only the items already read from the failed chunk. I don't really like this interpretation: if we are 're-processing the current chunk', why is the unprocessed portion excluded? The interpretation I like the most is that the entire chunk where the failure occurred should be reprocessed. This has the advantage of keeping the chunks aligned: if you are running item-count=4 with a reader that produces 16 items and the second chunk fails, you won't end up with chunk sizes 4,(1,1,1),4,4,1 but 4,(1,1,1,1),4,4. I would like to find out why these two implementations reprocess only a portion of the chunk, which IMO is not what the spec describes.
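The two readings of the retry sentence can be compared with a small simulation. In this hypothetical Java sketch, `chunkSizes` tracks only chunk sizes (no real reading, writing, or transactions), and it assumes the second chunk failed after three of its items had been read, which is what the `(1,1,1)` in the example implies. With `wholeChunk = false` it models the behaviour described above for the RI and JBeret; with `wholeChunk = true` it models re-processing the entire failed chunk one item at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of two readings of "re-process it with an
// item-count of 1". Only chunk sizes are tracked.
public class RetrySketch {

    /**
     * Chunk sizes produced when a retryable exception occurs in chunk
     * failedChunk (0-based) after itemsReadBeforeFailure items were read.
     *
     * @param wholeChunk false: retry only the items already read;
     *                   true: retry the entire failed chunk.
     */
    static List<Integer> chunkSizes(int totalItems, int itemCount, int failedChunk,
                                    int itemsReadBeforeFailure, boolean wholeChunk) {
        List<Integer> sizes = new ArrayList<>();
        int processed = 0;
        int chunk = 0;
        while (processed < totalItems) {
            int size = Math.min(itemCount, totalItems - processed);
            if (chunk == failedChunk) {
                // The failed chunk is rolled back and re-run as single-item chunks.
                int retried = wholeChunk ? size : itemsReadBeforeFailure;
                for (int i = 0; i < retried; i++) {
                    sizes.add(1);
                }
                processed += retried;
            } else {
                sizes.add(size);
                processed += size;
            }
            chunk++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 16 items, item-count=4, failure in the second chunk after 3 reads.
        System.out.println(chunkSizes(16, 4, 1, 3, false)); // [4, 1, 1, 1, 4, 4, 1]
        System.out.println(chunkSizes(16, 4, 1, 3, true));  // [4, 1, 1, 1, 1, 4, 4]
    }
}
```

The partial retry shifts every later chunk boundary by one item and leaves a trailing chunk of 1, while the whole-chunk retry keeps later chunks on multiples of `item-count`.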

* Metrics when skipping from #checkpointInfo

Exceptions thrown from the #checkpointInfo methods of ItemReader and ItemWriter can be skipped, retried, etc., since 8.2.1.4.1 says:

It also applies to exceptions thrown during checkpoint commit processing. A failed commit will be treated the same as a failed write.

I would like to know whether the relevant READ_SKIP_COUNT or WRITE_SKIP_COUNT metric should be incremented when the corresponding #checkpointInfo method throws a skippable exception. (By the way, the RI doesn't actually try skipping or retrying these.) My understanding of 10.2 is that it should be, e.g.:

6. readSkipCount - the number of skippable exceptions thrown by the ItemReader.

But, according to the javadoc, they should not be passed to a listener, e.g.:

even though 8.2.1.4.1 says:

A Skip Listener receives control after a skippable exception is thrown by the reader, processor, or writer.
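The interpretation in question can be sketched for the reader side as follows. This is a hypothetical Java model, not runtime code: `Source`, `handleSkippableReaderException`, and the static counters are invented names, and `onSkipReadItem` is only a stand-in for `SkipReadListener#onSkipReadItem`. A skippable exception from `checkpointInfo()` increments `readSkipCount` (10.2) but is not passed to the listener, while one from `read()` does both; the writer side would mirror this with `writeSkipCount`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reading asked about: skippable exceptions from
// checkpointInfo() count toward readSkipCount but do not reach the skip
// listener; skippable exceptions from read() do both. Not real runtime code.
public class CheckpointSkipSketch {

    enum Source { READ, CHECKPOINT_INFO }

    static int readSkipCount = 0;
    static final List<String> listenerCalls = new ArrayList<>();

    /** Stand-in for a SkipReadListener#onSkipReadItem callback; records the message. */
    static void onSkipReadItem(Exception e) {
        listenerCalls.add(e.getMessage());
    }

    static void handleSkippableReaderException(Source source, Exception e) {
        readSkipCount++;       // the metric counts every skippable reader exception
        if (source == Source.READ) {
            onSkipReadItem(e); // the listener is notified only for read()
        }
    }

    public static void main(String[] args) {
        handleSkippableReaderException(Source.READ, new Exception("bad item"));
        handleSkippableReaderException(Source.CHECKPOINT_INFO, new Exception("checkpoint failed"));
        System.out.println("readSkipCount=" + readSkipCount + " listenerCalls=" + listenerCalls);
        // readSkipCount=2 listenerCalls=[bad item]
    }
}
```

Under this reading the metric and the listener disagree for checkpoint failures, which is the tension between 10.2, the javadoc, and 8.2.1.4.1 that the question points out.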

Is this interpretation correct?

Brent