jakartaee / batch

The Jakarta Batch project produces the Batch Specification and API.
https://projects.eclipse.org/projects/ee4j.batch
Apache License 2.0

Add support for multiple readers, processors, writers, within a chunk step. #107

Open scottkurz opened 4 years ago

mminella commented 4 years ago

What does the spec need to do to accommodate this? You can already accomplish this via composition, correct?

rmannibucau commented 4 years ago

Enable lists in the XSD instead of requiring code. Batches are generally assembled from reusable components rather than written as code-first applications.

mminella commented 4 years ago

A few issues here:

First, the spec requires dependency injection of some kind. Because of that, the composition there would be a better place to handle this kind of feature.

That being said, if we look at each element of a chunk-based step (reader, processor, and writer), enabling lists on each element is not enough to implement this feature effectively. Specifically:

ItemReader - If you had a list of, say three ItemReader implementations, how do those get aggregated into the single item that the ItemProcessor receives? The reader is the most problematic IMHO for this kind of feature. You need some mechanism to aggregate all the outputs of the readers to be able to pass a single item to the next component.

ItemProcessor - This one is the only component that really makes sense to me. An ordered list of ItemProcessor implementations would be reasonable for a user to configure and the underlying framework to connect.

ItemWriter - What would a list mean? An item goes to all writers? What about if I wanted to send an item to two out of three?

Given the very narrow use case this would enable, I'd vote to have users go the more robust composition route.
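For the ItemProcessor case, the composition route can be sketched roughly as follows. This is only an illustration, not something the spec or any implementation provides: `CompositeItemProcessor` is a hypothetical name, and the `ItemProcessor` interface here is a simplified local stand-in for `jakarta.batch.api.chunk.ItemProcessor` (the real method also declares `throws Exception`) so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified local stand-in for jakarta.batch.api.chunk.ItemProcessor.
// Assumption: the checked exception of the real interface is omitted for brevity.
interface ItemProcessor {
    Object processItem(Object item);
}

// Hypothetical composite: runs the delegates in order, feeding each output to the next.
final class CompositeItemProcessor implements ItemProcessor {
    private final List<ItemProcessor> delegates;

    CompositeItemProcessor(List<ItemProcessor> delegates) {
        // Defensive copy so later changes to the caller's list do not affect the chain.
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public Object processItem(Object item) {
        Object current = item;
        for (ItemProcessor delegate : delegates) {
            if (current == null) {
                // A null result means the item was filtered out, so stop the chain.
                return null;
            }
            current = delegate.processItem(current);
        }
        return current;
    }
}
```

A step would then reference the composite as its single processor, and the ordering lives in whatever wiring (dependency injection or plain code) builds the delegate list.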

rmannibucau commented 4 years ago

"First, the spec requires dependency injection of some kind. Because of that, the composition there would be a better place to handle this kind of feature."

Hmm, I fear it does not change anything, since each component is configured independently anyway. Composition is just a way to couple components, which can also be done by injecting a step storage bean (something @StepScoped-like). That said, composition makes sense only if it adds value (routing, hardcoded/coupled configuration deduced from a simpler configuration, etc.) but not if it just chains components, and I guess this issue is only about chaining.

I agree that the hardest part is aggregating the readers, but my understanding is that you would treat them as a list and flatmap them for the processors, nothing more.
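One possible reading of "flatmap the readers" is to drain them one after another so the processor sees a single stream of items. The sketch below assumes that reading. `ChainedItemReader` is a hypothetical name, and `ItemReader` is a simplified local stand-in for `jakarta.batch.api.chunk.ItemReader`: it keeps only `readItem()`, where a `null` return signals the end of input, and leaves out `open`, `close`, `checkpointInfo`, and the checked exception that a real reader would have to handle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified local stand-in for jakarta.batch.api.chunk.ItemReader.
// Assumption: only readItem() is modeled; null signals that the reader is exhausted.
interface ItemReader {
    Object readItem();
}

// Hypothetical reader that concatenates several readers into one stream of items.
final class ChainedItemReader implements ItemReader {
    private final Iterator<ItemReader> remaining;
    private ItemReader current;

    ChainedItemReader(List<ItemReader> readers) {
        this.remaining = new ArrayList<>(readers).iterator();
        this.current = remaining.hasNext() ? remaining.next() : null;
    }

    @Override
    public Object readItem() {
        while (current != null) {
            Object item = current.readItem();
            if (item != null) {
                return item;
            }
            // Current reader is exhausted: move on to the next one, if any.
            current = remaining.hasNext() ? remaining.next() : null;
        }
        // Every reader is exhausted, which ends the chunk loop.
        return null;
    }
}
```

This does not answer the harder question raised earlier in the thread of merging one item from each reader into a single aggregate, and a real version would also need to checkpoint which delegate is active so that a restart resumes in the right place.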

For the writer, I think the only meaningful implementation is to broadcast the processed records to all writers. Other implementations are too business-specific and are worth a custom higher-order component, as you mention, but broadcasting is not (side note: broadcasting does not mean concurrent; the writers can simply be chained, but in terms of pattern it is a broadcast).
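As a rough illustration of the broadcast pattern described here, the sketch below hands each chunk to every writer in turn, sequentially rather than concurrently. `BroadcastItemWriter` is a hypothetical name, and `ItemWriter` is a simplified local stand-in for `jakarta.batch.api.chunk.ItemWriter` that keeps only `writeItems(List<Object>)` and drops `open`, `close`, `checkpointInfo`, and the checked exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified local stand-in for jakarta.batch.api.chunk.ItemWriter.
// Assumption: only writeItems() is modeled.
interface ItemWriter {
    void writeItems(List<Object> items);
}

// Hypothetical writer that sends every chunk to all delegates, one after another.
final class BroadcastItemWriter implements ItemWriter {
    private final List<ItemWriter> delegates;

    BroadcastItemWriter(List<ItemWriter> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void writeItems(List<Object> items) {
        // Read-only view so one delegate cannot alter what the next one receives.
        List<Object> view = Collections.unmodifiableList(items);
        for (ItemWriter delegate : delegates) {
            delegate.writeItems(view);
        }
    }
}
```

In this sketch a failure in one delegate stops the loop, so later writers never see that chunk. Whether that is the desired behavior, and how it interacts with the step's transaction, is exactly the kind of detail a spec-level list would have to pin down.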