Closed spring-projects-issues closed 4 years ago
Mahmoud Ben Hassine commented
The "Item" concept is abstract: nothing prevents you from treating a logical item as an aggregate of multiple physical items. This is the case, for example, for flat files where the target domain object spans multiple physical lines. The ItemReader<T> and ItemProcessor<I,O> interfaces are generic, so depending on how you define your "item" and how the reader provides it, the pipeline can operate on one item at a time or on a list/set of items (encapsulated in a logical aggregate item).
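To make the point concrete, here is a minimal sketch of a processor whose "item" is a whole list. The `ItemProcessor` interface below is a stand-in with the same single-method shape as Spring Batch's `org.springframework.batch.item.ItemProcessor`, so the example is self-contained; the processor name and the upper-casing logic are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListProcessorSketch {

    // Stand-in for org.springframework.batch.item.ItemProcessor<I, O>;
    // the real interface declares the same single process() method.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // The generic "item" here is a List<String>, so one process() call
    // handles many physical records as a single logical aggregate item.
    static class UpperCaseBatchProcessor implements ItemProcessor<List<String>, List<String>> {
        @Override
        public List<String> process(List<String> items) {
            return items.stream().map(String::toUpperCase).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        ItemProcessor<List<String>, List<String>> processor = new UpperCaseBatchProcessor();
        System.out.println(processor.process(List.of("a", "b"))); // prints [A, B]
    }
}
```

Nothing in the framework changes here; only the choice of the generic type parameters does.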
> a configurable value that says chunked="true" on an itemprocessor would designate to spring batch that it should aggregate items prior to calling process for any processor marked as such
The aggregation should happen on the reading side (as you said: "prior to calling process") so that aggregated items are sent to the processor. We provide an example with the AggregateItemReader here.
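The reading-side aggregation can be sketched as a wrapper reader that delegates to a per-record reader and emits fixed-size lists. This is not the actual AggregateItemReader implementation, just an illustration of the pattern under stated assumptions; the `ItemReader` interface below is a stand-in with the same contract as Spring Batch's `org.springframework.batch.item.ItemReader` (returning null signals end of input).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AggregatingReaderSketch {

    // Stand-in for org.springframework.batch.item.ItemReader<T>;
    // the real interface also returns null when the input is exhausted.
    interface ItemReader<T> {
        T read() throws Exception;
    }

    // Wraps a per-record reader and emits lists of up to groupSize items,
    // so the downstream processor sees one aggregate item per read() call.
    static class FixedSizeAggregatingReader<T> implements ItemReader<List<T>> {
        private final ItemReader<T> delegate;
        private final int groupSize;

        FixedSizeAggregatingReader(ItemReader<T> delegate, int groupSize) {
            this.delegate = delegate;
            this.groupSize = groupSize;
        }

        @Override
        public List<T> read() throws Exception {
            List<T> group = new ArrayList<>();
            for (int i = 0; i < groupSize; i++) {
                T item = delegate.read();
                if (item == null) break;        // underlying input exhausted
                group.add(item);
            }
            return group.isEmpty() ? null : group;  // null ends the step
        }
    }

    public static void main(String[] args) throws Exception {
        Iterator<String> source = List.of("a", "b", "c").iterator();
        ItemReader<String> perRecord = () -> source.hasNext() ? source.next() : null;
        ItemReader<List<String>> grouped = new FixedSizeAggregatingReader<>(perRecord, 2);
        System.out.println(grouped.read()); // prints [a, b]
        System.out.println(grouped.read()); // prints [c]
        System.out.println(grouped.read()); // prints null
    }
}
```

Because the reader itself knows when the delegate returns null, the trailing partial group is flushed naturally, which sidesteps the end-of-input problem that a processor-side aggregator would have.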
So for me, the requested feature "Support for mixing per-record itemprocessors with list-based itemprocessors" is already possible with the current chunk-processing model and the generic interfaces that Spring Batch provides. It is just a matter of how you design an item. @Chris Shumaker, do you agree?
As a side note, since you talked about performance, I would like to emphasize two important points:
Mahmoud Ben Hassine commented
Resolved for the reasons explained in detail in my previous comment.
Chris Shumaker opened BATCH-2307 and commented
It seems easier to write ItemProcessors that handle one item at a time, but sometimes, for performance and scalability reasons, it is preferable to handle a list. Currently, a user must choose between maintainability and scalability, and this can become a nightmare if the decision changes later: jobs with per-item processing must be converted end to end (reader, processors, writers) to the new paradigm. One specific example is when another framework behaves more efficiently on a chunk/batch/array than it does on a per-record basis. This is the case with most rules engines.
One nearly complete suggestion would be a pair of processors: an aggregating processor that collects items into a list, and a splitting processor that breaks a list back into individual items. There are open issues with this, such as: what happens to the unprocessed aggregates when the reader returns null? Perhaps if the processor had some insight into the reader's return value, this would be possible.
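The end-of-input problem described above can be made concrete with a hypothetical aggregating processor that buffers items and emits a list only when the buffer fills (returning null is how an ItemProcessor filters an item in Spring Batch). The class and its buffering policy are illustrative assumptions, not framework code; the `ItemProcessor` interface is a stand-in for `org.springframework.batch.item.ItemProcessor`.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferingProcessorSketch {

    // Stand-in for org.springframework.batch.item.ItemProcessor<I, O>.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Hypothetical aggregating processor: buffers items and emits a list
    // only when groupSize items have been seen; returns null (filter)
    // otherwise. It never learns that the reader hit end of input, so a
    // trailing partial buffer is silently stranded.
    static class BufferingProcessor<T> implements ItemProcessor<T, List<T>> {
        private final int groupSize;
        private final List<T> buffer = new ArrayList<>();

        BufferingProcessor(int groupSize) {
            this.groupSize = groupSize;
        }

        @Override
        public List<T> process(T item) {
            buffer.add(item);
            if (buffer.size() < groupSize) {
                return null; // filtered: nothing emitted for this item yet
            }
            List<T> group = new ArrayList<>(buffer);
            buffer.clear();
            return group;
        }
    }

    public static void main(String[] args) throws Exception {
        BufferingProcessor<String> p = new BufferingProcessor<>(2);
        System.out.println(p.process("a")); // prints null (buffered)
        System.out.println(p.process("b")); // prints [a, b]
        System.out.println(p.process("c")); // prints null: "c" is lost if input ends here
    }
}
```

This is exactly why the maintainer's reply pushes the aggregation to the reading side, where the wrapper sees the delegate's null and can flush the last partial group itself.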
Another alternative would be a configurable change to any given processor that determines how the core read, process, write pipeline works. For example, a configurable value such as chunked="true" on an ItemProcessor would designate to Spring Batch that it should aggregate items prior to calling process for any processor marked as such. That might eliminate synchronization issues between a list-based processor and a non-list-based reader.
Affects: 3.0.1