spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0

Support for mixing per-record itemprocessors with list-based itemprocessors [BATCH-2307] #1296

Closed spring-projects-issues closed 4 years ago

spring-projects-issues commented 10 years ago

Chris Shumaker opened BATCH-2307 and commented

It seems easier to write ItemProcessors that handle one item at a time, but sometimes, for performance and scalability reasons, it is preferable to handle a list. Currently, a user must choose between maintainability and scalability, and this can become a nightmare if the decision changes later: jobs with per-item processing must be converted from start to end (reader, processors, writers) to handle the new paradigm. One specific example is when another framework works more efficiently on a chunk/batch/array than on a per-record basis. This is the case with most rules engines.

An almost complete suggestion might be a pair of processors: an aggregating processor that collects items into a list, and a splitting processor that breaks the list back into individual items. There are issues with this, such as "what happens to the unprocessed aggregates when the reader returns null?". Perhaps this would be possible if the processor had some insight into what the reader returned.

Another alternative would be a configurable change to any given processor which determines how the core read, process, write pipeline works. For example, a configurable value that says chunked="true" on an itemprocessor would designate to spring batch that it should aggregate items prior to calling process for any processor marked as such. That might eliminate synchronization issues between the list-based process and non-list-based reader.


Affects: 3.0.1

spring-projects-issues commented 5 years ago

Mahmoud Ben Hassine commented

The "Item" concept is abstract. Nothing prevents you from defining a logical item as an aggregate of multiple physical items. This is the case, for example, with flat files where the target domain object spans multiple physical lines. ItemReader<T> and ItemProcessor<I,O> are generic, so depending on how you define your "item" and how the reader provides it, the pipeline can operate on one item at a time or on a list/set of items (encapsulated in a logical aggregate item).
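To make the "logical aggregate item" idea concrete, here is a minimal sketch rather than framework code. It assumes a local stand-in interface with the same shape as Spring Batch's ItemProcessor<I,O>, so that it compiles on its own; PerRecordAdapter and ListItemDemo are hypothetical names. It shows a per-record processor being lifted to work on a list, so that it can sit next to a list-based processor when the item type is List<T>.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in with the same shape as Spring Batch's ItemProcessor<I, O>,
// declared locally so the sketch compiles without the framework.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical adapter: lifts a per-record processor so that it can run on a
// logical item that is itself a list of records.
final class PerRecordAdapter<I, O> implements ItemProcessor<List<I>, List<O>> {
    private final ItemProcessor<I, O> delegate;

    PerRecordAdapter(ItemProcessor<I, O> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<O> process(List<I> items) throws Exception {
        List<O> results = new ArrayList<>(items.size());
        for (I item : items) {
            O result = delegate.process(item);
            // A null result means "filter this record", mirroring the
            // convention Spring Batch uses for processors.
            if (result != null) {
                results.add(result);
            }
        }
        return results;
    }
}

final class ListItemDemo {
    public static void main(String[] args) throws Exception {
        // Per-record processor: trims a line and filters out blank ones.
        ItemProcessor<String, String> trim =
                s -> s.trim().isEmpty() ? null : s.trim();
        // List-based processor: works on the whole aggregate in one call.
        ItemProcessor<List<String>, Integer> totalLength =
                batch -> batch.stream().mapToInt(String::length).sum();

        List<String> cleaned =
                new PerRecordAdapter<>(trim).process(List.of(" a ", "  ", "bc"));
        System.out.println(cleaned + " -> " + totalLength.process(cleaned));
    }
}
```

With an adapter like this, the existing per-record processors do not need to be rewritten when the item type changes to a list; only the reader has to supply the aggregate.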

> a configurable value that says chunked="true" on an itemprocessor would designate to spring batch that it should aggregate items prior to calling process for any processor marked as such

The aggregation should happen on the reading side (as you said: "prior to calling process") so that aggregated items are sent to the processor. We provide an example with the AggregateItemReader here.
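As a rough illustration of reader-side aggregation (this is not the AggregateItemReader referenced above), the sketch below wraps a per-record reader and hands fixed-size groups to the processor. It assumes a local stand-in interface with the same shape as Spring Batch's ItemReader<T>, where read() returns null at the end of input; GroupingItemReader, groupSize and GroupingReaderDemo are hypothetical names. Because the wrapper sees the delegate's null directly, it can flush a partial group before signalling the end of input, which addresses the "unprocessed aggregates" concern raised in the original report.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in with the same shape as Spring Batch's ItemReader<T>: read() returns
// the next item, or null once the input is exhausted.
interface ItemReader<T> {
    T read() throws Exception;
}

// Hypothetical wrapper: groups up to groupSize records from a per-record
// reader into one logical item.
final class GroupingItemReader<T> implements ItemReader<List<T>> {
    private final ItemReader<T> delegate;
    private final int groupSize;
    private boolean exhausted = false;

    GroupingItemReader(ItemReader<T> delegate, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        this.delegate = delegate;
        this.groupSize = groupSize;
    }

    @Override
    public List<T> read() throws Exception {
        List<T> group = new ArrayList<>(groupSize);
        while (!exhausted && group.size() < groupSize) {
            T item = delegate.read();
            if (item == null) {
                exhausted = true; // remember the end so the delegate is not called again
            } else {
                group.add(item);
            }
        }
        // A partial group is still returned; null is returned only when
        // nothing at all could be read.
        return group.isEmpty() ? null : group;
    }
}

final class GroupingReaderDemo {
    public static void main(String[] args) throws Exception {
        Iterator<Integer> source = List.of(1, 2, 3, 4, 5).iterator();
        ItemReader<Integer> perRecord = () -> source.hasNext() ? source.next() : null;
        ItemReader<List<Integer>> grouped = new GroupingItemReader<>(perRecord, 2);

        List<Integer> group;
        while ((group = grouped.read()) != null) {
            System.out.println(group); // the last group holds the single leftover record
        }
    }
}
```

Note that in this arrangement the chunk size counts logical items, so a commit interval of 10 with a group size of 2 would cover up to 20 physical records per transaction.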

So for me, the requested feature "Support for mixing per-record itemprocessors with list-based itemprocessors" is already possible with the current chunk processing model and the generic interfaces that Spring Batch provides. It is just a matter of how the item is designed. @Chris Shumaker, do you agree?

As a side note since you talked about performance, I would like to emphasize two important points:

spring-projects-issues commented 4 years ago

Mahmoud Ben Hassine commented

Resolved for the reasons explained in detail in my previous comment.