spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0

Erlang-/Scala-Actor-Style Step or Reader/Writer/Processor [BATCH-1227] #2350

Open spring-projects-issues opened 15 years ago

spring-projects-issues commented 15 years ago

Jörg Gottschling opened BATCH-1227 and commented

I recently had to think about scalability and performance in one of our largest batch jobs. One problem is that our deployment is on system B, which is also where all the business logic and some additional data live. We read from a database on system A, process the records using the business logic on B, and write to a database on system C. The bottleneck is the network. (No, we cannot deploy on A or C. :-( )

Up to now this has been done with our homegrown Spring-based framework, which is similar to Spring Batch. But we will now migrate to Spring Batch for various reasons.

I thought about implementing a special delegating Reader that works in its own thread, somewhat similar to how I understand actors to work in Erlang or Scala. My idea is that the Reader writes into a queue, from which the client (the Step!?) reads. While the Step processes the data, the Reader can go on reading until the queue reaches a configured limit. After processing, the chunk is handed over to the Writer. One could imagine implementing that handover with a queue as well, but I think this is perhaps not needed here, and I do not know how it affects transaction handling, etc.
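To make the idea concrete, here is a minimal sketch of such a queue-backed delegating reader in plain Java. It is an illustration under assumptions, not Spring Batch code: `SimpleReader` is a hypothetical stand-in for the framework's reader contract (return `null` when the input is exhausted), and `QueueingReader` and its `capacity` parameter are names invented for this example. Restartability and transaction handling are deliberately left out.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for a reader contract: returns null when the input is exhausted. */
interface SimpleReader<T> {
    T read() throws Exception;
}

/** Reads from a delegate on a background thread and buffers the items in a bounded queue. */
class QueueingReader<T> implements SimpleReader<T> {
    /** Marker object placed in the queue once the delegate has no more items. */
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private volatile Exception failure;   // set by the background thread if the delegate fails
    private boolean finished;             // only touched by the consuming thread

    QueueingReader(SimpleReader<T> delegate, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(() -> pump(delegate), "queueing-reader");
        worker.setDaemon(true);
        worker.start();
    }

    /** Runs on the background thread: copies items from the delegate into the queue. */
    private void pump(SimpleReader<T> delegate) {
        try {
            try {
                T item;
                while ((item = delegate.read()) != null) {
                    queue.put(item);      // blocks while the queue is at its configured limit
                }
            } catch (InterruptedException e) {
                throw e;                  // handled by the outer catch
            } catch (Exception e) {
                failure = e;              // reported to the consumer after the buffered items
            }
            queue.put(END);
        } catch (InterruptedException e) {
            // Simplification: an interrupted worker stops without signalling the consumer.
            Thread.currentThread().interrupt();
        }
    }

    /** Called by the consumer (for example the step): takes the next buffered item. */
    @Override
    @SuppressWarnings("unchecked")
    public T read() throws Exception {
        if (finished) {
            return null;
        }
        Object next = queue.take();       // blocks until the background thread delivers something
        if (next == END) {
            finished = true;
            if (failure != null) {
                throw failure;
            }
            return null;
        }
        return (T) next;
    }
}
```

With a capacity of, say, 100, the background thread can stay up to 100 items ahead of the consumer, so reading over the slow network link overlaps with processing. Note that items sitting in the queue have already been consumed from the source, which is exactly where the open questions about transactions and restart come from.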

I think implementing a Writer like that would be very hard. Implementing a Processor that way may be impossible, because it has to return the item. So perhaps this is better located at the Step level?

What do you think about this? And is this what you refer to as a SEDA-style architecture?


Affects: 2.0.0

spring-projects-issues commented 15 years ago

Jörg Gottschling commented

Hm, ... is this where one should use remote chunking?

spring-projects-issues commented 15 years ago

Dave Syer commented

Yes, that's pretty much what you are describing. Remote chunking requires the chunk messages to be sent in a durable channel with guaranteed delivery to a single consumer. It is implemented in the Spring Batch Integration project (in SVN but not part of the 2.0 release). Partitioning might also be helpful if you don't want to use a durable queue (JMS etc.). Or if you don't care about transactions (if the work on your items is non-transactional anyway, like a web service call) then you can use this pattern without the durable queue. I haven't played with that much, but it seems like it should just work, and I know a couple of projects where people said they were going to try it.

spring-projects-issues commented 15 years ago

Jörg Gottschling commented

I found this one today: http://forum.springsource.org/showthread.php?t=64451
Sounds like what I proposed, and easier than remote chunking. I would just like to implement it as a delegate.

spring-projects-issues commented 15 years ago

Dave Syer commented

Some sort of delegate pattern with a Future might work with less fuss (e.g. see https://fisheye.springsource.org/changelog/spring-batch/?cs=3343). I'm not convinced of the performance benefit though, and there was never any follow-up with actual numbers on that. If you measure something in a concrete use case, can you post the results?
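For readers wondering what a delegate pattern with a Future could look like, the following is one possible sketch in plain Java. It does not reproduce the linked changeset; `FutureProcessor` and `FutureUnwrappingWriter` are hypothetical names, and the real processing and writing logic is represented by a plain `Function` and `Consumer`. The processor returns immediately with a `Future`, which gets around the objection that a processor has to return the item, and the writer resolves the Futures of a chunk before delegating.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Wraps a processing function so that each call returns at once with a Future of the result. */
class FutureProcessor<I, O> {
    private final Function<I, O> delegate;
    private final ExecutorService executor;

    FutureProcessor(Function<I, O> delegate, ExecutorService executor) {
        this.delegate = delegate;
        this.executor = executor;
    }

    /** Submits the actual work to the executor; the caller is not blocked. */
    Future<O> process(I item) {
        Callable<O> task = () -> delegate.apply(item);
        return executor.submit(task);
    }
}

/** Resolves the Futures of a chunk in order and passes the results to the real writer. */
class FutureUnwrappingWriter<O> {
    private final Consumer<List<O>> delegate;

    FutureUnwrappingWriter(Consumer<List<O>> delegate) {
        this.delegate = delegate;
    }

    void write(List<Future<O>> chunk) throws Exception {
        List<O> results = new ArrayList<>(chunk.size());
        for (Future<O> future : chunk) {
            results.add(future.get());   // waits for this item; rethrows processing failures
        }
        delegate.accept(results);
    }
}
```

Because the writer waits for every Future in the chunk, the chunk still completes as a unit in the calling thread, while the items inside it are processed concurrently. Whether this pays off depends on the workload, so measuring a concrete case, as asked above, remains the deciding step.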