spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0

Limited to using long for entity ids #1317

Open spring-projects-issues opened 10 years ago

spring-projects-issues commented 10 years ago

Rob Fletcher opened BATCH-2254 and commented

We're attempting to build a workflow orchestration system, and Spring Batch (primarily using Tasklets) seemed a good fit. However, we need to persist job status to a clustered environment (cross-region on Amazon's cloud) so that we can recover from instance outages, or even region outages, and have job execution continue. Because Spring Batch's Entity class makes it impossible to use any id type other than long, we cannot use a clustered storage solution such as Cassandra: we would have to introduce some kind of blocking to reliably generate a unique long id without danger of collision. It would make sense for Entity to use Serializable as the key type. That wouldn't preclude the current strategy for non-clustered SQL stores, but it would open a path to using UUIDs.
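To make the proposal concrete, here is a minimal sketch of an entity base class parameterized on a Serializable id type. It is not Spring Batch's actual Entity class: `GenericEntity`, `SqlJobExecution`, `ClusteredJobExecution` and `EntityIdDemo` are hypothetical names, and the UUID variant assumes ids may be generated locally with `UUID.randomUUID()` rather than through a central counter.

```java
import java.io.Serializable;
import java.util.Objects;
import java.util.UUID;

public class EntityIdDemo {
    public static void main(String[] args) {
        SqlJobExecution a = new SqlJobExecution(42L);
        ClusteredJobExecution b = new ClusteredJobExecution();
        System.out.println(a.getId());            // 42
        System.out.println(b.getId().version());  // 4 (random UUID)
    }
}

// Hypothetical base class (NOT Spring Batch's Entity): the id type is a type
// parameter instead of being hard-wired to long.
abstract class GenericEntity<ID extends Serializable> implements Serializable {

    private final ID id;

    protected GenericEntity(ID id) {
        this.id = Objects.requireNonNull(id, "id");
    }

    public ID getId() {
        return id;
    }

    // Identity is defined by concrete class plus id value.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id.equals(((GenericEntity<?>) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

// SQL-style store: ids come from a database sequence and remain Long.
class SqlJobExecution extends GenericEntity<Long> {
    SqlJobExecution(long id) {
        super(id);
    }
}

// Clustered store (e.g. Cassandra): ids are random UUIDs generated on any node,
// so no cross-node coordination is needed to avoid collisions.
class ClusteredJobExecution extends GenericEntity<UUID> {
    ClusteredJobExecution() {
        super(UUID.randomUUID());
    }
}
```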


Affects: 3.0.1

1 vote, 4 watchers

spring-projects-issues commented 10 years ago

Rob Fletcher commented

OK. Delving deeper into the codebase, I can see that JSR-352 is built around long ids, and Spring Batch would be unable to implement various types from that JSR if the id type changed to Serializable. I'm worried this may not be a realistically solvable problem.

spring-projects-issues commented 10 years ago

Dave Syer commented

We should maybe work on this together a bit. I think that the JobRepository was basically designed to need a global lock when creating a new JobExecution, so the ids for that entity should be centrally generated anyway (it's a common problem and I'm sure we can find a solution). The natural key for all the other entities is actually the job execution id plus a local identifier, so my instinct is that there should be a repository implementation that doesn't care how globally unique the latter are. As long as you don't need to start loads of really small jobs, it should work.
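A rough sketch of that split, under stated assumptions: an `AtomicLong` stands in for whatever central, lock-protected source hands out JobExecution ids (in practice a database sequence or similar), and the step name is assumed to be unique within its job execution so it can serve as the local identifier. `CentralIdAllocator`, `StepExecutionKey` and `NaturalKeyDemo` are hypothetical names, not Spring Batch types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class NaturalKeyDemo {
    public static void main(String[] args) {
        CentralIdAllocator allocator = new CentralIdAllocator();
        long jobExecutionId = allocator.nextJobExecutionId();

        // Child records are keyed by (job execution id, local identifier).
        Map<StepExecutionKey, String> statusByStep = new HashMap<>();
        statusByStep.put(new StepExecutionKey(jobExecutionId, "extract"), "COMPLETED");
        statusByStep.put(new StepExecutionKey(jobExecutionId, "load"), "STARTED");

        System.out.println(statusByStep.get(new StepExecutionKey(jobExecutionId, "load"))); // STARTED
    }
}

// Hypothetical stand-in for the single, central source of JobExecution ids.
// Only this allocation needs global coordination.
class CentralIdAllocator {
    private final AtomicLong sequence = new AtomicLong();

    long nextJobExecutionId() {
        return sequence.incrementAndGet();
    }
}

// Hypothetical natural key: the central job execution id plus an identifier that
// only has to be unique within that execution. Records supply equals/hashCode.
record StepExecutionKey(long jobExecutionId, String stepName) { }
```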

fmbenhassine commented 1 year ago

The impediment due to the JSR has been removed as of v5 (#3894). As mentioned by Dave Syer, IDs should be generated centrally to prevent creating duplicate job executions. However, they do not have to be of type long. The only current usage of that type is when getting the last job execution (descending order by ID). Getting the last job execution is required in 3 places:

Now since the creation of job executions should be done centrally anyway, I think the ordering could be based on creation time.
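A minimal sketch of selecting the last execution by creation time instead of by the largest id, assuming ids are opaque UUIDs and that creation timestamps come from the same central point that creates executions (so clock skew between nodes is not a concern). `ExecutionSummary`, `lastExecution` and `LastExecutionDemo` are hypothetical names, not Spring Batch APIs.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public class LastExecutionDemo {

    // Hypothetical, simplified view of a job execution: the UUID id carries
    // no ordering, only createTime does.
    record ExecutionSummary(UUID id, Instant createTime) { }

    // Most recently created execution. Ties on createTime are broken by id only
    // to keep the result deterministic; that tie-break has no temporal meaning.
    static Optional<ExecutionSummary> lastExecution(List<ExecutionSummary> executions) {
        return executions.stream()
                .max(Comparator.comparing(ExecutionSummary::createTime)
                        .thenComparing(ExecutionSummary::id));
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        ExecutionSummary first = new ExecutionSummary(UUID.randomUUID(), t0);
        ExecutionSummary second = new ExecutionSummary(UUID.randomUUID(), t0.plusSeconds(60));

        // Input order does not matter; prints 2024-01-01T00:01:00Z
        System.out.println(lastExecution(List.of(second, first)).get().createTime());
    }
}
```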

--

Related issue: #877