spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0
2.69k stars 2.33k forks source link

Refactor JobRepository for greater clarity and consistency. [BATCH-340] #3235

Closed spring-projects-issues closed 16 years ago

spring-projects-issues commented 16 years ago

Lucas Ward opened BATCH-340 and commented

The SimpleJobRepository and it's related Dao's plays a huge part in the architecture, since it completely controls how the domain objects are persisted. Over the last few months, with a lot of added features and minor tweaks, the repository and dao's have gotten a bit out of hand. The Dao's have ballooned in size, and should somehow be split up (even if internally) so that they can be easier to understand. There is also issues with how *Execution domain objects are created, since they were originally created by user, then 'saved'. Things have gotten a bit different lately, with more of a move towards the repository creating even executions, but it's still not consistent.


Affects: 1.0.0.m4

Issue Links:

spring-projects-issues commented 16 years ago

Lucas Ward commented

Robert,

I'll be tied up with some other things this week(and next) and won't be able to tackle this one, so I'm assigning to you. Also, I think I've stared at the same repository so many times over the last year, a fresh look at it might be helpful.

spring-projects-issues commented 16 years ago

Robert Kasanicky commented

A few notes from reading through the repository code (and it's clients):

  1. there's no Job entity => isn't the name 'JobRepository' confusing? (Also JobDao and StepDao)

  2. single repository for accessing 4 different entities => 'GlobalBatchMetadataRepository' is more revealing? Any class that has reference to it can (by design) change just anything, e.g. step executor can create job instances

  3. why should a step executor have a reference to repository at all (even if it was StepExecutionRepository)? Shouldn't it just call stepExecution.save() (it is not supposed to deal with executions that were not passed to it)?

  4. (2 & 3) => we are far from the Principle of Least Privilege

  5. Or methods (findOrCreate, saveOrUpdate) feel dirty. I don't have the DDD book with me, but I believe it argues against those - creation and update are two significantly different events in entity's lifecycle

These observations make me feel like we should significantly rethink how persistence is handled. I don't want to make any non-trivial changes to current implementation without having a clear idea how things are supposed to work.

To get started I would consider having separate repositories for different entities and making sure every class possessing a repository reference desperately needs it (this is where privilege minimalism guides me).

What do you think?

spring-projects-issues commented 16 years ago

Dave Syer commented

I am all in favour of minimal privileges. So we could analyse the JobRepository (I think the name is fine - but BatchMetaDataRepository would also work) and see if it can be broken down. I bet there isn't much mileage in it though, but I'd be keen to see a break down of who uses which methods.

The *Dao interfaces could certainly be refactored in the interests of modularity, and still hidden behind the JobRepository.

My view of Or methods is that they are a necessary evil - however there have been a couple of cases where we re-named one because the create operation became atomic for business reasons (i.e. there is never a save(), only an update()). There may be more of those cases - someone already pointed out that the Job should not insert StepExecutions for steps it is not going to run, so there is potentially some cleaning up to do in the Job implementation(s) as well.

I also don't particularly like to see domain entities (e.g. StepExecution) with repository references either - it takes us too far from the point where they can be sent over the wire.

spring-projects-issues commented 16 years ago

Robert Kasanicky commented

StepInstance holds a reference to its last execution, both in domain model and database schema - is there a strong reason for this?

My impression is it is an unnecessary convenience that pollutes both design and implementation, so I'd like to remove it (at least from the db schema) unless proved valuable.

spring-projects-issues commented 16 years ago

Dave Syer commented

I think it is a convenience, but more than just convenient. The value is that I will often, if not constantly, want to know form looking at a step instance, "when did it last run". I think the de-normalised data model in this case is completely justifiable. if you can write an SQL statement in one line that gives me the same view on the data, I might change my mind.

spring-projects-issues commented 16 years ago

Lucas Ward commented

Dave,

i think that's about the same thing I said during my last conversation to Robert. I'm still a little reluctant as to whether or not a good sql statement could be created. He did, however, make some good comments about how the lastExecution should perhaps not be part of the StepInstance, and should perhaps be something like:

jobRepository.getLastExecution(stepInstance)

It seems like this would remove some of the 'circular dependency' issues with the repository and daos, but would require before mentioned sql statement. Either that, or splitting the dao's by instance and execution won't work. I'm interested to see how it works out.

One other possibility would be to load all executions as a list that could be easily sorted by date. Assuming there isn't 1000 executions, it wouldn't be a problem.

spring-projects-issues commented 16 years ago

Dave Syer commented

I guess I can live with jobRepository.getLastExecution(stepInstance).

spring-projects-issues commented 16 years ago

Robert Kasanicky commented

The first shot on the discussed sql statement would be following (just to prove feasibility of using sql):

SELECT * FROM step_execution WHERE step_instance_id = 1 and start_time = (SELECT max(start_time) FROM step_execution WHERE step_instance_id = 1)

However, I see nothing wrong with filtering the latest execution in java as

  1. large counts are not expected
  2. executions needn't be held in memory all in the same time - it's sufficient to iterate over the result set and keep reference to the so-far-latest execution
  3. repository can cache the latest execution if desired
spring-projects-issues commented 16 years ago

Wayne Lund commented

I argued on Robert's side in our last meeting. I think the dynamic selection of getLastExecution() should be fine. I also argue that if our data for arriving at the last execution is incorrect we have other problems.

  1. there's no Job entity => isn't the name 'JobRepository' confusing? (Also JobDao and StepDao)
2. single repository for accessing 4 different entities => 'GlobalBatchMetadataRepository' is more revealing? Any class that has reference to it can (by design) change just anything, e.g. step executor can create job instances 3. why should a step executor have a reference to repository at all (even if it was StepExecutionRepository)? Shouldn't it just call stepExecution.save() (it is not supposed to deal with executions that were not passed to it)? 4. (2 & 3) => we are far from the Principle of Least Privilege \ 5. **Or** methods (findOrCreate, saveOrUpdate) feel dirty. I don't have the DDD book with me, but I believe it argues against those - creation and update are two significantly different events in entity's lifecycle
spring-projects-issues commented 16 years ago

Lucas Ward commented

Robert,

it looks like you're about done with this issue, and the other remaining tasks appear to be covered by other jira issues. If so, can you resolve this one?

spring-projects-issues commented 16 years ago

Robert Kasanicky commented

Still expecting more changes to be made to the repository, but it is probably better to track them separately.

spring-projects-issues commented 16 years ago

Dave Syer commented

Assume closed as resolved and released