spring-projects / spring-batch

Spring Batch is a framework for writing batch applications using Java and Spring
http://projects.spring.io/spring-batch/
Apache License 2.0
2.65k stars 2.32k forks source link

Core and Infrastructure still have circular dependency [BATCH-501] #3064

Closed spring-projects-issues closed 16 years ago

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky opened BATCH-501 and commented

Several classes in Core package org.springframework.batch.core.step.item depend on infrastructure classes ItemReader and ItemWriter. However, ItemReader and ItemWriter are not core stereotypes. Conversely, infrastructure inherently depends on core.

Proposed solutions:

1) Move classes in base org.springframework.batch.item package and potentially the generic subpackages (e.g. adapter, support, transform) to the same package name but in the Core module, leaving the optional concerns in infrastructure (e.g. JMS, stax, ORM, etc.)

2) Move those same classes into the org.springframework.batch.core.step.item package

3) Completely combine core and infrastructure - honestly, you never use one without the other, why are they still separate? If you want to separate out optional concerns, do that (e.g. JMS, stax, ORM, etc.) but there's honestly no real reason to ever not have at least half the infrastructure package on your classpath (unless you think item oriented processing is evil and you only use tasklet steps, but that's a fringe concern).

I vote for (3)

Please re-examine this pre-release.


Affects: 1.0.0.rc1

spring-projects-issues commented 16 years ago

Robert Kasanicky commented

Doug,

infrastructure is compiled separately from core, so there can't really be a circular dependency. I agree you currently always use them together, but (at least in theory) you can reuse the infrastructure to implement a batch framework that works differently (alternative core). It doesn't hurt to have separate jars even if they are 99.9% time used together

So I don't think we should be merging modules, they are already a little too coarse-grained in my opinion. E.g. current infrastructure could be split into (io = readers, writers and related) io-api, io-standard-library, repeat, retry. Core certainly wouldn't depend on io-standard-library, only on io-api. To implement a custom ItemReader you would need just io-api etc. But after the package structure cleanup it should be possible to do the splitting when needed.

spring-projects-issues commented 16 years ago

nebhale commented

I'll follow up on Robert's comments that there are no actual compile-time dependencies from infrastructure onto core. I actually spent a bit of time at the end of last week working on the OSGi manifests for batch and you can see in spring-batch-infrastructure/src/main/resources/META-INF/MANIFEST.MF that there are no dependencies back.

I think that you definitely have a point about them logically being massively inter-related to one another, but this close to 1.0.0 I don't think that we're going to be able to do anything about it. For 2.0.0 I'll be looking much more closely into how tightly bound the infrastructure is to the core Spring Batch abstractions and seeing how the JARs can be better separated.

spring-projects-issues commented 16 years ago

Dave Syer commented

We already discussed this quite a lot and decided to keep infrastructure separate. There are (of course) no compile time cycles, and we do actually have a case for using infrastructure on it's own (in fact the reason it exists is a client requirement).

I am a little uncomfortable with the way that sometimes the design of infrastructure has been driven by core requirements, but I console myself with the argument that core is actually a pretty good example of a real-life client of infrastructure, so it isn't completely unnatural.

So I vote for none of the above.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

Of course having compiled the code myself on several occasions I am aware that there is no compile time circular dependency problem. It's more of a runtime dependency issue. My personal ideal is that package separation should be such that I can use the minimum number of classpath entries possible to accomplish my work.

I understand that there's at least one client using infrastructure separately, but that really just speaks to the usefulness of the individual components being used.

For instance, I know that the repeat and retry templates are being used independent of Spring Batch. This doesn't justify our current package separation, rather it only suggests that they might be useful enough to move over to Spring proper (or at least into a spring-batch-commons area) so that people who want to use it don't have to also import tons of other stuff and all of the jar dependencies of infrastructure. I would definitely support adding them to Spring proper, perhaps spring-aop since they are useful as advice (i.e. with the Repeat/RetryOperationsInterceptor).

It definitely makes sense to separate out specific item-oriented implementations, since they introduce new dependencies to end-user projects. For instance, to use infrastructure, I need to have xstream, hsqldb, stax, spring-oxm, spring-ws, hibernate, ibatis, etc etc etc even if I don't intend to use the classes that utilize these. However, the basic "infrastructure" components (i.e. the classes I suggest moving in proposed solutions (1) and (2)) can be used without adding extra external dependencies. They are also tightly coupled with Core bidirectionally, as opposed to the specific implementations which depend on Core, but which Core does not depend on.

The fact remains, for 99.9% of use cases I cannot use Core without Infrastructure, and this can be easily fixed. If (3) is too much work, (1) and (2) consist purely of a trivial refactoring and solve the problem quite nicely. All we need to do is admit that the basic Item* collaborators are really core concepts, which we already acknowledge by including the ItemOrientedStep in Core -- of course, moving the ItemOrientedStep to infrastructure would also be a valid solution but I'm not particularly fond of that.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

And the reason I ask that this be handled before 1.0.0 is the possibility of package renaming. I know there are a lot of pressing reasons to push ahead with 1.0.0 and fix things later, but remember that back-compatibility is going to bite you in the butt. Changing package names, module contents, the locations of various classes in the package hierarchy and public APIs will be all but impossible until 2.0.0, and although changing major version numbers is not a big deal from our perspective, I really think it sends a bad message about how the project is being managed if it happens too soon after the previous major version release, and this is no good for either of the entities involved.

spring-projects-issues commented 16 years ago

Dave Syer commented

I still don't think there's enough incentive to do any more re-packaging. it's not the amount of work required, and I don't feel that this decision has been hurried - it really has been on the radar for quite some time now. Just because core depends on infrastructure it doesn't mean they should be merged. I don't see where this bi-directional dependency is that you are talking about.

All those dependencies are optional, by the way, and therefore should not be necessary to include on your classpath - it's true that we don't test this, so we might need to do some work to make sure it is true. We can work on reducing the number of mandatory dependencies later without affecting anyone's existing usage.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

How do you not see that ItemOrientedStep relies on the following infrastructure components and won't compile without them?

RepeatOperations RepeatTemplate ItemStream CompositeItemStream ItemReader ItemWriter

spring-projects-issues commented 16 years ago

Dave Syer commented

The point is that the dependency is one-way - those things don't in turn depend on anything in the core module.

The optional dependencies are the external ones you mentioned (hibernate, ibatis etc.).

spring-projects-issues commented 16 years ago

Lucas Ward commented

I have to agree with Dave on this one. It really doesn't bother me that Core depends upon Infrastructure. The question is whether or not the classes in Infrastructure are useable/useful outside of the core. Like you said, Retry can be useful, so we could put it in Spring proper, or you even said Spring Batch Commons, what in infrastructure wouldn't belong there? I can see a use for some of our file ItemReaders on their own as well too. What would be the difference between Spring Batch Commons and Spring Batch Infrastructure? Very little. We can't really put them in spring batch proper, because they're inherently batch focused. The packages may need to be tweaked, but I think that should be considered outside of the modules they're contained in. Maybe it will turn out that no one uses any infrastructure components by themselves, but I think that's something we'll need to wait and see about after the release.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

what in infrastructure wouldn't belong there?

Sort of my justification for leaning towards the option I labelled (3) above. In any case, an item reader without an enclosing control flow (i.e. item handlers, item stream callbacks, etc.) isn't that useful.

I'd say if we were going to conceptually differentiate between Core, Commons and Infrastructure, I would use the following descriptions: [LIST] []Core - package containing the bare minimum components you need to successfully launch a batch job within the framework (by this definition, the above-mentioned classes should be in core) -- otherwise, it's not the "core" of your product. []Commons - classes that are not batch domain objects or implementations thereof - i.e. they are useful outside of the context of batch - these are classes that could potentially be moved to Spring proper without compromising its mission in any way. This is not a strictly necessary distinction, but it would address the challenge to my logic about using the Repeat/RetryTemplate classes outside of batch. [*]Infrastructure - (not that this name particularly suits my definition) contains concrete implementations of core concepts that apply to end users' use cases, e.g. core contains "what is an item reader", infrastructure contains "a flat file item reader", "an xml item reader", etc. [/LIST]

--> Side note: now that I'm thinking about it, the argument that the core/infrastructure distinction is made to aid those who need some class or classes outside of batch is flawed. If these same classes were moved to core, these clients could just import core instead of infrastructure to the same end -- this would also be a much more lightweight jar for a non-batch project to use. Of course, a "commons" jar or moving such classes to Spring proper would be even lighter-weight

Anyway, thus my feeling that at the bare minimum, the above-mentioned classes should be moved to a package under core.

spring-projects-issues commented 16 years ago

Douglas C. Kaminsky commented

Ack! Apologies, I was in "forum" mindset - please forgive my markup faux pas

spring-projects-issues commented 16 years ago

Lucas Ward commented

I still don't see any major advantage to this. In most cases people would still have to import at least two jars, even with the approach you're advocating, and I just don't see this as being major enough to try and shovel in before release one. I'm closing as won't fix. We can revaluate the modules at 1.1, and packaging at 2.0.