quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.39k stars 2.56k forks source link

Offer a operation batch framework #1505

Closed emmanuelbernard closed 3 years ago

emmanuelbernard commented 5 years ago

Batching operation and have the facility to restart on error. This is jbatch territory. But let’s make sure we Panache it when we come to it.

chengfang commented 5 years ago

Good to know it's in the plan. Project JBeret (the upstream project implementing JSR 352) is located at https://github.com/jberet/jsr352 . Integration into WildFly is at https://github.com/wildfly/wildfly/tree/master/batch-jberet . Feel free to reach out to JBeret team if you need anything.

emmanuelbernard commented 4 years ago

@paulrobinson why did you added it to in progress? I am not aware of anything besides some conversations in Zulip recently

paulrobinson commented 4 years ago

@emmanuelbernard looks like I moved it in error. Thanks for moving it back.

mswiderski commented 4 years ago

has anyone considered to use kogito (workflow) as an option to cover this?

jmartisk commented 4 years ago

Hey guys, I recently started experimenting with this a little bit, trying to integrate JBeret 1.3.x into Quarkus. I'd like to start a discussion to decide whether JBeret is indeed the way to go here, or I am walking into a dead end, before I invest any significant amount of time in this. I used to work with JBeret a lot back in my QE days, so I hope I still remember what it's about :)

First problem I found was that if we want to parse job definitions in STATIC_INIT and pass them to a recorder, the model objects need to conform to rules to allow this (no-arg constructors present, getters/setters present, getter must not throw an exception,..). I prepared some changes on JBeret side that fix this: https://github.com/jmartisk/jsr352/commit/93b4c0d0d66cc50cb611ccc0d9deb14e9c6ac1d7 (this is probably not complete yet, more classes will need this, but just to give an idea) - @chengfang what do you think about these changes? Would they seem acceptable into JBeret? Other option would be that Quarkus has its own model objects, but that sounds like reinventing the wheel). After I applied those changes, I managed to get a crude prototype working with parsing jobs at build time.

Also what are the plans for JBeret 1.4.0 at the moment? Should I try and base the work on master instead of 1.3.5, which is the latest release?

chengfang commented 4 years ago

@jmartisk I'm open to making such changes in JBeret to make it work with Quarkus. JBeret 1.3.x branch is a maintenance branch, so I would prefer not to apply these new changes to that branch. JBeret master branch (JBeret 1.4) is currently a place holder for any small enhancement or fixes that are not applicalbe to 1.3.x branch, and should be okay try out your changes.

jmartisk commented 4 years ago

Cool, thanks @chengfang. So unless someone comes up with the idea that JBeret is not the way to go, I'll try and continue my experiments, basing it on JBeret master and assume that we will get 1.4.0 out with the Quarkus compatibility fixes. I noticed one more significant problem and that's the usage of JBoss Marshalling, that's not compatible with native compilation, fortunately it isn't used that extensively. I'll file an issue in JBeret to find a solution.

fredgcosta commented 4 years ago

@jmartisk I am not a Jberet expert, but I am willing to help! I have more experience on Spring Batch, I recently implemented some projects using it. I will try to port them to jberet, to understand more about the differences and see if it meets all my requirements. For me, it would be a great feature to have on Quarkus.

gsmet commented 4 years ago

/cc @emmanuelbernard to check if this is a priority

radcortez commented 4 years ago

hey @jmartisk, did you work more on this? I'll be happy to help.

jmartisk commented 4 years ago

@radcortez I have a crude prototype at https://github.com/jmartisk/quarkus/tree/master-jberet but didn't get much further, specifically it doesn't work in native mode because JBeret uses JBoss Marshalling internally, we need to get rid of that. I dropped working on it because it didn't seem to have enough priority, but I am willing to get back to it if we want it.

emmanuelbernard commented 4 years ago

Hey @jmartisk. I had conversations with @maxandersen who convinced me that this would be useful. So I don’t know if that’s top priority but it’s not bottom anymore from my point of view :) The idea is that it unlocks app possibilities for first Quarkus users. So even if that might not be perfectly cloud native aligned, but this is a good addition.

radcortez commented 4 years ago

I like to work with Batch, so I'm also happy to help. For the moment, I've just tried to migrate one of my old Batch projects to use Quarkus and I was able to run it with JBeret with some bridge code: https://github.com/radcortez/wow-auctions/tree/master/quarkus-batch/src/main/java/com/radcortez/quarkus/jberet

I see that you already started the extension, which would be my next step. I'll have a look into that.

radcortez commented 4 years ago

@chengfang JBoss Marshalling is used to clone a few objects around JBeret. Is it acceptable to remove it and clone the objects with Cloneable and provide a straight implementation to each? Do you have any other suggestion?

chengfang commented 4 years ago

JBoss Marshalling is used for cloning objects in JBeret, mainly because JBM is already widely used by WildFly. So JBeret can leverage it without adding dependencies. If the same functionality can be achieved in other means, perferrably without new dependencies, it should not be a problem.

radcortez commented 4 years ago

@chengfang Thanks! I'll give it a try with one or two cases with a manual implementation to see how it looks like.

radcortez commented 3 years ago

I was having another look into this around the JBeret model classes and what do we need to pass to the Quarkus recorders.

The JBeret model classes, like @jmartisk stated, are not really prepared to be used in the Quarkus Recorder. On the other hand, making the necessary changes, seems to undermine the original design of such classes.

Then I've found that JBeret provides Builders for such classes. These could probably be changed as much as we want, since they are just builders, and we could read the Job definitions directly into the builders, pass them to the Recorder and build them and runtime.

@chengfang what do you think? Providing getters / setters and a no args constructors to the builders, should be more straightforward and will less implications.

jmartisk commented 3 years ago

@radcortez oh, sorry it seems I forgot to tell you that I have a JBeret branch where I have already started preparing JBeret model classes for this - https://github.com/jmartisk/jsr352/commits/quarkus

radcortez commented 3 years ago

@jmartisk no worries, I've seen that :) You mention the commit with the changes in one of your previous comments.

My discussion here, is about the changes itself, for instance: https://github.com/jmartisk/jsr352/commit/93b4c0d0d66cc50cb611ccc0d9deb14e9c6ac1d7#r40585600

Should we really make that change just because of Quarkus?

On the other hand, there are builder classes for Job elements, not sure if you notice them: https://github.com/jberet/jsr352/blob/master/jberet-core/src/main/java/org/jberet/job/model/JobBuilder.java

So maybe we could use these instead and any API change to the builder classes will be less meaningful compared to changes required to the runtime model.

jmartisk commented 3 years ago

@radcortez Not sure about how exactly you envision using JobBuilder instead. That's a public API meant for programmatic building of job definitions. JBeret internally reads job definitions from XML files directly into a Job instance. And the job builder itself depends on the same runtime model classes that the resulting Job will reference anyway.

So I think that if we don't want to change the current classes, we'd need a whole new set of recordable classes that can be quickly transformed into a Job at runtime.

jmartisk commented 3 years ago

Back to the JBoss Marshalling topic, @chengfang please correct me if my understanding is wrong - AFAIU the main reason for introducing it was that job model classes are mutable. So for example, if you have a Job definition and the user runs a new job with specific parameters, the original Job is cloned and mutated to contain the parameters (they are part of the Job object), and then this new object is used to execute the job. If we changed the architecture so that we don't have mutable objects, we wouldn't need marshalling at all. For example, the Job model class could be just an immutable abstract job definition that is transformed into a concrete definition (containing parameters etc) when the job is about to start. That concrete job object could potentially stay mutable, but the abstract definition would need to be immutable.

radcortez commented 3 years ago

@radcortez Not sure about how exactly you envision using JobBuilder instead. That's a public API meant for programmatic building of job definitions. JBeret internally reads job definitions from XML files directly into a Job instance. And the job builder itself depends on the same runtime model classes that the resulting Job will reference anyway.

A possible solution would be to transform the Job into a JobBuilder again. Yes, the JobBuilder has all the other elements that would need to be transformed to their corresponding builder objects.

So I think that if we don't want to change the current classes, we'd need a whole new set of recordable classes that can be quickly transformed into a Job at runtime.

Maybe it's fine to change them. Let's see what @chengfang things about that. The changes don't seem to break JBeret tests, but my concern is to change the contracts, and developers might expect to use these new API to develop their batch, but most like wont work. For instance we add a a no args constructor to Job, but a Job always requires an id, so an end user shouldn't be able to call it.

Back to the JBoss Marshalling topic, @chengfang please correct me if my understanding is wrong - AFAIU the main reason for introducing it was that job model classes are mutable.

One other reason that I've found to use cloning is to support JSL inheritance. For instance here: https://jberet.gitbooks.io/jberet-user-guide/content/batch_jsl_inheritance_and_composition/

Cloning is used to copy the parent element definition to the actual concrete element, and then enhanced with the additional properties.

chengfang commented 3 years ago

In JBeret, it's used to clone certain objects. One is as Jan pointed out, to clone a job, which may contain property expressions that resolve to different values in each job executio. The other is to clone job execution and step execution objects in concurrent partitioned executions, as the spec requires each partition to have its own copy of job execution and step execution. And also as Roberto mentioned, it's also used in when a child job inheriting from parent job definition.

It is the most straightforwad way to achive the spec requirement and keep isolation when we implemented it. If there is another cloning library that is Quarkus friendly, we should be able to swap it easily. As jboss-marshalling is used in other products, how do these products handles it when supporting Quarkus? Otherwise, we can implementing cloning in all affected classes and make sure they are properly isolated.

I'm not familiar with Quarkus recorders, but I think it should be okay to enhance current jberet model classes to meet Quarkus requirement. The main use case in Java batch is still based on xml job definition, with programmatic API to create job as minor one. I wouldn't worry too much about exposing any new constructor or methods to the users, if we can clearly document these methods. So overalll I think the more practical approach is to enhance the current job model, despite potential exposing unintended methods.

radcortez commented 3 years ago

As jboss-marshalling is used in other products, how do these products handles it when supporting Quarkus? Otherwise, we can implementing cloning in all affected classes and make sure they are properly isolated.

I don't know if we had this issue before. Maybe @geoand remembers something?

I'm not familiar with Quarkus recorders, but I think it should be okay to enhance current jberet model classes to meet Quarkus requirement.

As a simple way to state it, they generate the byte code associated with the runtime logic during deployment and then invoked at runtime. The key aspect here is that the recorded objects may require a no args contractor (we could override and use something else that is public as long as we have the data to call it) and public getters for all the required state.

So for instance in the case of Job#getTransitionElements this will fail in the Quarkus recorder because the Job implementation throws a IllegalStateException. On the other hand, it does seems wrong to just change that method to return null (or even an empty list) just because of Quarkus.

The main use case in Java batch is still based on xml job definition, with programmatic API to create job as minor one. I wouldn't worry too much about exposing any new constructor or methods to the users, if we can clearly document these methods. So overalll I think the more practical approach is to enhance the current job model, despite potential exposing unintended methods.

If you are fine to do the required changes by modifying the runtime model directly that will be great, since it would be a lot easier. Another alternative is to do it via builders, which I've suggested in https://github.com/jberet/jsr352/pull/132. Let me know which one do you prefer :)

chengfang commented 3 years ago

@radcortez I also feel directly modify the current model will be eaiser and IMO should be the preferred approach.

jmartisk commented 3 years ago

AFAIK no Quarkus extension depends on JBoss Marshalling. The typical use case of it is serializing objects for sending across the network, and Quarkus is HTTP&JSON focused so it's not really necessary. If we need to clone objects, I think we should implement our own cloning mechanism.

geoand commented 3 years ago

As @jmartisk says, we don't use JBoss Marshalling anywhere in Quarkus

radcortez commented 3 years ago

@radcortez I also feel directly modify the current model will be eaiser and IMO should be the preferred approach.

Great, let's go with that approach.

n1hility commented 3 years ago

Closing since this issue is partially obviated.