j-easy / easy-batch

The simple, stupid batch framework for Java
https://github.com/j-easy/easy-batch/wiki
MIT License

Job Chaining #66

Closed tusharbhasme closed 7 years ago

tusharbhasme commented 9 years ago

This is a very common job scenario where we want to run jobs in sequence (a chain). Each job would define a set of requirements for it to execute, e.g., the status of the previous job in the chain. The report of the chain would contain the status of each job, the number of jobs processed successfully, the overall status of the chain, etc. This concept could easily be extended to create a chain of chain jobs.

fmbenhassine commented 9 years ago

Great idea indeed!

In addition to job chaining, branching and parallel execution are also common requirements, but implementing all of this would lead to developing a complete workflow engine to orchestrate Easy Batch jobs.

Another solution is to try to write plugins for existing open source workflow engines in order to not reinvent the wheel :-)

tusharbhasme commented 9 years ago

Totally agreed, but I am afraid adding everything could turn it into NotSo-EasyBatch. :-)

IMO, chaining does not need much work and will keep it "Easy". It could be as simple as decorating Engine, i.e. an Engine containing an Engine.

As far as branching/parallel execution is concerned, right now I cannot think of any easy way to add it.

fmbenhassine commented 9 years ago

an Engine containing an Engine

Yeah, we could call it Easy Batch inception :-)


More seriously, as you said, developing the whole thing would lead to a complex solution, and that is not the DRY/KISS philosophy behind Easy Batch ;-)

Chaining is not that difficult to implement. We could imagine a simple DSL to build job pipelines, with the ability to start a job only if the previous one has finished/aborted under certain conditions. I'm thinking of something like:

JobPipeline jp = JobPipelineBuilder.aNewJobPipeline()
                                  .startWith(job1)
                                  .when(job1HasFinished).then(job2)
                                  .when(job2HasProcessed80PercentOfRecords).then(job3)
                                  .build();

Report finalReport = jp.run();

We can imagine a callback, implemented by the user, that specifies under which condition to start (or not) the next job in the pipeline:

public interface JobExecutionPredicate {
      boolean apply(final Report previousJobReport);
}

In the previous example, job1HasFinished and job2HasProcessed80PercentOfRecords implement this interface, and job1, job2 and job3 are Easy Batch instances.
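A minimal sketch of how such a pipeline could evaluate the predicates, in plain Java. Report, JobPipeline, its Step type and PipelineExample are hypothetical stand-ins (not the Easy Batch API): jobs are modeled as Callable<Report>, and the sketch assumes that a predicate returning false skips the rest of the pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for an Easy Batch job report, reduced to what the predicates need.
record Report(String jobName, boolean finished, long readCount, long processedCount) { }

// The callback interface proposed above.
interface JobExecutionPredicate {
    boolean apply(final Report previousJobReport);
}

// Hypothetical sequential pipeline: every job after the first is guarded by a predicate on the
// previous job's report; the first predicate that returns false skips all remaining jobs.
final class JobPipeline {
    private final List<Callable<Report>> jobs = new ArrayList<>();
    private final List<JobExecutionPredicate> guards = new ArrayList<>();

    interface Step {
        JobPipeline then(Callable<Report> next);
    }

    static JobPipeline startWith(Callable<Report> first) {
        JobPipeline pipeline = new JobPipeline();
        pipeline.jobs.add(first);
        pipeline.guards.add(report -> true); // placeholder: the first job always runs
        return pipeline;
    }

    // when(...) returns a Step so that every condition must be followed by then(...).
    Step when(JobExecutionPredicate predicate) {
        return next -> {
            guards.add(predicate);
            jobs.add(next);
            return this;
        };
    }

    // Runs the jobs in order and returns the reports of the jobs that actually ran.
    List<Report> run() throws Exception {
        List<Report> reports = new ArrayList<>();
        for (int i = 0; i < jobs.size(); i++) {
            if (i > 0 && !guards.get(i).apply(reports.get(i - 1))) {
                break; // condition not met: this job and all following ones are skipped
            }
            reports.add(jobs.get(i).call());
        }
        return reports;
    }
}

final class PipelineExample {
    static List<Report> run() {
        Callable<Report> job1 = () -> new Report("job1", true, 100, 100);
        Callable<Report> job2 = () -> new Report("job2", true, 100, 70);
        Callable<Report> job3 = () -> new Report("job3", true, 10, 10);
        JobExecutionPredicate job1HasFinished = Report::finished;
        JobExecutionPredicate job2HasProcessed80PercentOfRecords =
                report -> report.processedCount() >= 0.8 * report.readCount();
        try {
            return JobPipeline.startWith(job1)
                    .when(job1HasFinished).then(job2)
                    .when(job2HasProcessed80PercentOfRecords).then(job3)
                    .run();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because job2 processes only 70 of its 100 records in this example, job2HasProcessed80PercentOfRecords returns false and job3 never runs, so only two reports come back.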

It's just a first idea. Feel free to share your thoughts!

tusharbhasme commented 9 years ago

Very nice idea! But I still want to keep the inception idea and make

class JobPipeline implements Engine { ... }

so that we can

JobPipelineBuilder.aNewJobPipeline()
                  .startWith(JobPipelineBuilder.aNewJobPipeline().startWith(job1).then(job2).build())
                  .when(predicate1).then(job2)
                  .when(predicate2).then(job3)
                  .build();

The information contained in the Report of a JobPipeline engine may differ from the information in the Report of an Engine. To pass the job information through the predicate, we could even pass the whole job to the predicate:

JobPipelineBuilder.aNewJobPipeline()
    .when(EnginePredicate.Builder(JobPipelineBuilder.aNewJobPipeline().startWith(job1).then(job2).build()).build()).proceed()
    .thenStart(EnginePredicate.Builder(job3).build())
    .when(EnginePredicate.Builder(job4).build()).proceed()
    .thenStart(EnginePredicate.Builder(job5).build());

gs-spadmanabhan commented 9 years ago

Just curious: is this something like mixing Easy Rules with Easy Batch, where when is nothing but evaluating a condition, and then is nothing but executing an action when the condition returns true?

tusharbhasme commented 9 years ago

There is no mixing of Easy Rules into Easy Batch. when/then are just logical method names for executing the next steps when a condition is met.


fmbenhassine commented 9 years ago

Hey guys, we've got a new concept: meta-inception! An Easy Batch engine running inside another Easy Batch engine, which in turn is running inside an Easy Rules engine, which in turn is running inside a ... OK, I'll stop. Just kidding :-)

@gs-spadmanabhan In fact, we can implement the JobPipeline idea using Easy Rules. Since Easy Rules triggers rules in sequence, it can be seen as a conditional pipeline. Just for fun, I've implemented it here, and it works! What do you think?
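The idea that rules fired in priority order form a conditional pipeline can be illustrated without any library. The sketch below does not use the Easy Rules API: JobRule, SequentialRuleRunner, RulePipelineExample and the fact names are hypothetical, and each "job" is simulated by an action that writes its status into a shared facts map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical rule type (NOT the Easy Rules API): a priority, a condition on shared facts,
// and an action that runs when the condition holds.
record JobRule(String name, int priority,
               Predicate<Map<String, Object>> condition,
               Consumer<Map<String, Object>> action) { }

final class SequentialRuleRunner {
    // Fires rules in ascending priority order; each rule sees the facts written by earlier
    // rules, which is what turns a rule set into a conditional pipeline.
    static List<String> fire(List<JobRule> rules, Map<String, Object> facts) {
        List<JobRule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingInt(JobRule::priority));
        List<String> fired = new ArrayList<>();
        for (JobRule rule : sorted) {
            if (rule.condition().test(facts)) {
                rule.action().accept(facts);
                fired.add(rule.name());
            }
        }
        return fired;
    }
}

final class RulePipelineExample {
    static List<String> run() {
        // Declared out of order on purpose: the priorities, not the list order, decide the sequence.
        List<JobRule> rules = List.of(
                new JobRule("job2", 2, f -> Boolean.TRUE.equals(f.get("job1.finished")),
                        f -> f.put("job2.finished", false)), // simulate job2 failing
                new JobRule("job1", 1, f -> true,
                        f -> f.put("job1.finished", true)),
                new JobRule("job3", 3, f -> Boolean.TRUE.equals(f.get("job2.finished")),
                        f -> f.put("job3.finished", true)));
        return SequentialRuleRunner.fire(rules, new HashMap<>());
    }
}
```

Here job1 and job2 fire, while job3's condition is false because job2 recorded a failure.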

@tusharbhasme Even though making JobPipeline implement the Engine interface is technically possible, I do believe they represent different concepts at different levels of abstraction. So mixing both concepts would be a bit confusing. Do you agree?

The most important part is to design a simple, readable DSL to create the pipeline. The implementation itself is not that difficult (whether it is based on Easy Rules or not).

tusharbhasme commented 9 years ago

@benas I would still suggest bringing JobPipeline under Engine, since JobPipeline is nothing but an engine running engines. The main reason is that we could then easily create a chain job of chain jobs.
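The "engine running engines" argument is the Composite pattern. A minimal sketch, assuming a hypothetical Engine interface and EngineReport type (not Easy Batch's actual classes): because PipelineEngine is itself an Engine, a pipeline can be used as a step of another pipeline.

```java
import java.util.List;

// Hypothetical report: overall success flag plus the number of leaf jobs that ran.
record EngineReport(boolean success, int jobsProcessed) { }

// Hypothetical minimal Engine abstraction.
interface Engine {
    EngineReport call();
}

// Composite: a pipeline of engines that is itself an engine.
final class PipelineEngine implements Engine {
    private final List<Engine> steps;

    PipelineEngine(List<Engine> steps) {
        this.steps = List.copyOf(steps);
    }

    // Runs the steps in order and stops at the first failure; counts from nested
    // pipelines are summed, so the outer report aggregates the whole tree.
    @Override
    public EngineReport call() {
        int processed = 0;
        for (Engine step : steps) {
            EngineReport report = step.call();
            processed += report.jobsProcessed();
            if (!report.success()) {
                return new EngineReport(false, processed);
            }
        }
        return new EngineReport(true, processed);
    }
}

final class CompositeExample {
    static EngineReport run() {
        Engine job1 = () -> new EngineReport(true, 1);
        Engine job2 = () -> new EngineReport(true, 1);
        Engine job3 = () -> new EngineReport(true, 1);
        Engine chain = new PipelineEngine(List.of(job1, job2));          // a chain of jobs
        Engine chainOfChains = new PipelineEngine(List.of(chain, job3)); // a chain containing a chain
        return chainOfChains.call();
    }
}
```

The outer pipeline does not need to know whether a step is a single job or another pipeline, which is the property the "chain job of chain jobs" request relies on.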

gs-spadmanabhan commented 9 years ago

@benas, I think the implementation makes sense, and it validates my theory ;-). As you said, keeping the DSL simple is key, whether it is implemented with Easy Rules or not.

@tusharbhasme, I get that building it into the Easy Batch engine will result in less confusion, agreed. I was just proposing an alternative that already exists.

What's running through my mind:

  1. Data set (CSV or DB): the starting point.
  2. Apply criteria:
    • Single criteria: apply(criteria)
    • Conditional criteria: apply(criteria1).and().apply(criteria2).or().apply(criteria3)
  3. If the criteria result is true, execute action1. Optionally, an action can take the records filtered by a criteria as input and process the data further.
  4. If the criteria result is false, execute action2.
  5. The actions produce a processed data set.
  6. Add additional actions if required (action3, action4). For example, first process the records, then send an email, etc.
  7. Repeat steps 2 to 6 to chain jobs with criteria/conditions.
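One possible reading of steps 1 to 6, sketched with java.util.function.Predicate, whose and/or methods give the conditional criteria of step 2. Customer, CriteriaStep and CriteriaExample are hypothetical, and the sketch assumes criteria are evaluated per record, so that action1 receives the filtered records and action2 receives the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical record type for the data set of step 1 (e.g. a parsed CSV line or DB row).
record Customer(String name, int age, String country) { }

final class CriteriaStep {
    // Steps 2 to 4: split the data set with a (possibly composite) criteria, then hand the
    // matching records to onTrue (action1) and the others to onFalse (action2).
    static <T> void apply(List<T> dataSet, Predicate<T> criteria,
                          Consumer<List<T>> onTrue, Consumer<List<T>> onFalse) {
        List<T> matching = new ArrayList<>();
        List<T> others = new ArrayList<>();
        for (T record : dataSet) {
            if (criteria.test(record)) {
                matching.add(record);
            } else {
                others.add(record);
            }
        }
        onTrue.accept(matching);
        onFalse.accept(others);
    }
}

final class CriteriaExample {
    static List<String> run() {
        List<Customer> dataSet = List.of(
                new Customer("alice", 34, "FR"),
                new Customer("bob", 17, "FR"),
                new Customer("carol", 70, "TN"));
        Predicate<Customer> adult = c -> c.age() >= 18;
        Predicate<Customer> french = c -> c.country().equals("FR");
        Predicate<Customer> senior = c -> c.age() >= 65;
        List<String> log = new ArrayList<>();
        // Conditional criteria: (adult AND french) OR senior.
        CriteriaStep.apply(dataSet, adult.and(french).or(senior),
                matching -> matching.forEach(c -> log.add("action1:" + c.name())),
                others -> others.forEach(c -> log.add("action2:" + c.name())));
        return log;
    }
}
```

Note that adult.and(french).or(senior) groups left to right as (adult AND french) OR senior; step 7's chaining would amount to feeding the output of one CriteriaStep into the next.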

To make this possible, merging the two frameworks seems like a good idea. Even though they represent different concepts, I see value in combining them into a single framework. That's all folks, my rambling is over. Thoughts?

tusharbhasme commented 9 years ago

Hey @benas, any updates on this requirement? It would be great if we could schedule the chain too.

MALPI commented 9 years ago

Hey,

I like this, since I had nearly the same request.

@tusharbhasme I wouldn't mix in scheduling in here since there are many frameworks which provide easy scheduling. We achieved this by using Spring Scheduling.

@benas I really like the Easy Rules approach! This is definitely what I'm looking for.

BR

fmbenhassine commented 9 years ago

Hello guys,

Now that version 4 is out (and what a release! Easy Batch has never been easier to understand and use), we can forget about the term engine and all the confusion it brought. The Easy Batch engine has been renamed to Job (issue #141), which is a less confusing and more natural name.

These are the 4 key concepts around jobs:

I was thinking about a new concept, JobOperator or JobOrchestrator, inspired by JSR 352, section 7.3, which would be responsible for orchestrating jobs: chaining, branching, etc. In my opinion, as discussed in #128, it would be a kind of "super" ExecutorService that can start/stop/cancel jobs as well as orchestrate them (conditional execution, chaining, etc.). What are your thoughts on that?

I didn't find an open source workflow engine that can orchestrate plain java.util.concurrent.Callable objects. Do you know of such a workflow engine?

Regards Mahmoud

gs-spadmanabhan commented 9 years ago

Hi there,

First of all, congratulations! I am excited to see the changes in version 4. I haven't gone through the code base yet, but I will find time to do so.

I don't know much about workflow engines; the ones I have heard of are jBPM and Activiti. I don't know the internals of those frameworks since they are not easy :) They all pretty much deal with a lot of XML configuration.

MALPI commented 9 years ago

Hi, congratulations on the new release.

Can you explain why you want to integrate a workflow engine?

To stop and start a Callable, you can just use plain Java methods. For the preconditions, I'd go for the Easy Rules solution.

The hard part would possibly be the branching and chaining.

fmbenhassine commented 9 years ago

Hi,

Thank you!

Can you explain why you want to integrate a workflow engine?

The goal is to orchestrate jobs to create complex workflows (branching & chaining). Easy Batch is designed to create and execute jobs but not to orchestrate them.

I was reading an interesting discussion on the Spring Batch forum, where a user asked how to deal with complex branching/chaining workflows in Spring Batch. The project lead recommended not using Spring Batch for job orchestration because it was not designed for that. I totally agree with him, and the same is true for Easy Batch.

To stop and start a Callable you can just use plain Java methods.

Sure! This is what we discussed together in issue #128:

"A job is a unit of work that can be submitted to an executor service, which is responsible for its life cycle (start, stop, cancel, calculate progress, handle timeout, etc.)." So yes, we can do it in plain Java.
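In plain Java, the life cycle described above maps onto java.util.concurrent: submit starts the job, Future.get with a timeout waits for it, and Future.cancel(true) stops it by interrupting the worker thread. The helper name and the status strings below are hypothetical; progress calculation is not shown.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class JobLifecycleExample {
    // Starts a job, waits up to timeoutMillis for its result, and cancels it on timeout.
    // Returns the job's own result, or a hypothetical status string when it did not complete.
    static String runWithTimeout(Callable<String> job, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(job);                    // start
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);     // wait, bounded by a timeout
        } catch (TimeoutException e) {
            future.cancel(true);                                         // stop: interrupts the worker
            return "TIMED_OUT";
        } catch (ExecutionException e) {
            return "FAILED";                                             // the job itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();                          // preserve the interrupt flag
            future.cancel(true);
            return "INTERRUPTED";
        } finally {
            executor.shutdownNow();
        }
    }
}
```

For example, runWithTimeout(() -> "COMPLETED", 1000) returns the job's result, while a job that sleeps longer than the timeout is cancelled and reported as TIMED_OUT.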

But the idea is to have a "super enhanced" ExecutorService that provides a DSL to do more than just start/stop/cancel jobs: to orchestrate them, hence the proposed name JobOrchestrator.
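A rough sketch of what such a "super" ExecutorService could add: conditional chaining on top of plain submission, built here on CompletableFuture. JobOrchestrator is only the name proposed in this thread; its start/then methods, the pool size and the convention that a skipped job completes with null are all assumptions of this sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical orchestrator: owns an ExecutorService and adds conditional chaining.
final class JobOrchestrator implements AutoCloseable {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    // Runs a Callable job asynchronously on the orchestrator's pool.
    <R> CompletableFuture<R> start(Callable<R> job) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return job.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, executor);
    }

    // Chaining: once 'previous' completes, start 'next' only if the condition accepts the
    // previous result; otherwise complete with null, meaning "skipped" (also for later steps).
    <R> CompletableFuture<R> then(CompletableFuture<R> previous, Predicate<R> condition, Callable<R> next) {
        return previous.thenCompose(result -> {
            if (result != null && condition.test(result)) {
                return start(next);
            }
            return CompletableFuture.<R>completedFuture(null);
        });
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}

final class OrchestratorExample {
    static String run() {
        try (JobOrchestrator orchestrator = new JobOrchestrator()) {
            CompletableFuture<String> job1 = orchestrator.start(() -> "COMPLETED");
            CompletableFuture<String> job2 = orchestrator.then(job1, "COMPLETED"::equals, () -> "FAILED");
            CompletableFuture<String> job3 = orchestrator.then(job2, "COMPLETED"::equals, () -> "COMPLETED");
            return job1.join() + "," + job2.join() + "," + job3.join();
        }
    }
}
```

In the example, job2 runs because job1 completed, but it reports FAILED, so job3 is skipped and its future completes with null.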

A good example is Flo for Spring XD (demo). Do you see the idea?

@gs-spadmanabhan Thanks! I was aware of jBPM and Activiti. I agree with you, not so easy...

MALPI commented 9 years ago

Yeah, sure, I got that @benas, but what I mean is that it's basically about creating a kind of workflow. I'd go for implementing it with workflow design patterns rather than integrating another framework.

Therefore, I would start by identifying the use cases and limiting the first iteration to them, for example chaining and branching.

fmbenhassine commented 9 years ago

Yeah, sure, integrating with an existing workflow engine is one possibility among others. In this issue we are trying to find the best way to implement job orchestration with a KISSable approach :wink:

Currently, here are the options:

  1. Implement a DSL (like the one shown above, with Easy Rules or something else): simple for chaining, harder when it comes to branching and parallel jobs
  2. Integrate with an existing workflow engine (jBPM, Activiti, etc.): would make it easy to create complex workflows graphically, but clearly requires a lot of effort

I think we can start with a very basic approach for chaining, as first requested by @tusharbhasme, and provide something like:

JobPipeline jp = JobPipelineBuilder.aNewJobPipeline()
                                  .startWith(job1)
                                  .when(job1HasFinished).then(job2)
                                  .when(job2HasProcessed80PercentOfRecords).then(job3)
                                  .build();

Report finalReport = jp.run();

After all, Easy Batch jobs are simple pipelines of record processors, so why not take the idea to the next abstraction level (Job) and introduce JobPipeline? This would implement exactly the requested feature, job chaining (the title of this issue).

Do you agree?

gs-spadmanabhan commented 8 years ago

Any updates?

fmbenhassine commented 8 years ago

Nope, I didn't have time to work on this feature.

gs-spadmanabhan commented 8 years ago

Why don't we create an EasyWorkflow project ourselves?

fmbenhassine commented 8 years ago

Hi,

Really sorry for not giving any update on this, I'm a bit in this situation ..

Good idea indeed! This would be a great community-driven effort to add flow support to Easy Batch. I would name it EasyFlow: a simple, stupid workflow engine for Easy Batch :smile: Don't hesitate to suggest ideas or a working prototype; you are very welcome.

Thank you Sunand! Best regards Mahmoud

fmbenhassine commented 8 years ago

Sunand,

I've published an unfinished working prototype (which I stashed a few months ago) in branch feature-66. You can already build job pipelines. See example here. There are already some built-in predicates.

The API is under the org/easybatch/core/flow package. It can be a good starting point for you.

Looking forward to your feedback.

Kind regards Mahmoud

fmbenhassine commented 8 years ago

@tusharbhasme @MALPI

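I've published version 4.1.0-SNAPSHOT with a working prototype of the JobPipeline API. Currently, it supports job chaining as requested. See example here: https://github.com/EasyBatch/easybatch-framework/blob/feature-66/easybatch-core/src/test/java/org/easybatch/core/flow/JobPipelineTest.java#L27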

Don't hesitate to give it a try. Looking forward to your feedback and suggestions.

Kind regards Mahmoud

tusharbhasme commented 8 years ago

Awesome!!!


gs-spadmanabhan commented 8 years ago

@benas Apologies for the delay in my response. The version you have put up is really good, but I still feel a job should have two states: onSuccess and onFailure.

Solution

{
  "name" : "WF1",
  "description" : "Test Workflow Spec",
  "type" : "DIRECT",
  "jobs" : [ {
    "jobName" : "Job1",
    "jobDescription" : "Do First Job",
    "parameters" : null,
    "onSuccess" : [ "Job2" ],
    "onFailure" : [ "Job3" ],
    "onCompleted" : null,
    "position" : 0
  }, {
    "jobName" : "Job2",
    "jobDescription" : "Do Second job if the first job succeeds",
    "parameters" : null,
    "onSuccess" : null,
    "onFailure" : [ "Job3" ],
    "onCompleted" : null,
    "position" : 1
  }, {
    "jobName" : "Job3",
    "jobDescription" : "If this executes then first job has failed",
    "parameters" : null,
    "onSuccess" : null,
    "onFailure" : null,
    "onCompleted" : null,
    "position" : 2
  } ]
}
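A minimal interpreter for a spec like WF1, assuming the JSON has already been parsed into objects (no JSON library is used), that the first entry (position 0) is the start job, and that each job reports success as a boolean. JobSpec, WorkflowRunner and WorkflowExample are hypothetical; onCompleted, parameters and type are ignored, and a spec containing a cycle would loop forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// One entry of the "jobs" array; a JSON null maps to a null list.
record JobSpec(String jobName, List<String> onSuccess, List<String> onFailure) { }

final class WorkflowRunner {
    // Runs the start job, then follows its onSuccess or onFailure edges, breadth first.
    // 'jobs' maps a jobName to the code to execute, returning true on success.
    static List<String> run(List<JobSpec> specs, Map<String, BooleanSupplier> jobs) {
        Map<String, JobSpec> byName = new HashMap<>();
        for (JobSpec spec : specs) {
            byName.put(spec.jobName(), spec);
        }
        List<String> executed = new ArrayList<>();
        Deque<String> toRun = new ArrayDeque<>();
        toRun.add(specs.get(0).jobName()); // assumes specs are ordered by "position"
        while (!toRun.isEmpty()) {
            String name = toRun.poll();
            boolean success = jobs.get(name).getAsBoolean();
            executed.add(name + (success ? ":SUCCESS" : ":FAILURE"));
            JobSpec spec = byName.get(name);
            List<String> next = success ? spec.onSuccess() : spec.onFailure();
            if (next != null) {
                toRun.addAll(next);
            }
        }
        return executed;
    }
}

final class WorkflowExample {
    // Builds WF1 from the spec above and runs it with simulated job outcomes.
    static List<String> run(boolean job1Ok, boolean job2Ok) {
        List<JobSpec> wf1 = List.of(
                new JobSpec("Job1", List.of("Job2"), List.of("Job3")),
                new JobSpec("Job2", null, List.of("Job3")),
                new JobSpec("Job3", null, null));
        Map<String, BooleanSupplier> jobs = Map.of(
                "Job1", () -> job1Ok,
                "Job2", () -> job2Ok,
                "Job3", () -> true);
        return WorkflowRunner.run(wf1, jobs);
    }
}
```

With both jobs succeeding, only Job1 and Job2 run; if Job1 fails, Job3 runs directly; if Job2 fails, Job3 runs after it.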

I am trying to put together some sample code, but it's taking time. Thoughts and suggestions are welcome!

fmbenhassine commented 8 years ago

Hi Sunand,

This is a GREAT idea. I was also thinking of this kind of DSL (inspired by the Jenkins pipeline DSL), but it is not that easy to implement, as you may agree. As you said, the JobPipelineBuilder API I provided should add an "else" method to handle both states of the job. I'll see how to add this.

Another way to design job pipelines is something like: job1 && job2 || job3. This is inspired by Unix job control syntax. What an elegant syntax! My idea was to use an expression language (MVEL, Spring EL, or whatever). If only I had time to work on this issue all day long :rage:
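The same semantics can be expressed without an expression language, since Java's && and || short-circuit like the shell's: job2 runs only if job1 succeeds, and job3 runs only if job1 or job2 fails. This is a plain-Java sketch, not MVEL or Spring EL; the run helper and the boolean exit statuses are hypothetical. One caveat: the shell gives && and || equal precedence and evaluates them left to right, while Java binds && tighter, so the two agree on job1 && job2 || job3 but not, for example, on job1 || job2 && job3.

```java
import java.util.ArrayList;
import java.util.List;

final class JobControlExample {
    // Simulates running a job: records its name and returns its (simulated) exit status.
    static boolean run(List<String> log, String name, boolean succeeds) {
        log.add(name);
        return succeeds;
    }

    // "job1 && job2 || job3", evaluated with Java's short-circuit operators.
    static List<String> pipeline(boolean job1Ok, boolean job2Ok, boolean job3Ok) {
        List<String> log = new ArrayList<>();
        boolean ok = run(log, "job1", job1Ok) && run(log, "job2", job2Ok)
                || run(log, "job3", job3Ok);
        log.add(ok ? "SUCCESS" : "FAILURE");
        return log;
    }
}
```

The log records which jobs actually ran, so the short-circuit behavior is visible: when job1 fails, job2 is never started and control jumps to job3.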

Regards Mahmoud

gs-spadmanabhan commented 8 years ago

#189: check out this PR and provide suggestions.

fmbenhassine commented 8 years ago

Hi Sunand,

I've tried to add an else in the DSL to be able to write something like:

JobPipeline jobPipeline = aNewJobPipelineBuilder()
                .startWith(job1)
                // "else" is a reserved word in Java, so the method is named otherwise here
                .when(predicate1).then(job2).otherwise(job3)
                .when(predicate2).then(job4).otherwise(job5)
                .when(predicate3).then(job6).otherwise(job7)
                .build();

Even with this, it is NOT possible to achieve a complex flow like the one I was expecting:


The reason is that the predicate defined by the user is applied to the last executed job, which is unknown at runtime with this new branching model. In the example above:

So the syntax is not correct and does not lead to the expected graph.

I do believe the best approach is to have a real graph of jobs, like you did in your PR. But this is a lot of work, and I really appreciate your effort. As you said, this is actually the scope of another project. I saw you already prepared a repo for that :wink: So I propose that you lead the development of the solution (in PR #189) as a separate project, and I will do my best to contribute. What do you think?

My attempt at a JobPipeline API only works for sequential job execution (hence the name pipeline in the API; otherwise it would be JobFlow):


The predicate is the condition to continue to the next job in the pipeline; otherwise, the remaining jobs are skipped. Pretty basic, but as discussed, it is a first step toward implementing job chaining as first requested by @tusharbhasme and @MALPI.

Cheers Mahmoud

gs-spadmanabhan commented 8 years ago

Hi Mahmoud,

Thanks for taking the time to evaluate the proposal. I did create the repo, but then I realized it really requires all the Job, JobReport, JobResult and JobParameters classes (with the job class being able to take a certain definition) on top of which this graph can sit and dictate the workflow. If I write it separately, will I be able to use it with Easy Batch?

I got really confused. I will try to write it using these classes, but progress will be slow due to my current job.

Thanks again for validating it.

Sunand

fmbenhassine commented 8 years ago

Hi Sunand,

Just add the easybatch-core module as a dependency in your project and you can use these APIs; they are public. This is how I developed all the extensions.

The DAG example in my last comment is easy to create with your approach and not even possible with the JobPipeline API I've introduced. So, as I said, your approach is the way to go (taking into account the couple of notes we've discussed in #189).

it will have slow progress due to my current job.

Same here. I'm really afraid I won't be able to work full time on this. But don't worry, take your time and keep me informed when you are ready. I'll be very happy to give you credit for that effort!

Best regards Mahmoud

fmbenhassine commented 8 years ago

@tusharbhasme @MALPI Have you had a chance to test this feature?

Would love to get your feedback

tusharbhasme commented 8 years ago

Hey Mahmoud,

I am no longer part of the project that needed it, but I am glad this feature has been added. I will still try to test it with the code I have and pass this info on to the team working on it!

Thanks, Tushar Bhasme


fmbenhassine commented 7 years ago

Hi @tusharbhasme @gs-spadmanabhan @MALPI

Finally, I was able to release easy-flows. It provides everything we discussed here (chaining, branching, etc.) easily 😉

I didn't find an open source workflow engine that can orchestrate plain java.util.concurrent.Callable objects. Do you know of such a workflow engine?

Easy Flows is what I couldn't find after a lot of searching online. I really don't understand why every single workflow engine out there is trying to implement BPMN. There is nothing wrong with this notation, but it is not simple (a 538-page spec??), and getting started is not easy with the current de facto engines.

Anyway, Easy Batch jobs are Callable objects and can be orchestrated with Easy Flows. All jeasy projects are designed to work well together.

Let me go back to first comment of this issue.

Each job would define a set of requirements for it to execute eg, status of previous job in the chain.

In Easy Flows, a WorkReportPredicate is what you are looking for.

This concept could easily be extended to create a chain of chain jobs.

A WorkFlow in Easy Flows extends the Work concept, so flows are composable by design.

I hope this new library helps the community.

I'm closing this issue for now.

Kind regards Mahmoud