spring-cloud / spring-cloud-dataflow

A microservices-based toolkit for Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0

Enable batch jobs to be queued for single instance jobs #3544

Open venkatasreekanth opened 5 years ago

venkatasreekanth commented 5 years ago

Problem description: We receive continuous updates from upstream systems, and all of them invoke the same task. The time the batch task takes to complete depends on the size of the data. The task is single-instance enabled, so when subsequent tasks are launched while a job is already running, they immediately fail.
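
For reference, the failure mode described here comes from Spring Cloud Task's single-instance feature, enabled with the `spring.cloud.task.single-instance-enabled` property: a second concurrent execution fails fast rather than waiting. A launch from the SCDF shell might look like this (the task name `import-task` is illustrative):

```
task launch import-task --properties "app.import-task.spring.cloud.task.single-instance-enabled=true"
```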

Solution description: Queue the jobs instead of failing them: the maximum allowed number of jobs runs, and the next queued job is launched as soon as a slot opens up.
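
A minimal sketch of the requested behavior, written against SCDF's documented REST endpoints (`GET /tasks/executions` and `POST /tasks/executions`). The `QueuedTaskLauncher` class, task name, polling interval, and the `endTime` check on the raw JSON are illustrative assumptions, not SCDF internals:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.client.RestTemplate;

public class QueuedTaskLauncher {

    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final RestTemplate rest = new RestTemplate();
    private final String scdf = "http://localhost:9393"; // assumed SCDF server URL

    /** Callers enqueue launch arguments instead of launching directly. */
    public void enqueue(String launchArgs) {
        pending.add(launchArgs);
    }

    /** Periodically drain the queue, launching only when no execution is running. */
    @Scheduled(fixedDelay = 30_000) // requires @EnableScheduling on a config class
    public void drain() {
        if (pending.isEmpty() || isTaskRunning("my-task")) {
            return;
        }
        // POST /tasks/executions is SCDF's documented launch endpoint; it
        // returns the id of the new task execution.
        rest.postForObject(scdf + "/tasks/executions?name={name}&arguments={args}",
                null, Long.class, "my-task", pending.poll());
    }

    private boolean isTaskRunning(String taskName) {
        // Simplified check: a real implementation would page through the HAL
        // response and inspect each execution's endTime and exit code.
        String body = rest.getForObject(scdf + "/tasks/executions?name={name}",
                String.class, taskName);
        return body != null && body.contains("\"endTime\":null");
    }
}
```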

Description of alternatives: The alternative is to use the API to fetch tasks and check their status, and also to monitor the task_lock table. Once both are clear, launch the subsequent job; in the meantime, hold the job configuration in AMQP or some other location. This approach is heavily tied to an understanding of how SCDF manages jobs, and if SCDF changes its approach, the process breaks.
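
That workaround could look roughly like the sketch below, assuming a RabbitMQ queue named `task-launch-requests` (hypothetical) that holds the job configuration until polling shows no running execution; a real version would also check the task_lock table as described:

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class LaunchRequestConsumer {

    private final RestTemplate rest = new RestTemplate();
    private final String scdf = "http://localhost:9393"; // assumed SCDF server URL

    // Single-threaded listener: while we wait, later requests stay queued in
    // AMQP, which is the "hold the job configuration in AMQP" part of the idea.
    @RabbitListener(queues = "task-launch-requests", concurrency = "1")
    public void onLaunchRequest(String launchArgs) throws InterruptedException {
        while (isTaskRunning("my-task")) {
            Thread.sleep(30_000); // crude poll; the task_lock table could also be checked here
        }
        rest.postForObject(scdf + "/tasks/executions?name={name}&arguments={args}",
                null, Long.class, "my-task", launchArgs);
    }

    private boolean isTaskRunning(String taskName) {
        String body = rest.getForObject(scdf + "/tasks/executions?name={name}",
                String.class, taskName);
        return body != null && body.contains("\"endTime\":null");
    }
}
```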

mminella commented 4 years ago

@venkatasreekanth Can you elaborate on why you are using the single-instance feature in this use case? We are trying to understand the motivation.

venkatasreekanth commented 4 years ago

@mminella As I described in the problem description, updates from the MDM system go out at 5-minute or 1-minute intervals, and the size of the data can range from a few hundred updates to over a million. Let's say the system sends out a 600k update and follows it with a 50k update, and assume a flag such as non-stocking or denied-in-countries is being set. Assume the same record appears in both updates but has newer information in the second update.

If I don't use single instance, there is a chance of the record being posted first by the 50k job and then overwritten in downstream systems by stale data from the older 600k job. Any other solution introduces far too much complexity.

sabbyanandan commented 3 years ago

@mminella: What are your thoughts?