Hi @mrpaulandrew,
I too had a similar requirement to support multiple parallel executions. I approached it in much the same way, except that I named the higher-level concept a "Job".
A Job comprises one or more stages, and multiple jobs can run concurrently. The JobId is passed as a parameter from the parent pipeline's trigger and is made available to all lower levels. The combination of pipelines linked to all stages in a job is inserted into LocalExecution; once the job succeeds, the logs for that job are moved to ExecutionLog.
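To make that last hand-off concrete, here is a minimal T-SQL sketch of the end-of-job archival step, assuming both tables carry a JobId column; apart from the LocalExecution and ExecutionLog table names, every object and column name here is my guess rather than the fork's actual code.

```sql
-- Hypothetical sketch: archive a completed job's logs.
-- Only LocalExecution and ExecutionLog come from the design above;
-- the procedure, schema and column names are assumptions.
CREATE PROCEDURE [procfwk].[ArchiveJobExecution]
    @JobId INT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    -- Copy the job's rows from the working table to the permanent log...
    INSERT INTO [procfwk].[ExecutionLog]
        (JobId, StageId, PipelineId, PipelineName, StartDateTime, PipelineStatus)
    SELECT JobId, StageId, PipelineId, PipelineName, StartDateTime, PipelineStatus
    FROM [procfwk].[LocalExecution]
    WHERE JobId = @JobId;

    -- ...then clear them so the next run of this job starts clean.
    DELETE FROM [procfwk].[LocalExecution]
    WHERE JobId = @JobId;

    COMMIT TRANSACTION;
END;
```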
The metadata database changes centre on this new Job concept.
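As a rough sketch of the shape those changes might take (the table and column definitions below are illustrative guesses, not the fork's exact DDL):

```sql
-- Hypothetical sketch of the Job-level metadata changes.
CREATE TABLE [procfwk].[Jobs]
(
    JobId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    JobName NVARCHAR(200)     NOT NULL UNIQUE,
    Enabled BIT               NOT NULL DEFAULT 1
);

-- Stages now belong to a Job, so concurrent jobs never share a stage.
-- (Nullable only so the sketch also applies to already-populated tables.)
ALTER TABLE [procfwk].[Stages]
    ADD JobId INT NULL
        CONSTRAINT [FK_Stages_Jobs] REFERENCES [procfwk].[Jobs] ([JobId]);

-- Execution tables carry the JobId so parallel runs can be told apart.
ALTER TABLE [procfwk].[LocalExecution] ADD JobId INT NULL;
ALTER TABLE [procfwk].[ExecutionLog]   ADD JobId INT NULL;
```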
Examples of Jobs:
- Daily Sales Ingestion Job (JobId 1)
- Daily Marketing Egression Job (JobId 2)
- Weekly Campaign Ingestion Job (JobId 3)
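Registering those jobs is then just a metadata insert, along these lines (illustrative values only):

```sql
-- Hypothetical seed data for the three example jobs above.
SET IDENTITY_INSERT [procfwk].[Jobs] ON;

INSERT INTO [procfwk].[Jobs] (JobId, JobName, Enabled)
VALUES
    (1, N'Daily Sales Ingestion',     1),
    (2, N'Daily Marketing Egression', 1),
    (3, N'Weekly Campaign Ingestion', 1);

SET IDENTITY_INSERT [procfwk].[Jobs] OFF;
```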
I didn't have much use for the Grand Parent level in my setup. So my idea was to give the ParentPipeline different triggers (schedule or tumbling window), each passing the JobId as a parameter to the pipeline; there is a query sketch after the examples below showing how that parameter gets used.
A worker pipeline can be reused in a stage within the same job or across jobs. For example, LaunchDatabricksJob is a pipeline that gets reused a lot at my work, where I need to pass different invocation parameters, class names and jar names to submit different apps/jars to a Databricks cluster. To do that, I just add a new entry to the Pipelines table mapping the stage to the pipeline (like a new pipeline instance), which causes a new PipelineId to be created, making that particular pipeline instance unique. That PipelineId is then used in the PipelineParameters table to associate the parameters needed for that particular pipeline instance.
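To illustrate that reuse pattern, here is a sketch of the metadata inserts for a second instance of LaunchDatabricksJob; the stage id, parameter names and values are invented for the example:

```sql
-- Hypothetical sketch: register another instance of the same ADF pipeline
-- against a different stage, then attach instance-specific parameters.
DECLARE @NewPipelineId INT;

INSERT INTO [procfwk].[Pipelines] (StageId, PipelineName, Enabled)
VALUES (4, N'LaunchDatabricksJob', 1); -- same ADF pipeline, new stage mapping

SET @NewPipelineId = SCOPE_IDENTITY();

-- Parameters are keyed by PipelineId, so each instance gets its own values.
INSERT INTO [procfwk].[PipelineParameters] (PipelineId, ParameterName, ParameterValue)
VALUES
    (@NewPipelineId, N'ClassName',        N'com.example.MarketingEgress'),
    (@NewPipelineId, N'JarName',          N'marketing-egress-1.0.jar'),
    (@NewPipelineId, N'InvocationParams', N'--env prod');
```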
Examples of triggers:
- DailyTrigger-SalesIngestion => scheduled at 3 AM daily (AUS time zone) => parameter: JobId = 1
- DailyTrigger-MarketingEgression => tumbling window trigger at 4 AM daily (UTC) => parameter: JobId = 2
- WeeklyTrigger-CampaignIngestion => scheduled at 3 AM on Mondays (AUS time zone) => parameter: JobId = 3
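On the other side of that hand-off, the parent pipeline can use the JobId it receives from the trigger to look up only its own stages and pipelines, roughly like this (again assuming the JobId column sketched earlier):

```sql
-- Hypothetical sketch: resolve the work for one job, given the JobId
-- passed in from the trigger via the parent pipeline's parameter.
DECLARE @JobId INT = 1; -- e.g. set by DailyTrigger-SalesIngestion

SELECT
    s.StageId,
    s.StageName,
    p.PipelineId,
    p.PipelineName
FROM [procfwk].[Stages] AS s
INNER JOIN [procfwk].[Pipelines] AS p
    ON p.StageId = s.StageId
WHERE s.JobId = @JobId
  AND p.Enabled = 1
ORDER BY s.StageId, p.PipelineId;
```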
I had this as a work-in-progress project locally for some time, but after looking at the recent conversations around this issue, I have pulled the latest changes from the master branch, applied my changes (metadata database changes only) and pushed them to my fork. I'm planning to work on the ADF pipeline changes tonight and test it end to end.
Please take a look at the changes and let me know if they help: https://github.com/Mallik-G/ADF.procfwk/tree/feature/mallik
Thanks, Mallik
@mrpaulandrew I have pushed my implementation of this onto my fork here: https://github.com/NJLangley/ADF.procfwk/tree/feature/batches
There are still a few outstanding bits to fix. Let me know if you have any questions or issues testing it.
Building on community feedback in the following issues:
- https://github.com/mrpaulandrew/procfwk/issues/61
- https://github.com/mrpaulandrew/procfwk/issues/62
Allow the framework to support multiple executions of the parent pipeline using a new higher-level concept called Batches.
Examples of Batch names:
Finally, it is expected that a trigger hitting the parent pipeline will include the batch name as a parameter.
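For comparison with the Job approach above, a Batches lookup could take a very similar shape; this is purely an illustrative sketch, with an invented batch name:

```sql
-- Hypothetical sketch of a Batches lookup. The trigger passes BatchName
-- to the parent pipeline, which resolves it to a BatchId like this.
CREATE TABLE [procfwk].[Batches]
(
    BatchId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    BatchName NVARCHAR(200)     NOT NULL UNIQUE,
    Enabled   BIT               NOT NULL DEFAULT 1
);

DECLARE @BatchName NVARCHAR(200) = N'DailySales'; -- supplied by the trigger

SELECT BatchId
FROM [procfwk].[Batches]
WHERE BatchName = @BatchName
  AND Enabled = 1;
```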