mrpaulandrew / procfwk

A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.
https://mrpaulandrew.com/category/azure/data-factory/adf-procfwk/

Read Me - Orchestrate.procfwk

For complete documentation on this solution see procfwk.com.

ProcFwk Has Become CF.Cumulus.Control

See blog: mrpaulandrew.com

See new product page: cloudformations.org/cumulus

ProcFwk will receive no further development beyond December 2023.

Framework Capabilities

Complete Data Factory Activity Chain

Issues

If you've found a bug or have a new feature request, please log the details using the repository issues.

Go to... Issues

Projects

Go to... External Requests

Go to... Internal Backlog

Release Details

Version Overview Version Details & Release Notes
2.0 Azure Synapse Analytics fully supported as an interchangeable orchestrator of pipelines within the procfwk.
GitHub Pages:
Orchestrators
Orchestrator Types

Release Summary Video:
YouTube - procfwk Playlist

GitHub Issues:
procfwk #95
2.0-beta Azure Synapse Analytics Beta support added.

Development of Azure Functions App completed using the Synapse namespace: Azure.Analytics.Synapse.Artifacts with version 1.0.0-beta.1 of the NuGet package.
GitHub Issues:
procfwk #21
1.9.2 Batch Executions added, plus:
  • Exception Pipeline
  • Running Pipeline Check
  • Pipeline Parameter Last Values
  • Worker Pipeline Validation
GitHub Pages: Batch Executions

Release Demo Summary Video: YouTube - procfwk Playlist

GitHub Issues:
procfwk #78
procfwk #77
procfwk #71
procfwk #73
procfwk #80
procfwk #72
1.9.1 Activity Policy Update, plus:
  • Secure Activity Inputs/Outputs.
  • Execution Wrapper Hardening.
  • New Activity Icons and Framework Factory Cosmetics.
GitHub Issues:
procfwk #65
procfwk #66
procfwk #67
procfwk #69
1.9.0 Cross Tenant & Subscription Support added, plus:
  • New integration tests created.
  • Infant pipeline refactoring.
  • tSQLt project added.
GitHub Issues:
procfwk #34
procfwk #35
procfwk #46
procfwk #55
procfwk #56
procfwk #59
1.8.6 Pipeline Expressions Refactored to Use Variables, plus:
  • New integration tests created.
  • Complete activity chain redrawn in Visio.
GitHub Issues:
procfwk #51
procfwk #52
1.8.5 Execution Precursor added, plus:
  • PowerShell helper to add initial Worker metadata.
procfwk v1.8.5 - Execution Precursor
1.8.4 Database Schema Reorganise and Restructuring
procfwk v1.8.4 - Database Schema Reorganise and Restructuring
1.8.3 Bug Fixes from the Community, including:
  • Email alerts sent to blank email addresses due to wrong flow in Child pipeline.
  • Worker pipelines cancelled during an execution fail when the framework is restarted due to missing Parent pipeline clean up condition.
GitHub Issues:
procfwk #38
procfwk #37
1.8.2 Optionally Store SPN Details in Azure Key Vault
procfwk v1.8.2 - Optionally Store SPN Details in Azure Key Vault
1.8.1 Automated Framework Pipeline Testing added, including tests for:
  • A simple grandparent run.
  • All types of failure dependency handling.
  • Metadata checks when pipelines and stages are disabled.
  • No pipeline parameters provided.
Blog Series:
  1. Set up automated testing for Azure Data Factory
  2. Automate integration tests in Azure Data Factory
  3. Isolated functional tests for Azure Data Factory
  4. Testing Azure Data Factory in your CI/CD pipeline
  5. Unit testing Azure Data Factory pipelines
  6. Calculating Azure Data Factory test coverage
1.8.0 Complete Pipeline Dependency Chains For Failure Handling added, plus:
  • Clean up of a previous execution run if Workers appear as running.
  • New metadata integrity checks.
  • Internal get property value function added.
procfwk v1.8 - Complete Pipeline Dependency Chains For Failure Handling
1.7.3 Data Factory Deployment Updated To Use azure.datafactory.tools PowerShell Module
SQLPlayer/azure.datafactory.tools
1.7.2 Pipeline Parameter NULL Handling added, plus:
  • Worker pipelines with a status of 'Running' protected from a new execution start/restart.
procfwk v1.7.2 - NULL Pipeline Parameters Handled
1.7.1 Alerting Check Bug Fix, plus:
  • Pipeline parameter value size limit removed.
procfwk v1.7.1 - Alerting Bug Fix And Pipeline Parameter Size Limit Removed
1.7.0 Pipeline Email Alerting added, plus:
  • Send email Function implemented and hardened.
  • Handy Notebook updates.
  • Activity failure paths improved.
  • MIT license and code of conduct added.
  • Error table bug fix: error code attribute changed from INT to VARCHAR.
procfwk v1.7 - Pipeline Email Alerting
1.6.0 Error Details for Failed Activities Captured, plus:
  • Pipeline parameters used at runtime captured in execution logs.
  • Emailing Function added, not yet implemented.
  • Unknown Worker outcomes optionally block downstream stages.
  • Solution housekeeping.
procfwk v1.6 - Error Details for Failed Activities Captured
1.5.0 Power BI Dashboard for Framework Executions, plus:
  • Worker Parallelism View.
  • Pipeline Run ID now logged.
  • Logging Attributes Bug Fix.
procfwk v1.5 - Power BI Dashboard for Framework Executions
1.4.0 Enhancements for Long Running Pipelines, plus:
  • Pipeline check status function added.
  • Function Data Factory client moved to internal class.
  • SQL GETDATE() changed to GETUTCDATE().
  • Glossary created.
  • Updated database views.
procfwk v1.4 - Enhancements for Long Running Pipelines
1.3.0 Metadata Integrity Checks, plus:
  • Logical pipeline predecessors.
  • Data Factory PowerShell deployment script.
  • Helper Notebook.
  • Database objects renames and solution tidy up.
procfwk v1.3 - Metadata Integrity Checks
1.2.0 Execution Restartability, plus:
  • Data Factory annotations and descriptions.
  • Database covering indexes.
  • Pipeline log status changed from 'Started' to 'Preparing'.
  • Pipeline log start date/time now set in child pipeline.
procfwk v1.2 - Execution Restartability
1.1.0 Service Principal Handling via Metadata, plus:
  • Data Factory table.
  • Properties table and view.
  • Function body bug fix.
  • New sample data.
procfwk v1.1 - Service Principal Handling via Metadata
1.0.0 Simple framework designed and base components built.
  • Part 1 - Design, concepts, service coupling, caveats, problems.
  • Part 2 - Database build and metadata.
  • Part 3 - Data Factory build.
  • Part 4 - Execution, conclusions, enhancements.
Blog Series:
Creating a Simple Staged Metadata Driven Processing Framework for Azure Data Factory Pipelines
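
As a rough illustration of the staged, metadata-driven idea named in the blog series title above, the following minimal Python sketch groups enabled pipelines by stage and runs the stages in order. It is not the framework's code: the real solution keeps its metadata in a SQL database and triggers pipelines through Azure Functions. The row layout, the function names `plan_execution` and `run`, and the rule of stopping after any failed stage are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata rows. The real framework stores its metadata in a
# SQL database; these column names are assumptions for illustration only.
PIPELINES = [
    {"stage": 1, "name": "Extract A", "enabled": True},
    {"stage": 1, "name": "Extract B", "enabled": True},
    {"stage": 2, "name": "Transform", "enabled": True},
    {"stage": 2, "name": "Old Transform", "enabled": False},
    {"stage": 3, "name": "Load", "enabled": True},
]


def plan_execution(rows):
    """Group enabled pipelines by stage and return stages in ascending order."""
    stages = defaultdict(list)
    for row in rows:
        if row["enabled"]:
            stages[row["stage"]].append(row["name"])
    return [(stage, stages[stage]) for stage in sorted(stages)]


def run(rows, execute):
    """Run stage by stage, calling execute(name) for every worker in a stage.

    execute is a caller-supplied function returning True on success.
    Assumed simplification: if any worker in a stage fails, later stages
    are not started.
    """
    log = []
    for stage, names in plan_execution(rows):
        outcomes = {name: execute(name) for name in names}
        log.append((stage, outcomes))
        if not all(outcomes.values()):
            break
    return log
```

Calling `run(PIPELINES, execute)` with an `execute` function that reports a failure for "Transform" would log stages 1 and 2 and never start stage 3. In this sketch the workers within a stage run one after another; a real orchestrator could run them in parallel.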