As Azkaban rolls out data-availability-based triggers, we're updating the DSL so it can generate these triggers.
Note: Since triggers are flow-level, they require the new Flow 2.0 YAML serialization. You can instruct the DSL to generate YAML by specifying generateYamlOutput true in your hadoop closure.
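For example, enabling YAML output is just a matter of adding that one setting to the hadoop closure (the buildPath shown here stands in for whatever your build already uses):

```groovy
hadoop {
  buildPath "azkaban"        // your existing build settings stay the same
  generateYamlOutput true    // emit Flow 2.0 yaml so triggers can be serialized
}
```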
This PR provides a full data triggers implementation. It introduces the Trigger, Schedule, and TriggerDependency objects.
Trigger objects can be created in any NameScope, but will only be picked up for output if defined in (or added to) a Workflow. An error will be thrown if more than one Trigger is defined in a Workflow.
Schedule objects are created inside Trigger objects and specify the cron value that determines when the Trigger will be created in Azkaban. An error will be thrown if a Trigger contains more than one Schedule, or if it contains no Schedule at all.
The TriggerDependency abstract object is built to be implemented by many classes so that the interface is pluggable. As of this PR there is only one implementation, corresponding to the one dependency type available in Azkaban when Data Triggers are released: the DaliDependency. Unfortunately, this is a closed-source dependency, so it doesn't offer a great deal of value to the open-source community. When the HDFS Dependency is introduced in Azkaban, a corresponding TriggerDependency will also be created here.
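Putting the pieces together, here is a minimal sketch of what a triggered workflow might look like in the DSL. The workflow/trigger/schedule/daliDependency nesting mirrors the description above, but the property names inside the trigger and the dependency are illustrative assumptions rather than the definitive syntax; the integration tests in this PR are the authoritative reference.

```groovy
// Sketch only: property names on the trigger and the Dali dependency below are
// assumptions for illustration; consult the Trigger, Schedule, and DaliDependency
// classes (and the integration tests) for the actual DSL syntax.
workflow('triggeredFlow') {
  trigger('dataTrigger') {
    // Exactly one Schedule per Trigger; the cron value controls when Azkaban
    // creates an instance of the trigger.
    schedule {
      value '0 0 2 ? * *'
    }
    // DaliDependency is the only TriggerDependency implementation as of this PR.
    daliDependency('upstreamDataset') {
      // Dependency parameters are implementation-specific and omitted here.
    }
  }

  // Jobs and targets are declared as usual for the workflow.
  noOpJob('start') {
  }
  targets 'start'
}
```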
Other than the introduction of these objects and the logic to yamlize them in the AzkabanDslYamlCompiler, this PR is primarily checker logic plus unit and integration tests. The TriggerChecker is implemented to make sure that Triggers, Schedules, and TriggerDependencies are properly defined. The unit tests confirm that the AzkabanDslYamlCompiler compiles the new objects properly, and the integration tests confirm that the DSL generates Triggers in YAML as expected and that the TriggerChecker catches improperly defined Triggers.
More info in #194