linkedin / linkedin-gradle-plugin-for-apache-hadoop

Apache License 2.0

Azkaban schedules and data-availability based triggers defined in HadoopDSL #194

Closed reallocf closed 6 years ago

reallocf commented 6 years ago

The Azkaban team is working on data-availability based triggers and will be launching that feature with HadoopDSL integration.

These dependencies will be written in .flow files that were introduced in Flow 2.0 (#193). This feature will not be available for users outputting .job/.properties files.

As a by-product of data-availability based triggers, this change will also allow schedules to be defined in HadoopDSL.

In the future, we're aiming for all jobs and their associated schedules/triggers to be defined in .flow files. This will allow versioning of schedules, which hasn't been possible in the past.

It will still be possible to generate old .job/.properties files for backward compatibility with older Azkaban versions, but they won't have these new features.

@chengren311 is leading the data-availability based trigger project on the Azkaban team, and he should feel free to add more if he so chooses :smiley:

prokod commented 6 years ago

@reallocf, @chengren311 this is very exciting stuff! Could you please let users/followers know what the status is? Specifically, if I pull Azkaban 3.43.0 now and deploy it, could I already generate Flow 2.0 type flows manually? (Probably only future versions of the Hadoop plugin will have this functionality, as the latest plugin was released in June 2017.) How close are you to releasing data-availability based triggers?

Kudos on the fine work

reallocf commented 6 years ago

Hey @prokod - I believe the data-availability based triggers are very close to working! I've been leading the work on implementing them here in the DSL. I expect it will take another few months before they're fully ready, since my attention is elsewhere as GDPR approaches.

@chengren311 should be able to give you more details about the data trigger release though! Last I heard it should be available in Azkaban within the next month :)

burgerkingeater commented 6 years ago

@prokod @reallocf the data trigger feature is under active development and will be released soon. Most parts are done except for some miscellaneous things. If you are interested, there's a wiki page about the feature (https://github.com/azkaban/azkaban/wiki/Azkaban-Data-Trigger), with more to be added.

reallocf commented 6 years ago

Starting work on the DSL for this in earnest now. Will post updates here.

reallocf commented 6 years ago

This is completed and in the process of being rolled out internally :+1: Woo! :tada: