linkedin / linkedin-gradle-plugin-for-apache-hadoop

Apache License 2.0
Handling emergent flows for Flow 2.0 #221

Closed reallocf closed 6 years ago

reallocf commented 6 years ago

In Flow 1.0, we had an interesting property (called emergent flows): at compile time, users could write .properties files to a directory separate from the original workflow directory, then include both directories at zip time to create the workflow that is ultimately uploaded. This was useful because you could define multiple zip files and pick a different set of properties for each one, giving each cluster its own properties. This was a heavily used mechanism for cluster-specific properties at LinkedIn.

For Flow 2.0 this is less straightforward, because the YAML object is generated in full at compile time. This PR therefore introduces a similar mechanism that allows properties to be merged into flows at zip time.

It works as follows:

At compile time, if a namespace has NO workflows defined but DOES have properties defined, the plugin writes those properties to a .tempprops file. Properties defined alongside workflows are assumed to be workflow properties and are handled as before.
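The compile-time rule above can be sketched as follows. This is an illustrative Python model, not the plugin's code: the `Namespace` class, the `tempprops_outputs` function, and the `<namespace name>.tempprops` file name are all hypothetical and chosen only to make the rule concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical stand-in for a compiled namespace (not the plugin's real type)."""
    name: str
    workflows: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


def tempprops_outputs(namespaces):
    """Return {file name: properties} for namespaces that get a .tempprops file."""
    outputs = {}
    for ns in namespaces:
        # Rule from the description: properties present, no workflows present.
        if ns.properties and not ns.workflows:
            # Assumed naming scheme; the real file name may differ.
            outputs[f"{ns.name}.tempprops"] = dict(ns.properties)
    return outputs
```

A namespace that has both workflows and properties produces no entry here, since its properties are treated as workflow properties.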

At zip time, if the zip contains any .tempprops files, the plugin reads them in, reads in all the .flow files, and merges the properties into the flows. It then creates new .flow files with the resulting properties included. The files are named _.flow so each zip will create a uniquely named .flow file. The original .flow files and the .tempprops files are excluded from the zip, and the new .flow files are included in their place.
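A minimal sketch of the zip-time merge, again as an illustrative Python model rather than the plugin's implementation. Several details are assumptions not stated in this description: flows are modeled as plain dicts with properties under a top-level `config` key, values from .tempprops files overwrite same-named flow properties, and the new file is named `<flow>_<zip name>.flow`. The `merge_for_zip` function itself is hypothetical.

```python
def merge_for_zip(zip_name, zip_entries):
    """Model the zip-time merge.

    zip_entries maps file name -> parsed content (a dict).
    Returns the entries that would actually be placed in the zip.
    """
    # Collect the properties from every .tempprops file selected for this zip.
    temp_props = {}
    for name, content in zip_entries.items():
        if name.endswith(".tempprops"):
            temp_props.update(content)

    # Without .tempprops files, the zip contents are left unchanged.
    if not temp_props:
        return dict(zip_entries)

    result = {}
    for name, content in zip_entries.items():
        if name.endswith(".tempprops"):
            continue  # never shipped in the zip
        if name.endswith(".flow"):
            merged = dict(content)
            # Assumption: .tempprops values win over existing flow properties.
            merged["config"] = {**content.get("config", {}), **temp_props}
            base = name[: -len(".flow")]
            # Assumed naming scheme that keeps the file unique per zip.
            result[f"{base}_{zip_name}.flow"] = merged
        else:
            result[name] = content
    return result
```

Because each zip selects its own .tempprops files, the same compiled flow can end up with different properties in each zip, which is the behavior emergent flows provided in Flow 1.0.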

This should provide backward compatibility with Flow 1.0 for flows that rely on emergent flows.