Allow generateYamlOutput to be defined multiple times

linkedin / linkedin-gradle-plugin-for-apache-hadoop

Apache License 2.0


Allow generateYamlOutput to be defined multiple times #217

Closed reallocf closed 6 years ago

jamiesjc commented 6 years ago

Is it possible that the flag was not cleaned up when the user first compiled the project, leaving it in some weird state, so that the next compile threw that scope binding error? I didn't see that flag being unbound in com.linkedin.gradle.hadoopdsl.BaseNamedScopeContainer#clear.

reallocf commented 6 years ago

After looking into this extensively and testing an approach that uses the BaseNamedScopeContainer clear method, it looks like this won't work.

The only place this method is used is the HadoopDslAutoBuild mechanism, a tool that lets users define different workflow configurations for multiple clusters. The problem is that this tool adds the workflows to the namespace, then clears the state, THEN builds the workflows. As a result, if the generateYamlOutput state is cleared, Flow 2.0 is not applied.
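To make the ordering problem concrete, here is a minimal sketch in plain Java. It is not the plugin's code: the `ClearOrderingSketch` class, its `autoBuild` method, and the `clearResetsFlag` switch are hypothetical stand-ins, and the assumption that a build without the flag falls back to `.job` output is mine. It only models the add, clear, build sequence described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the add -> clear -> build ordering; NOT the real
// BaseNamedScopeContainer or HadoopDslAutoBuild implementation.
public class ClearOrderingSketch {
    // Stand-in for the state that generateYamlOutput would set.
    private boolean generateYamlOutput = false;
    private final List<String> workflows = new ArrayList<>();
    // Whether clear() also resets the flag (the approach that was tested).
    private final boolean clearResetsFlag;

    ClearOrderingSketch(boolean clearResetsFlag) {
        this.clearResetsFlag = clearResetsFlag;
    }

    void clear() {
        workflows.clear();
        if (clearResetsFlag) {
            generateYamlOutput = false;
        }
    }

    // Returns the file names a build would produce for one cluster definition.
    public static List<String> autoBuild(boolean clearResetsFlag) {
        ClearOrderingSketch scope = new ClearOrderingSketch(clearResetsFlag);

        // 1. The user's DSL is applied: the flag is set and a workflow is added.
        scope.generateYamlOutput = true;
        scope.workflows.add("wf");
        List<String> pending = new ArrayList<>(scope.workflows);

        // 2. The state is cleared BEFORE anything is built.
        scope.clear();

        // 3. The build reads the flag only now.
        //    Assumption: without the flag, legacy ".job" output is produced.
        List<String> outputs = new ArrayList<>();
        for (String name : pending) {
            outputs.add(name + (scope.generateYamlOutput ? ".flow" : ".job"));
        }
        return outputs;
    }

    public static void main(String[] args) {
        System.out.println("clear() resets the flag: " + autoBuild(true));
        System.out.println("clear() keeps the flag:  " + autoBuild(false));
    }
}
```

In this model, resetting the flag inside `clear()` means the build step never sees it, which matches the behavior observed when testing that approach.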

I think this original solution is the best way forward. You can argue this makes the most sense from a design perspective as well.

jamiesjc commented 6 years ago

Thanks for the investigation and detailed explanation!

jamiesjc commented 6 years ago

Can you also document the findings and reasons for the new commit?

reallocf commented 6 years ago

Sure.

The second commit makes namespaces create subdirectories in the DSL. This allows the HadoopDslAutoBuild to work as expected. Before, .flow files were all written to the project root, which doesn't work because the .flow file is then overwritten for each cluster. That doesn't achieve the desired behavior (different configs in each zip) and actually causes failures because the .flow files are set to read-only.

Now, .flow files are written into their namespace directories. Thanks to your work, @jamiesjc, this should still upload to Azkaban fine even if the file isn't moved into the root of the zip at zip time (as it is for many projects that use the HadoopDslAutoBuild).
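The difference between the two layouts can be sketched as follows. This is a hedged illustration only: `NamespacedFlowPaths`, `outputPath`, and `distinctOutputs` are hypothetical names, the cluster namespaces are made up, and the real plugin's path logic may differ. The sketch only shows why a flat layout collides across clusters while a per-namespace layout does not.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of flat vs. namespaced .flow output paths;
// not the plugin's actual file-writing code.
public class NamespacedFlowPaths {
    // Old behavior: every namespace writes to the project root.
    // New behavior: each namespace gets its own subdirectory.
    public static Path outputPath(Path root, String namespace, String flow,
                                  boolean useNamespaceDirs) {
        Path dir = useNamespaceDirs ? root.resolve(namespace) : root;
        return dir.resolve(flow + ".flow");
    }

    // Collects the distinct files produced when the same flow is generated
    // once per namespace. Fewer paths than namespaces means an overwrite.
    public static Set<Path> distinctOutputs(Path root, List<String> namespaces,
                                            String flow, boolean useNamespaceDirs) {
        Set<Path> outputs = new LinkedHashSet<>();
        for (String namespace : namespaces) {
            outputs.add(outputPath(root, namespace, flow, useNamespaceDirs));
        }
        return outputs;
    }

    public static void main(String[] args) {
        Path root = Paths.get("build");
        // Made-up cluster namespaces, for illustration only.
        List<String> clusters = Arrays.asList("clusterA", "clusterB");
        System.out.println("flat:       " + distinctOutputs(root, clusters, "wf", false));
        System.out.println("namespaced: " + distinctOutputs(root, clusters, "wf", true));
    }
}
```

With the flat layout both clusters map to the same path, so the second write would hit the existing read-only file; with namespace directories each cluster keeps its own copy.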

I also added a test case to confirm that namespaced workflows are generated as expected.