linkedin / linkedin-gradle-plugin-for-apache-hadoop

Apache License 2.0

Azkaban CLI features #107

Closed pranayhasan closed 8 years ago

pranayhasan commented 8 years ago

Create Project, Flow Status, Execute Flow and Cancel Flow

pranayhasan commented 8 years ago

I'd like to know whether there are any quick additions or I/O changes you want. I will clean up once the view is finalized.

convexquad commented 8 years ago

@pranayhasan Since this is a larger RB, give me a few more days to work on the review.

convexquad commented 8 years ago

@pranayhasan @nntnag17 I want to ask that you make a major design change with this PR. I don't want to repeat the mistake we made with the azkaban-cli multiproduct (internal to LinkedIn), whose code is hard to re-use because it assumes you are interacting with it via the console. To be clear, I think azkaban-cli is otherwise well written.

I would like to ask you to create a new Gradle subproject for interacting with Azkaban that contains only library methods exposing a high-level API, e.g. createAzkabanProject(AzkabanProject azkabanProject, String sessionId), and that assumes nothing about Gradle or the Hadoop DSL / Plugin.

Then leave all the Gradle tasks in place in the hadoop-plugin project. The same task classes will probably remain (AzkabanCreateProjectTask, etc.), but instead of containing the code that creates the project, they should call the API from the new subproject.

There will obviously be a lot of refactoring needed to make these changes. @pranayhasan This will be a good design challenge for you in deciding how best to split the code. You can't just move everything to the new Azkaban subproject, because some things need console input. Keep the console-input parts in the hadoop-plugin subproject, which holds all the Gradle tasks. Additionally, you probably want the new high-level API to take a logger object, so that you can pass the Gradle logger into the API and do logging there in a re-usable manner.
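The split described above could look roughly like the following Java sketch. Only createAzkabanProject(AzkabanProject azkabanProject, String sessionId) comes from this thread; the AzkabanLogger and AzkabanTransport interfaces, the AzkabanClient class, the fields of AzkabanProject, the request path, and the parameter names are all assumptions made for illustration, not the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a Gradle-free Azkaban library layer; every name here is hypothetical. */
final class AzkabanApiSketch {

    /** Logging hook so a caller can forward messages to its own logger (e.g. the Gradle logger). */
    interface AzkabanLogger {
        void info(String message);
    }

    /** Transport hook; a real implementation would send an HTTP request to Azkaban. */
    interface AzkabanTransport {
        String post(String path, Map<String, String> params);
    }

    /** Plain value object; the fields are assumptions, not the plugin's real model. */
    static final class AzkabanProject {
        final String name;
        final String description;

        AzkabanProject(String name, String description) {
            this.name = name;
            this.description = description;
        }
    }

    /** High-level API: no console input and no Gradle types. */
    static final class AzkabanClient {
        private final AzkabanTransport transport;
        private final AzkabanLogger logger;

        AzkabanClient(AzkabanTransport transport, AzkabanLogger logger) {
            this.transport = transport;
            this.logger = logger;
        }

        String createAzkabanProject(AzkabanProject azkabanProject, String sessionId) {
            if (azkabanProject.name == null || azkabanProject.name.isEmpty()) {
                throw new IllegalArgumentException("Project name must not be empty");
            }
            // The path and parameter names below are illustrative assumptions.
            Map<String, String> params = new LinkedHashMap<>();
            params.put("session.id", sessionId);
            params.put("name", azkabanProject.name);
            params.put("description", azkabanProject.description);
            logger.info("Creating Azkaban project " + azkabanProject.name);
            return transport.post("/manager?action=create", params);
        }
    }

    /** Stand-in for a Gradle task: it gathers input on its side, then delegates to the library. */
    static String runCreateProjectTask(AzkabanClient client, String name,
                                       String description, String sessionId) {
        return client.createAzkabanProject(new AzkabanProject(name, description), sessionId);
    }
}
```

In this arrangement a task such as AzkabanCreateProjectTask would keep any console prompts, build an AzkabanClient with a small adapter around the Gradle logger, and call the library method. Injecting the transport is an extra choice beyond what was asked for here; it lets the library be exercised without a running Azkaban server.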

convexquad commented 8 years ago

Second comment: can you include as many unit tests as you can for any standalone helper functions (i.e. the functions that don't make remote calls to Azkaban)? For the functions that actually call Azkaban, testing will be more difficult. We should develop tests for these methods, but I think we can do it in a separate PR.
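As a rough illustration of the kind of standalone helper that is easy to cover, here is a Java sketch of a pure function with dependency-free checks. The helper names (normalizeBaseUrl, executionUrl), the URL shape, and the example host are hypothetical and not taken from this PR; in the real project the checks would presumably be written with whatever test framework the build already uses.

```java
/** Hypothetical standalone helpers plus dependency-free checks of them. */
final class AzkabanUrlHelperSketch {

    /** Trims whitespace and trailing slashes so paths can be appended consistently. No remote calls. */
    static String normalizeBaseUrl(String url) {
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException("Azkaban URL must not be empty");
        }
        String trimmed = url.trim();
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return trimmed;
    }

    /** Builds a link to an execution page; the path format is an assumption for illustration. */
    static String executionUrl(String baseUrl, int execId) {
        return normalizeBaseUrl(baseUrl) + "/executor?execid=" + execId;
    }

    /** Tiny assertion helper so the sketch does not depend on a test framework. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Checks that run entirely in memory, with no Azkaban server involved. */
    static void runChecks() {
        check(normalizeBaseUrl("https://azkaban.example.com/").equals("https://azkaban.example.com"),
              "trailing slash should be removed");
        check(normalizeBaseUrl("  https://azkaban.example.com  ").equals("https://azkaban.example.com"),
              "surrounding whitespace should be removed");
        check(executionUrl("https://azkaban.example.com//", 42)
                  .equals("https://azkaban.example.com/executor?execid=42"),
              "execution URL should be built from the normalized base");
        boolean rejected = false;
        try {
            normalizeBaseUrl("   ");
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "blank URL should be rejected");
    }
}
```

Because these functions take plain strings and return plain strings, they can be tested now, while the methods that actually call Azkaban wait for the separate PR.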

pranayhasan commented 8 years ago

Thanks @convexquad for the recommendations. I agree with your concerns: I haven't been able to reuse the azcli project even though it calls the same API underneath. I will work on refactoring to make the API calls independent of hadoop-plugin. I will continue to commit to the existing PR and ping you once I'm done.

convexquad commented 8 years ago

@pranayhasan Do you have an ETA on the refactoring work for this Pull Request? It would be an awesome enhancement for the Hadoop Plugin.

pranayhasan commented 8 years ago

@convexquad I had other higher-priority work, so I had to put this on the backlog. I have started refactoring it. You can expect an updated PR by early next week.

pranayhasan commented 8 years ago

I opened a new PR, discarding this one.