kubeflow / common

Common APIs and libraries shared by other Kubeflow operator repositories.
Apache License 2.0
51 stars 73 forks source link

feat(common) add reconciler.v1 #141

Closed zw0610 closed 3 years ago

zw0610 commented 3 years ago

140

The common repository offers a controller.v1 package for developers to make kubeflow operators. The controller mode exposes low-level APIs for operators needs high customization requests, which are more about the controller mechanism, such like pre-processing/post-processing for enqueue/dequeue actions.

However, for more entry-level developers or users who may not that familiar with controller, exposing to many low-level APIs could be confusing. Meanwhile, with effort from controller-runtime, kubebuilder and operator-sdk, the reconciler mode is offering a high-level, controller-mechanism-decoupled APIs which are only related to how to meet the resources expectation defined in the declarative API like TFJob, MXNetJob, etc.

To add the reconciler.v1 package as long as to keep code re-use between the controller.v1 and reconciler.v1 packages, core package is extracted to be shared between reconciler.v1 and controller.v1.

The reconciler.v1 defines its API in reconciler.v1/common/interface.go which methods only related to reconciliation exposed.

A base/parent implementation named KubeflowReconciler is defined so developers only need to override methods in reconciler.v1/common/must_customize.go.

Jeffwan commented 3 years ago

Let's try to minimize the code changes. If pod.go services.go are reusable, then let's try not to replica the changes

Jeffwan commented 3 years ago

/hold

zw0610 commented 3 years ago

Let's try to minimize the code changes. If pod.go services.go are reusable, then let's try not to replica the changes

sure. I shall finish the gang part and try to maximize code re-use in the next commit.

zw0610 commented 3 years ago

@Jeffwan While I'm still working on adding comment and unit-tests for reconciler.v1 package, you may take a look at this pull request and let me know any concern to the changes especially to the existing controller.v1 package.

Let me know if I need to rebase it first.

Jeffwan commented 3 years ago

/cc @kubeflow/wg-training-leads Please help review the proposal.

johnugeorge commented 3 years ago

In general, do we need to low level api code in longer run? Can we discuss in next meeting before merge?

Jeffwan commented 3 years ago

@johnugeorge Sure, this is a big change requires user (operator) facing changes. Let's have an agreement in the meeting first.

Jeffwan commented 3 years ago

/hold

zw0610 commented 3 years ago

In general, do we need to low level api code in longer run? Can we discuss in next meeting before merge?

I think the Reconciler struct exposes limited APIs that are sufficient for most operators development. While there might always be some customization supported only in Controller mode, especially for users with private deploy, these two mode should be

The concern on keeping both reconciler.v1 and controller.v1, I believe, is about the effort to maintain features and bug-fix for both package. I would suggest the following steps:

  1. add reconciler.v1 package to kubeflow/common repository
  2. convert controllers in kubeflow/tf-operator based on kubeflow/common reconciler.v1
  3. deprecate controller.v1

Because the scheduling for the community meeting next week (August 11/12th) is not so friendly to my local time, I can attend if the discussion is quite critical. Otherwise, could we discuss on August 25th?

@johnugeorge Sure. Let's discuss this on the community meeting on August 11th.

gaocegege commented 3 years ago

/cc @kubeflow/wg-automl-leads

gaocegege commented 3 years ago

/cc @alculquicondor

alculquicondor commented 3 years ago

This is a long PR. Would it be worth splitting the go.mod upgrades in its own commit or PR?

Any specific parts were you would like my input, given that I'm still unfamiliar with this project?

Jeffwan commented 3 years ago

The scope and codes overall looks good to me. Other leads, please leave comments.

/cc @kubeflow/wg-training-leads

Jeffwan commented 3 years ago

Overall looks good to me. I think this is safe to merge and it doesn't introduce user facing low level JobController. (just some reorganization to extract common codes).

If we plan to support a CustomJob which can support arbitrary roles. reconciler.v1 would be a good start. For existing frameworks, we can make some plans later. This is not a blocking issue to 1.4 release and beta release of all-in-one operator.

We can probably introduce this to v2 apis & controllers.

@kubeflow/wg-training-leads Please have a double check. Let's hold it for two days.

johnugeorge commented 3 years ago

I agree with @Jeffwan For existing frameworks this is not necessary at this point. We can plan for it at a later point

Jeffwan commented 3 years ago

/lgtm /approve

google-oss-robot commented 3 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubeflow/common/blob/master/OWNERS)~~ [Jeffwan] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
Jeffwan commented 3 years ago

/hold cancel