kubeflow / mxnet-operator

A Kubernetes operator for mxnet jobs
Apache License 2.0

Migrate controller implementation to kubeflow/common fashion #76

Closed Jeffwan closed 4 years ago

Jeffwan commented 4 years ago

Major changes include

  1. Bump Kubernetes dependency version to 1.16.9
  2. Simplify the v1 APIs by leveraging the common types in kubeflow/common/pkg/apis/common/v1
  3. Replace the existing implementation with kubeflow/common/pkg/controller.v1/common and get rid of tf-operator/pkg/common/jobcontroller
  4. Replace the batch scheduler, moving from kube-batch to volcano (this comes along with the kubeflow/common upgrade)
  5. Remove MXReplicaSpec.Label and use replica.PodTemplate.Annotation[mxJobTunerServerKey] instead. This is used for TVM auto-tuning.
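
To illustrate item 5, here is a minimal sketch of reading the tuner server key from a replica's pod template annotations rather than from a dedicated label field. The `podTemplate` and `replicaSpec` types are simplified stand-ins written for this example, not the real Kubernetes or kubeflow/common types, and the string value of `mxJobTunerServerKey` is a placeholder assumption.

```go
package main

import "fmt"

// mxJobTunerServerKey reuses the constant name from the PR description.
// The string value is a placeholder assumed for this sketch.
const mxJobTunerServerKey = "tuner-server-key"

// podTemplate and replicaSpec are hypothetical, simplified stand-ins for the
// pod template and replica spec types; only the fields needed here are modelled.
type podTemplate struct {
	Annotations map[string]string
}

type replicaSpec struct {
	Replicas int32
	Template podTemplate
}

// tunerServerKey returns the value stored under mxJobTunerServerKey in the
// pod template annotations, and whether the annotation was present.
// Indexing a nil map is safe in Go, so a spec without annotations simply
// reports that the key is absent.
func tunerServerKey(spec replicaSpec) (string, bool) {
	v, ok := spec.Template.Annotations[mxJobTunerServerKey]
	return v, ok
}

func main() {
	spec := replicaSpec{
		Replicas: 1,
		Template: podTemplate{
			Annotations: map[string]string{mxJobTunerServerKey: "tuner-a"},
		},
	}
	if key, ok := tunerServerKey(spec); ok {
		fmt.Println("tuner server key:", key)
	}
}
```

Keeping the value in an annotation means the replica spec needs no MXNet-specific field, which fits the goal of reusing the shared common types.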

TravisBuddy commented 4 years ago

Travis tests have failed

Hey @Jeffwan, Please read the following log in order to understand the failure reason. It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 689fb420-9894-11ea-b97e-ed27cf305a90

terrytangyuan commented 4 years ago

Thanks! This has a lot of breaking changes. Do you mind if I cut a release before merging this since I believe there are users of this operator? If so, do you have any suggested version number in mind?

Jeffwan commented 4 years ago

/hold

This is for review only.

@terrytangyuan Sure. Can you help cut a 0.7.0 release on commit https://github.com/kubeflow/mxnet-operator/commit/9f050f82308aef63fe5403b68cdd8605e99e1696? This matches last year's Kubeflow release version.

This was last year's stable version. I will leave this year's changes to a future release.

Jeffwan commented 4 years ago

Coverage dropped a little. I need to add defaults_test.go to bring it back up.

Jeffwan commented 4 years ago

/cc @KingOnTheStar @suleisl2000 @wackxu

terrytangyuan commented 4 years ago

> @terrytangyuan Sure. Can you help cut a 0.7.0 release on commit 9f050f8? This matches last year's Kubeflow release version.
>
> This was last year's stable version. I will leave this year's changes to a future release.

Done. I just cut a release for v0.7.0: https://github.com/kubeflow/mxnet-operator/releases/tag/v0.7.0

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubeflow/mxnet-operator/blob/master/OWNERS)~~ [terrytangyuan]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

Jeffwan commented 4 years ago

I will wait until the weekend to see whether anyone else has concerns.

Jeffwan commented 4 years ago

/hold cancel