Closed jlewi closed 6 years ago
+1 to move
Propose naming the repo kubeflow/tf-operator
. The rationale for basing the name on the operator rather than the CRD it controls is that you could imagine more TF-related resources in the future.
+1 to move
+1
And how about naming it kubeflow/tf-controller
:)
+1, and +1 for the name kubeflow/tf-controller
+1 to move :).
kubeflow/tf-controller
seems better for k8s's style :).
Per tensorflow/k8s#300
kubeflow/tf-operator
seems better than kubeflow/tf-controller
tf-operator SGTM, too.
+1
@jlewi what's the status of this repo (tensorflow/k8s) after moving to kubeflow/kubeflow? freezing? deleted?
@pineking This repo would be moved to kubeflow/tf-operator, not kubeflow/kubeflow, I think. It will be transferred so when you visit github.com/tensorflow/k8s it will be redirected to kubeflow/tf-operator.
I think there are some tasks we should do:
And, there will be some conflicts with most of PRs, so I think it may be better to squash the mergeable PRs ASAP.
Does it matter whether we do the rename before or after we move the repo?
I suggest we don't update the name in kubernetes/test-infra until after the move so that our tests continue to work.
@jlewi SGTM, I think it does no matter 😄
🚀 no churn no gain, early churn << later churn.
I've started working on #205 (replacing Airflow with Argo for CI). My suggestion is we wait until we've migrated off Airflow to Argo for testing to do the move. Debugging tests is so painful with Airflow that I'd rather migrate to Argo first as I think this will make dealing with the churn and breakages much easier.
I submitted #205.
I'll plan on moving this repo to kubeflow/tf-operator early this week
@wbuchwalter @DjangoPeng @gaocegege @ScorpioCPH @Jimexist @zjj2wry as committers to tensorflow/k8s can you explicitly LGTM/+1 this change?
+1 for moving it to kubeflow/tf-controller.
+1 Thanks for your work 😄
I'll plan on moving this repo to kubeflow/tf-operator early this week
There is one thing: we do not have the role committer
in our governance proposal.
wbuchwalter DjangoPeng gaocegege ScorpioCPH Jimexist zjj2wry as committers to tensorflow/k8s can you explicitly LGTM/+1 this change?
@DjangoPeng @gaocegege Seems to be some disagreement over the name.
Based on the feedback in this issue kubeflow/tf-controller
seems more popular but based on tensorflow/k8s#300 kubeflow/tf-operator
seems like the more accurate terminology.
Specifically, this comment says that an operator is a controller that is application specific; whereas controllers are not application specific.
In our case, we are very much application specific so operator seems like the better terminology (albeit its just a convention).
Technically, all operators are controllers but I think we should follow CoreOS and use operator to indicate we are application(TF) specific.
@DjangoPeng @gaocegege what do you think?
I agree with the name tf-operator
@jlewi @gaocegege
Operator is more powerful than controller in my mind. If we move it to kubeflow/tf-operator
, we'd better clearly define the scope of tf-operator
. For example, it can manage the whole lifecycle of distributed tensorflow jobs and provide many termination policies such as restart
, restore
, stop
, etc.
BTW: Considering the development of kubeflow, there are mxnet-operator
, caffe-operator
and pytorch-opperator
in the future. So, we'd better manage the corresponding CRDs in an unified repo such like kubeflow/clientset
. I'm not sure it's the time to do that or not.
@DjangoPeng
Yeah, I think this repo is an operator, since it does manage the whole lifecycle of distributed TF jobs. I agree that it is better to place all clientset and CRD in one repo, but it is not in high priority, IMO.
It seems like we're ok with kubeflow/tf-operator for now. I'll add this to the community meeting agenda just to ensure there's no final objections.
I agree with @DjangoPeng that if we add more operators in the future we might need to rethink code organization but I say we cross that bridge when we come to it.
I opened #179 to gauge interest in PyTorch. Anecdotally PyTorch is the deep learning framework that seems most popular after TF.
@jlewi The issue should be https://github.com/kubeflow/kubeflow/issues/179 😄
And FYI, there is a third party MXNet operator inspired by this repo: https://github.com/deepinsight/mxnet-operator. But it follows our old architecture, and I think we should refactor it to be event-driven or reimplement it if we want to support MXNet.
/cc @loadwiki
Cool!
Decision at the morning meeting was to move the repo to kubeflow/tf-operator I think I'd like to get an E2E test that verifies kubeflow is working (kubeflow/kubeflow#207) before we move things to try to avoid getting in the state where things are broken at head.
@gaocegege I moved the repository it is now kubeflow/tf-operator. Can you take on putting together a PR to update all the locations in the code?
@jlewi I will do it.
PTAL #386 And I think we should setup Travis and Prow for this repo.
kubernetes/test-infra#6779 defines the prow jobs for the new repo. That should hopefully get merged at which point we can figure out what's broken and needs to be fixed.
https://prow.k8s.io/?repo=kubeflow%2Ftf-operator
It looks like the prow job was triggered correctly for kubeflow//tf-operator but the results weren't written back to the PR.
The k8s-ci-robot has write access on the repo which is what we have for kubeflow/kubeflow
Looks like the prow job can't pull the image
Log not found: response has status "400 Bad Request" and body "{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \"4fea41a7-1041-11e8-b419-0a580a6c0122-0\" in pod \"4fea41a7-1041-11e8-b419-0a580a6c0122\" is waiting to start: image can't be pulled","reason":"BadRequest","code":400}
It seems that there is only one thing to do:
@gaocegege What name are you referring to?
We have already enabled it for the org https://github.com/kubernetes/test-infra/blob/master/prow/plugins.yaml#L23
So I'm going to mark this bug as closed. It would still be good to clean up the reference to tensorflow/k8s https://github.com/kubernetes/test-infra/blob/master/prow/plugins.yaml#L13
Now that we have the Kubeflow GitHub organization should we
Reasons for moving the repo
Reasons for not moving