Closed: PatrickXYS closed this issue 2 years ago
The training operator uses the ECR repo for the last few releases (v1.4.0, v1.3.0, v1.2, v1.1): https://github.com/kubeflow/training-operator/blob/master/manifests/overlays/kubeflow/kustomization.yaml#L9
We should not deprecate the ECR repos for the upcoming few releases. Though we will find an alternative solution starting from this release, we have to allow current users to continue using the release manifests.
@PatrickXYS we understand the need to deprecate these repositories, and we request you to work with us as we figure out the next generation of this infrastructure. Please see details in https://github.com/kubeflow/testing/issues/1006#issuecomment-1153299757.
In the meantime, we will work on funding the account with the needed credits to use these repositories.
@PatrickXYS are you still available on Kubeflow Slack, or is xyshowww@gmail.com the correct email to reach you?
@PatrickXYS can you please clarify whether we can maintain the old ECR public images (even if we don't add new images) so that people's old manifests don't break?
I don't think any kubeflow repos are actively using those ECR images for deployment except notebooks and training WG.
Reason for the above observation: only the notebooks and training ECR images have version tags such as v1.1, v1.2, etc. So the rest of the repos are of no concern, but notebooks and training might be special cases.
Due to my current situation, I have very limited bandwidth and am slow to respond.
I have mirrored all the public.ecr.aws/j1r0q0g6/{IMAGE_NAME} images to GitHub Container Registry under ghcr.io/kubeflow/{IMAGE_NAME}, for example:
ghcr.io/kubeflow/kubeflow/access-management
ghcr.io/kubeflow/kubeflow/jupyter-web-app
ghcr.io/kubeflow/training/tf-operator
These are just a one-time backup of the historical tags, but the working groups can feel free to start using these images for new manifests/releases if they like (GitHub actions within the corresponding kubeflow GitHub repo can push to these registries).
@surajkota clarified that we are still planning to keep the old ECR public.ecr.aws/j1r0q0g6/{IMAGE_NAME} images available by migrating the ECR to be owned by a new AWS account.
I can share the python script I used to migrate the ECR images to GHCR (as this migration can be a bit of a nightmare due to ECR lacking the ability to list tags).
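As a rough illustration of what such a migration script might look like (a hedged sketch, not the actual script: it assumes the `crane` CLI from go-containerregistry is installed, that you are authenticated to ghcr.io, and the image names and tags below are illustrative), one approach is to build explicit `crane copy` commands per tag, since the tag list has to be supplied by hand when the source registry cannot enumerate it:

```python
# Hypothetical sketch of mirroring ECR Public images to GHCR.
# Assumes the `crane` CLI is installed and GHCR login is done.
# import subprocess  # needed only if you uncomment the run() call below

SRC_REGISTRY = "public.ecr.aws/j1r0q0g6"
DST_REGISTRY = "ghcr.io/kubeflow"

def mirror_commands(image, tags):
    """Build one `crane copy` command per tag to mirror an image."""
    return [
        ["crane", "copy",
         f"{SRC_REGISTRY}/{image}:{tag}",
         f"{DST_REGISTRY}/{image}:{tag}"]
        for tag in tags
    ]

if __name__ == "__main__":
    # Tag list is supplied explicitly (illustrative values), since the
    # comment above notes that listing tags on the source is painful.
    for cmd in mirror_commands("training/tf-operator", ["v1.1", "v1.2"]):
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually copy
```

This only prints the commands; uncommenting the `subprocess.run` line would perform the copies.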
I have raised https://github.com/kubeflow/community/issues/782 about using ghcr.io/kubeflow/{IMAGE_NAME} as the default image registry for Kubeflow images (in addition to a DockerHub / ECR mirror).
WGs have confirmed that the private ECR registries were only used for testing purposes and have not been used in release manifests. Hence, the private ECR registries can be deprecated.
As @thesuperzapper commented, I am looking into migrating the public ECR repositories to a new account (with the same registry alias) and will post an update here as soon as I have confirmation on the process. Repositories which need to be migrated:
public.ecr.aws/j1r0q0g6/notebooks
public.ecr.aws/j1r0q0g6/training
Hi @PatrickXYS, thanks for your patience. We have made progress on securing the credits for the new AWS account (https://github.com/kubeflow/testing/issues/1006#issuecomment-1184057660), which unblocks the next steps for migrating these ECR repositories.
Update on migrating the existing repository to a new account: the registry alias (j1r0q0g6) is a unique string/key, and since only one registry can be linked to this alias at a time, there is a way to migrate the repositories to a new account, but it involves a few minutes of downtime.
So instead of the above approach for migrating the ECR repository, I am proposing another option: migrating the AWS account (809251082950) under the new AWS organization, but with the following conditions:
public.ecr.aws/j1r0q0g6/notebooks
public.ecr.aws/j1r0q0g6/training
For this to happen, we need a shared email address between Notebooks and Training WG and we can look into other options for decoupling the training and notebook repositories later if needed.
This is the fastest and cleanest way to mitigate this and we do not need any help from the AWS ECR team.
@kimwnasptd and @johnugeorge Please let me know your thoughts on this approach.
@kimwnasptd Given these complications, should we just create and track the mapping of previous image location -> new image location for earlier releases (without any extra changes)? Release 1.4 or earlier doesn't support k8s 1.21+ anyway, so there is very little chance of new installations. In any case, if users run into issues with images, they can override the image name/tag in the kustomization file and move forward.
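The override mentioned above could look roughly like this kustomize `images` transformer (a hedged sketch: the `ref`, image name, and `newName`/`newTag` values are illustrative, not an official mapping):

```yaml
# kustomization.yaml: point an old release's image at a mirror.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/kubeflow/training-operator/manifests/overlays/kubeflow?ref=v1.4.0
images:
  - name: public.ecr.aws/j1r0q0g6/training/training-operator
    newName: ghcr.io/kubeflow/training/training-operator
    newTag: v1.4.0
```

Running `kustomize build` on such an overlay would rewrite the matching image references without touching the upstream manifests.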
@johnugeorge To clarify, the only action required from one of you (anyone from the Notebooks or Training WG) is to create an email address. The repository will stay in the same account as before, and there is no impact on user experience. Please help us understand if this still looks complicated to you.
Re: 1.5 or earlier, and as per your comment on June 13, we need to consider the users who have not migrated to newer versions of Kubernetes or Kubeflow. It's not only about new installations; existing installations will also break. For example, notebook server images would suddenly become unavailable. I disagree with the proposal to just document it, given the effort is only to create an email address, and this account will be part of the AWS Organization like the other WG accounts. It's similar to how you might have created a Docker Hub account for publishing the 1.6 images.
If it is not, would you be ok with just freezing that card and getting a new card?
Let's try to avoid this, please try to get a credit card from your team, and replace my card with yours with help from @andreyvelich . After that, feel free to take any action. I believe my proposal is a better way to move forward.
Hi everyone, the credits applied to the account 809251082950 were about to expire at the end of this week, on 07/31. Given that we cannot deprecate the repositories without a proper plan and timeline, and do not want to break any customer deployments, we have completed the steps outlined in the comment above to ensure Yao's credit card does not get charged for the billing.
This account is now part of the new AWS organization. Yao, I have opened a case with AWS support to remove your credit card since it's no longer required. Will keep you updated.
Great, just saw the message. Please keep me updated on the credit card removal, since I can't access the AWS account now, which makes me concerned that AWS will apply bills to my credit card.
@surajkota Could you please keep me updated on the card removal process? There is still a possibility that billing will be applied to my card, but I have no access to the AWS account.
As I mentioned in the previous comment:
Let's try to avoid this, please try to get a credit card from your team, and replace my card with yours with help from @andreyvelich . After that, feel free to take any action. I believe my proposal is a better way to move forward.
I asked to replace the card before taking any action, but it seems that didn't work out on the AWS side. Could you please expedite the process to resolve my concern?
Hi @PatrickXYS, your credit card has been removed from the account. Thanks for your patience
Info for historical purposes: since this was a standalone account before, it is not possible to remove the default payment information. We added the same card as on the management account with the help of @kimwnasptd and Amber Graner.
We can close this issue now. /close
@surajkota: Closing this issue.
As a follow-up item of deprecating optional-test-infra, we'll deprecate all the ECR repos provided by optional-test-infra.
Private ECR Repo List
kubeflow/katib:
809251082950.dkr.ecr.us-west-2.amazonaws.com/katib/v1beta1/
cert-generator
earlystopping-medianstop
file-metrics-collector
katib-db-manager
suggestion-chocolate
suggestion-darts
suggestion-enas
suggestion-goptuna
suggestion-hyperband
suggestion-hyperopt
suggestion-optuna
suggestion-skopt
tfevent-metrics-collector
trial-darts-cnn-cifar10
trial-enas-cnn-cifar10-cpu
trial-enas-cnn-cifar10-gpu
trial-mxnet-mnist
trial-pytorch-mnist
trial-tf-mnist-with-summaries
kserve/kserve:
809251082950.dkr.ecr.us-west-2.amazonaws.com/kserve/
agent
aix-explainer
alibi-explainer
art-explainer
batcher
image-transformer
kserve-controller
lgbserver
paddleserver
pmmlserver
pytorchserver
sklearnserver
storage-initializer
xgbserver
kubeflow/training related:
809251082950.dkr.ecr.us-west-2.amazonaws.com/kserve/
pytorch-operator
tf-operator
training-operator
Public ECR Repo List
kubeflow/kubeflow:
public.ecr.aws/j1r0q0g6/notebooks
access-management
admission-webhook
central-dashboard
jupyter-web-app
notebook-controller
notebook-servers/base
notebook-servers/codeserver
notebook-servers/jupyter
notebook-servers/jupyter-cuda
notebook-servers/jupyter-pytorch-cuda
notebook-servers/jupyter-pytorch-cuda-full
notebook-servers/jupyter-pytorch-full
notebook-servers/jupyter-scipy
notebook-servers/jupyter-tensorflow
notebook-servers/jupyter-tensorflow-cuda
notebook-servers/jupyter-tensorflow-cuda-full
notebook-servers/jupyter-tensorflow-full
notebook-servers/rstudio
notebook-servers/rstudio-tidyverse
profile-controller
tensorboard-controller
tensorboards-web-app
volumes-web-app
kubeflow/training related:
public.ecr.aws/j1r0q0g6/training
tf-operator
training-operator