kubeflow / manifests

A repository for Kustomize manifests
Apache License 2.0

Distributions and Kubeflow 1.6 release #2221

Closed annajung closed 1 year ago

annajung commented 2 years ago

The goal of this issue is to track the progress of distributions alongside the 1.6 release

While we hope all distros will manage to be ready when the KF 1.6 release is out, this is sometimes impossible to achieve. In this issue, we want to both keep track of the progress of distributions toward the KF 1.6 release and also which of the distros will be working on KF 1.6 (testing during the distribution testing cycle) even if they can't meet the KF 1.6 deadline.

Tagging distribution owners identified in the https://github.com/kubeflow/community/pull/560 (Any new or missed distro owners, please comment on the issue to be tracked with the 1.6 release)

Distribution | Representatives | State
Arrikto EKF | @kimwnasptd | (stretch goal) participating in 1.6
Arrikto MiniKF | @kimwnasptd | (stretch goal) participating in 1.6
AWS | @surajkota | helping with testing in 1.6
Charmed Kubeflow | @DomFleischmann | participating in 1.6
Google Cloud | @zijianjoy @gkcalat | participating in 1.6
IBM | @yhwang | participating in 1.6
Nutanix | @johnugeorge | participating in 1.6
Kubeflow with Argo CD | @DavidSpek |
Openshift | @VaishnaviHire @LaVLaS | participating in 1.6
Oracle Cloud Infrastructure | @julioo | participating in 1.6

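Please let us know if you'll be participating in the 1.6 release by answering the following questions:

  • Are you planning on having your distro ready in sync with the KF 1.6 release?
  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?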

[Update on June 14th] Distribution testing is scheduled to take place from July 20th to August 10th ~Note: After the 2 weeks delay, distribution testing is now scheduled to take place from July 6th to July 27th (ref https://github.com/kubeflow/community/pull/561)~

cc @kubeflow/release-team @jbottum

yhwang commented 2 years ago

For IBM IKS,

  • Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

  • Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

johnugeorge commented 2 years ago

For Nutanix Karbon,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

surajkota commented 2 years ago

For AWS,

Are you planning on having your distro ready in sync with the KF 1.6 release?

TBD. If not in sync, we will follow up

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

DomFleischmann commented 2 years ago

For Canonical's Charmed Kubeflow

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

annajung commented 2 years ago

Hi distribution owners! After checking with all WGs, the release team has decided to extend all release deadlines by 2 more weeks.

Email announcement: https://groups.google.com/g/kubeflow-discuss/c/I4l97HvrGEA/m/227aCe_mCgAJ

New schedule PR: https://github.com/kubeflow/community/pull/562

Distribution testing is now scheduled to take place from July 20th to August 10th

LaVLaS commented 2 years ago

For OpenShift,

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

zijianjoy commented 2 years ago

For Google Cloud

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

julioo commented 2 years ago

For Oracle Cloud Infrastructure

Are you planning on having your distro ready in sync with the KF 1.6 release?

Yes

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

zijianjoy commented 2 years ago

cc @gkcalat for working on Kubeflow on Google Cloud release.

kimwnasptd commented 2 years ago

A little bit late to the party, but for Arrikto EKF and MiniKF:

Are you planning on having your distro ready in sync with the KF 1.6 release?

It will be a stretch, but this will be our goal.

Will you participate by testing your distro during the distribution testing phase and providing feedback (reporting any issues to the release team)?

Yes

kimwnasptd commented 2 years ago

Also heads up to everyone for the following items from Notebooks and Manifests WG:

  1. Status with K8s 1.22 and Notebooks https://github.com/kubeflow/manifests/issues/2199#issuecomment-1170457965
  2. We are targeting Istio 1.14 instead of 1.13 https://github.com/kubeflow/manifests/issues/2200#issuecomment-1170381632
  3. We are targeting Knative 1.4 https://github.com/kubeflow/manifests/issues/2207#issuecomment-1163353597

We'll also be on the lookout during the feature freeze for any bug that could result from any of the above updates, but we are confident there won't be any major issues. Of course, don't hesitate to report and ping us if you bump into anything unexpected!

annajung commented 2 years ago

Hi Distribution owners, sorry for the delay in providing you with a new RC to test with.

A bug was identified in Notebooks, and the Notebooks WG is currently working on providing the release team with a new release to be used to cut a new 1.6 RC.

We hope to have the 1.6 RC1 that contains the fix for the bug identified available for you soon. Once the new RC is available, I'll leave an update here and send out an announcement to kubeflow-discuss.

If you want to get started with testing, please note the issue with Jupyter web app.

In addition, here are the PRs that would be included in the new RC.

annajung commented 2 years ago

Hi Distribution owners, providing you with another update on the RC.

As discussed in the release team meeting today (July 25th), we hope to have a new RC available for everyone early this week. We are waiting for https://github.com/kubeflow/kubeflow/pull/6591 to merge, as it aims to address the problem with building images using GitHub Actions. Once a new notebook release is available, a PR then needs to be created against the manifests repo.

The release team would like to stick with the current schedule and keep the distribution testing till August 10th as planned. However, with the delay in getting the new RC out, we also would like to gather your feedback on the current timeline and if you think it would be necessary to delay the release to increase the time for distribution testing. If you have any concerns with the current release timeline, please reach out soon to ensure your concerns are reviewed in advance before the end of distribution testing.

annajung commented 2 years ago

Kubeflow v1.6.0-rc.1 is now available!

annajung commented 2 years ago

Hi Distribution owners, friendly reminder to share any issues you ran into when testing and to update the kubeflow distribution docs

Distribution testing and Doc updates are both scheduled to end on Wed Aug 10th 2022.

johnugeorge commented 2 years ago

@annajung In the last community meeting, there was a discussion to extend by one extra week

surajkota commented 2 years ago

Testing is in progress on the AWS side, with no new issues so far. Will post an update by early next week. @annajung, when do we expect the final RC to be out? I couldn't get a clear idea from the community meeting notes.

https://github.com/awslabs/kubeflow-manifests/issues/309

gkcalat commented 2 years ago

Testing on GCP. We are observing profiles-deployment crashing. Could it be related to #2263? Has anyone else experienced it? Besides, we need the latest changes to contrib/metacontroller to be included in 1.6.0. They were not included in v1.6.0-rc.1. Thank you!

julioo commented 2 years ago

Testing on OCI. Will report status early next week. Inspired by https://github.com/IBM/manifests/issues/47

annajung commented 2 years ago

> @annajung In the last community meeting, there was a discussion to extend by one extra week

Thanks @johnugeorge for raising this! I was not able to attend the last community meeting, but other release team members did inform me that distribution owners who were present in the meeting asked for an extension.

In addition to that, during the August 8th release team meeting, the release team discussed the following issues identified based on issues/comments mentioned in the distribution tracking and release tracking

Based on the extension request and a need for a new RC, we are working on a new release timeline to provide to the community. We contacted the Notebook WG to determine if the issues identified are release blocking issues and if they will be providing another RC for the release.

Unless there are release blocking issues, we'll stick to the date that was agreed on during the community meeting last week which is August 17th for distribution testing to end.

I plan to send out an official announcement to kubeflow-discuss about the new timeline after catching up with the notebook WG or before the 10th whichever comes first.

annajung commented 2 years ago

Hi everyone, I owe an update here - will be sending out a message on kubeflow-discuss today as well.

After catching up with notebook WG and investigating the three issues identified, here is where we are now.

  1. Missing Notebook image group
    • This has been identified as a release-blocking issue. There is a PR open that might fix the issue. However, even if we get this merged, the lead who can provide the release team with a new Notebook RC is not available until the week of Aug 22nd.
  2. Duplicate liveness probe in Notebook controller manager
    • After investigating this further, it looks like this is a non-issue for those using kustomize 3.x, as stated in the kubeflow manifest installation README. For those using kustomize 4.x, there is a PR open to fix it: https://github.com/kubeflow/kubeflow/pull/6604. Since Kubeflow Manifest does not support kustomize 4.x, this has been labeled as a non-release-blocking issue.
  3. Metacontroller update not included in the RC 1
    • This PR adds the metacontroller into the /contrib directory, which does not get used by default in the pipeline installation. By default, the metacontroller from /third-party is used, and it already has the update that was made to /contrib. This means that the current RC already contains this change, so there are no changes to any functionality. I reached out to the pipeline team to get their feedback. Until we hear otherwise, the release team has labeled this as a non-blocking issue and has no plans to cut a new RC for this change.

Overall,

Thanks everyone!

cc @kubeflow/release-team

annajung commented 2 years ago

The official announcement for the release delay has been sent to the kubeflow-discuss mailing list. The proposed timeline PR is also available if distribution owners would like to provide their feedback.

The proposed timeline moves the distribution testing end date to August 31st

DnPlas commented 2 years ago

Hi folks, here's a list of issues we have run into (charmed kubeflow).

We also expect kubeflow/manifests#2150 to be merged soon, either for the next RC or patch release.

ryansteakley commented 2 years ago

Hello, here is an issue the AWS team has found on v1.6.0-rc.1. Currently we consider this a release blocker, as it is a feature regression. We are looking into it; any help from the community would be welcome and appreciated.

julioo commented 2 years ago

Hello, I successfully installed KF v1.6.0-rc.1 on Oracle Infrastructure (OKE 1.22.5, 1.23.4 and 1.24.1).

I am waiting to test the next RC and to share the Github page with OCI documentation.

surajkota commented 2 years ago

Created https://github.com/kubeflow/kubeflow/pull/6624 to address https://github.com/kubeflow/kubeflow/issues/6618

julioo commented 2 years ago

  • One problem with the MNIST E2E Vanilla demo, related to the ipykernel/iostream.py version. Created kubeflow/examples/issues/993 to document the issue.

surajkota commented 2 years ago

@kimwnasptd @yuzisun It would be great to consider this PR https://github.com/kubeflow/kubeflow/pull/6627 for this release. Details in the issue

annajung commented 2 years ago

Hi distribution owners, a new notebook RC with the fix for the image group issue is planned to be cut by this upcoming Tuesday.

The new RC might include more than the image group fix. It may also include the fix for the profiler issue https://github.com/kubeflow/kubeflow/issues/6618; I hope Notebook WG lead @kimwnasptd can share more.

As for the other issues that were raised, none of them have been labeled as blockers by the WG leads so far. Therefore, they are not being tracked as release-blocking issues for this release.

Please don't forget to keep your distribution docs updated by making updates to the following docs before the docs deadline EOD Aug 31st.

VaishnaviHire commented 2 years ago

Hello, successfully installed KF v1.6.0-rc.1 on OCP 4.9. The ongoing testing issue can be tracked here - https://github.com/opendatahub-io/manifests/issues/99.

johnugeorge commented 2 years ago

v1.6.0-rc.1 has been tested against Nutanix NKE 2.5.0 with k8s 1.22. Ref: https://github.com/nutanix/karbon-platform-services/issues/94

annajung commented 2 years ago

Hello distribution owners,

For those who missed the Aug 30th community meeting, please check out the discussion in the meeting notes / recording.

TL;DR

More details can be found in the announcement: https://groups.google.com/g/kubeflow-discuss/c/TNwRfoq3Pk4/m/r5aIGS2XBAAJ

annajung commented 2 years ago

Hi Distribution owners,

Kubeflow RC.2 is now available!

Announcement: https://groups.google.com/g/kubeflow-discuss/c/S79XhJYIkC8/m/GBvvWasoBQAJ

gkcalat commented 2 years ago

@annajung deployment of the rc.2 on GCP fails because of https://github.com/kubeflow/kubeflow/pull/6604.

annajung commented 2 years ago

Hey @gkcalat, manifest WG only supports Kustomize 3.2.0 and any other version especially Kustomize 4.x is not supported. My understanding is that they are waiting for https://github.com/kubeflow/manifests/issues/1797#issuecomment-861491762 to be resolved before being able to support Kustomize 4.x

https://github.com/kubeflow/kubeflow/pull/6604 is a bug but it's a non-issue for those using Kustomize 3.2.0 and was not given priority or labeled as release-blocking due to those reasons.

tagging Manifest WG to chime in more if needed @kubeflow/wg-manifests-leads @kimwnasptd
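
The version constraint described above can be checked before installing. The following is a minimal sketch, assuming the only supported version is the 3.2.0 stated in this comment; the helper names are hypothetical, and the parser accepts any string containing a MAJOR.MINOR.PATCH version rather than relying on a particular kustomize output format.

```python
import re

# Assumption for illustration: the single supported version named in this thread.
SUPPORTED_KUSTOMIZE = (3, 2, 0)


def parse_kustomize_version(text):
    """Return the first MAJOR.MINOR.PATCH triple found in `text` as a tuple of ints."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(text):
    """True only when the version in `text` equals the supported version."""
    return parse_kustomize_version(text) == SUPPORTED_KUSTOMIZE
```

For example, `is_supported("3.2.0")` is true, while a string containing `v4.5.7` is rejected. The version string itself would come from whatever your kustomize binary reports.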

gkcalat commented 2 years ago

> Hey @gkcalat, manifest WG only supports Kustomize 3.2.0 and any other version especially Kustomize 4.x is not supported. My understanding is that they are waiting for https://github.com/kubeflow/manifests/issues/1797#issuecomment-861491762 to be resolved before being able to support Kustomize 4.x
>
> https://github.com/kubeflow/kubeflow/pull/6604 is a bug but it's a non-issue for those using Kustomize 3.2.0 and was not given priority or labeled as release-blocking due to those reasons.
>
> tagging Manifest WG to chime in more if needed @kubeflow/wg-manifests-leads @kimwnasptd

Thank you.

Are we going to leave users who used newer kustomize versions outside the boat? Are we going to ask users who have KF 1.5 to downgrade kustomize?

That comment on kustomize 4.x support is a year old. The issue I mentioned is the only blocker that was introduced in a recent PR. Why don't we resolve that small bug (literally a few lines in a single file) and move forward with the release?

annajung commented 2 years ago

Hi everyone, it looks like the Notebook WG was already looking at the PR and has finished testing it. Therefore, they were able to provide a new Notebook RC.3 that includes https://github.com/kubeflow/kubeflow/pull/6604.

With that, we created a new Kubeflow RC.3. It updates the notebook version to RC.3, which includes only that one fix. Release: https://github.com/kubeflow/manifests/releases/tag/v1.6.0-rc.3

Thanks @kimwnasptd for your help today!

yhwang commented 2 years ago

Finished RC.3 testing on the IKS distribution, all good! Actually, I tested against RC.2, and the deployment YAML files generated by kustomize for RC.2 and RC.3 are identical. So consider my RC.3 testing finished!

Thanks to the release team for their effort in making RC.3 available.
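
A comparison like the one described above could be sketched as follows. This is a minimal illustration, not the procedure actually used in this thread: it assumes the output of `kustomize build` for each tag has already been saved as text, and the function names are hypothetical. It compares digests after normalizing line endings and trailing whitespace, so purely cosmetic differences do not count.

```python
import hashlib


def manifest_digest(rendered_yaml):
    """SHA-256 of a rendered manifest, ignoring line endings and trailing whitespace."""
    normalized = "\n".join(line.rstrip() for line in rendered_yaml.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def manifests_identical(first, second):
    """True when both rendered manifests normalize to the same content."""
    return manifest_digest(first) == manifest_digest(second)
```

To use it, read the two saved outputs (for example, one file per release candidate) and pass their contents to `manifests_identical`. Note that this is a textual check only; it would report a difference if the same resources were emitted in a different order.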

gkcalat commented 2 years ago

Looking good on GCP. Kudos to @annajung and @kimwnasptd!

juliusvonkohout commented 2 years ago

Please beware of two severe regressions in 1.6. Hopefully they will be fixed in 1.6.1.

  1. https://github.com/kubeflow/pipelines/issues/8256
  2. https://github.com/kubeflow/kubeflow/issues/6648

annajung commented 1 year ago

Hi everyone, 1.6.1-rc.0 is now available which includes fixes for

kubeflow-discuss announcement: https://groups.google.com/g/kubeflow-discuss/c/-0fhFVnj1j8/m/XFVGAzxjBAAJ

annajung commented 1 year ago

Hi everyone, @surajkota brought up a really good point about pipelines alpha.5, therefore, I want to give the community a few more days to test against Kubeflow 1.6.1-rc.0 before cutting the final release.

The release team previously mentioned that we would have a final 1.6.1 cut available today, Oct 3rd, but we will delay cutting the final release until next Monday, October 10th.

Please help test against 1.6.1-rc.0 and bring up any issues identified in the 1.6 tracking issue.

annajung commented 1 year ago

Hi everyone, with no other issues identified, we went ahead and cut the final 1.6.1 release!

If using 1.6.1, please note that you also need to update https://www.kubeflow.org/docs/started/installing-kubeflow/

More info about the 1.6.1 release can be found in kubeflow-discuss announcement: https://groups.google.com/g/kubeflow-discuss/c/amsxyXbY_nk/m/FaWxOd4VBAAJ

annajung commented 1 year ago

With 1.6.1 released and 1.7 release started, closing out the issue.

FYI, the call for distribution participation in 1.7 is also available.