After evaluating our engineering roadmap and priorities, the BentoML team has decided to pause the integration with Kubeflow Pipelines in the 1.8 release. We value our collaboration with Kubeflow and apologize for any disruption this causes. We believe pausing the integration temporarily is the right decision to ensure we deliver quality features. We look forward to resuming our work together in future releases. I wanted to let you know about this decision openly and transparently. Please reach out if you have any questions or concerns. As the serving work group liaison, I do not anticipate any changes in my role or responsibilities. cc @DnPlas
Hi community, as we approach our feature freeze (Aug 2nd), I think it is worth asking whether there is anything you folks think will need more time to complete before that date. The release team liaisons have been doing an excellent job communicating with WG leads, but I extend the question to the rest of the community.
cc: @kubeflow/wg-automl-leads @kubeflow/wg-manifests-leads @kubeflow/wg-notebooks-leads @kubeflow/wg-pipeline-leads @jbottum
@tzstoyanov will commit the Istio 1.18 upgrade in https://github.com/kubeflow/manifests/pull/2455 tomorrow. I hope the additional rootless work will not be relevant for the feature freeze, as noted in the PR description.
@adriangonz FYI, just so you have the dates and all information about the upcoming release.
Hey @DnPlas, I'd like to sincerely ask that we have a 1 week delay. The situation with Notebooks WG is the following:
The plan I have in mind for Notebooks WG is to evaluate as soon as possible how we can unblock the CI and review the PVCViewer integration. I expect this to take until this Friday, 4th August.
Then on Monday we cut our v1.8-branch in the kubeflow/kubeflow repo and make the small PRs we need to build all the images and update the manifests. I'd expect this to take 1 day: even though the PRs are small, the async communication slows things down.
Lastly, these are the PRs that I'd also like to finalise during the feature freeze, but I'm OK with not delaying the release for them and cherry-picking them afterwards:
ACK @kimwnasptd, I'll share this information with the release team.
@DnPlas @NohaIhab from the Notebooks WG side we've:
Over the next couple of days we'll proceed with cutting the release branch and updating our manifests for the RC.
@kimwnasptd thanks for the update, please keep the team posted as we are planning to finish the manifest sync next week (Wednesday).
cc: @NohaIhab
@DnPlas I just created a PR to update the Kubeflow Tekton Pipelines manifest to 2.0.0: kubeflow/kubeflow#2500
cc @Tomcli
Hi folks, I would like to announce that the Kubeflow 1.8 RC.0 is out 🎉 and that we have started with manifest testing. We expect to finish this process by the end of this week (September 15th).
I'd like to encourage community members to start testing the release, provide feedback, and file issues if they find any. I would also like to remind all Distribution owners that once Manifest Testing ends, we will begin Distribution Testing on September 15th (as soon as we have the results of manifest testing). Please get your Distributions and infrastructure ready for that stage.
I also want to take the opportunity to thank all the community members who have helped with getting to this stage. Let's keep working toward a successful release!
The release team is happy to announce that we have released Kubeflow 1.8 RC.1.
It is now time for Distributions to start their testing. The release team kindly asks for feedback by the end of next week. Feel free to submit issues and comment on the various WG repositories.
I'd like to encourage community members to start testing the release, provide feedback, and file issues if they find any.
I also want to take the opportunity to thank all the community members who have helped with getting to this stage. Let's keep working toward a successful release!
So does this mean KFP v2 will still be broken with KF 1.8? (At least #8733 is still open.)
Hey @chensun @zijianjoy, pinging you folks to get more accurate information. Do you think this issue deserves more attention?
Hello, I have a few issues with 1.8 based on my tests on Kubeflow 1.8 RC.1:
- If the Pipeline Run Steps are Pending, there is an error shown: Cannot get MLMD objects from Metadata store. This is resolved once the Pod for any of the Steps is created (so, if I understand correctly, after metadata-writer creates an entry in the MLMD DB based on the Running Pod).
- The Pipeline Run Page doesn't show the main-logs output artifact, which makes it impossible to access logs from the Steps after the Pod was deleted.
- … Artifacts Page though.

In general, I feel that the previous version of the KF Pipelines UI was more informative. For example, if a Step's Pod was Pending, the reason for the Pending state was visible on the Pipeline Run Page.
Is there a way to re-enable the main-logs artifact?
Tagging @Linchin for visibility. I don't know if we should create an issue in KFP, since I think KFP is working as expected. Let me know what you think.
@Davidnet from the perspective of running the Steps in the Pipeline Run, I think KFP is working as expected. It's more about the dashboard, although I'm not sure whether the exception "Cannot get MLMD objects from Metadata store" is expected. Is the Pipeline Run dashboard maintained by the Pipelines WG or another one?
Hi @kromanow94, thank you for your feedback. I haven't completed testing rc.1 yet, but I think with my rc.0 deployment I could answer some of your questions.
If the Pipeline Run Steps are Pending, there is an error shown: Cannot get MLMD objects from Metadata store.
I have reproduced this error and I will investigate further into it.
The Pipeline Run Page doesn't show the main-logs
This is a v1 feature that is not implemented in v2. I have created an issue in the KFP repo about this.
it's not possible to access the input and output artifacts due to RBAC Access Denied error.
I haven't been able to reproduce this on an rc.0 deployment, but I will double check on rc.1.
@kimwnasptd we need to make sure that ARM support gets merged before the final 1.8 RC:
Also, given how many issues and pending PRs are not going to make it into Kubeflow 1.8, I propose we decouple the versions of the kubeflow/kubeflow repo components from the overall Kubeflow 1.X versions.
This will allow us to cut a 1.9.0 release (of the Notebooks WG components) with some of the important fixes/features without literally waiting months. Anyone interested can discuss this proposal here:
Hey @Linchin, thanks for your reply. Should we consider this issue a blocker for the release? If so, should we expect an RC2 from the Pipelines WG to fix it?
Regarding Notebooks, we are very close to merging the following two PRs and would like to ask that we wait one day at most to get them in.
@DnPlas I'd like to update kfp-tekton from 2.0.0 to 2.0.1 and here is the PR: kubeflow/kubeflow#2545 . Thanks!
Thanks for the update @kimwnasptd, @yhwang ! I'll wait for those two items to cut the next RC.
During testing, one of our users also ran into https://github.com/kubeflow/kubeflow/issues/7273. Leaving this comment for future reference as it should be fixed for the 1.8 release. cc: @kimwnasptd @NohaIhab
Hi folks, as we are starting the bug fixing phase of the release, I'd like to make a quick update on the status.
The date when we release RC2 depends directly on the readiness of the above. Could we have an update on the pending PR in kubeflow/kubeflow? @kimwnasptd @thesuperzapper
https://github.com/kubeflow/kubeflow/pull/7322 may also be worth including as a bugfix.
Hey @DnPlas, the ARM PR will not be cherry-picked for this RC since the build fails on merged PRs.
For https://github.com/kubeflow/kubeflow/pull/7310 I'll try to update the PR today
Thanks @kimwnasptd. About the https://github.com/kubeflow/kubeflow/pull/7220 feature: should we expect it to be merged into the 1.8 branch at some point in the next two weeks, or will it not be included in the release at all?
https://github.com/kubeflow/pipelines/issues/8733 is not a blocker for the release. We're good from the Pipelines side.
Thanks for the update @chensun !
@chensun will https://github.com/kubeflow/pipelines/pull/9946 be backported for 1.8 ? Otherwise we cannot run v2 pipelines as non-root.
I plan to cut a KFP release (2.0.2) today, and we can include it in KF 1.8.
@DnPlas the KFP 2.0.2 tag is out: https://github.com/kubeflow/pipelines/releases/tag/2.0.2 Can we pull its manifests into the next RC? Thanks!
Will do, thanks @chensun !
it's not possible to access the input and output artifacts due to RBAC Access Denied error.
I have been working on setting up 1.8-rc1 as well and have observed the same things @kromanow94 mentioned above.
In addition, in KF <=1.7 we could also see Pod IDs when we clicked on a component. This was extremely helpful from a debugging and monitoring point of view. However, I don't see this anymore.
If this is intentional, is there some other way of getting this info? Happy to raise an issue if needed. For reference:
@sachdevayash1910 you need to be on the latest 1.8 branch. There is a bug in the KFP profile controller, where they forgot to add the serviceaccount of pipelines-ui to their rolebindings. We hacked this into the normal profile controller, but it might not be in RC1.
I am seeing RBAC: access denied even on the notebooks page when I try to open a notebook instance (notebook/{user}/test-vscode/).
I deployed on EKS 1.26 using https://github.com/kubeflow/manifests/tree/v1.8-branch with oauth2-proxy.
@paravatha which oauth2-proxy manifests are you using?
The issue is probably related to the oauth2-proxy manifests not yet being updated to support the new security feature that ensures only the Istio gateway can talk to the notebook servers (preventing in-cluster access hacking).
See here for more info: https://github.com/kubeflow/kubeflow/pull/7310
I am not sure who is responsible for maintaining the oauth2-proxy manifests, because they are not technically officially "released" yet.
@thesuperzapper just the alternate manifests in the same branch mentioned here https://github.com/kubeflow/manifests/tree/v1.8-branch#authservice and here https://github.com/kubeflow/manifests/tree/v1.8-branch#dex
hey @paravatha, we identified that issue in the previous RC. Could you please try with RC2?
Please refer to https://github.com/kubeflow/kubeflow/pull/7310 for more information on the issue.
Hi folks,
The release team is happy to announce that we have released Kubeflow 1.8 RC.2.
As we are approaching the release date on October 25th, I'd like to encourage community members to continue testing the release and provide feedback.
Hi @DnPlas, I tested using https://github.com/kubeflow/manifests/tree/v1.8-branch, which has the 1.8 rc2 changes.
It seems that there are two places where RBAC: access denied is happening:
on the pipelines page (this may have been fixed in rc2; I have not seen it)
Just tested rc2, I don't see such issue.
Hey folks, I've seen those two issues in 1.8.0-rc.2 and we are working on them:
I need a root approver for the website to approve this one, as it requires updating Hugo:
Hi community,
As discussed in yesterday's community meeting, the following is expected to happen:
As always, we encourage the community to keep testing and providing feedback. Thanks everyone for your contributions!
@DnPlas there was a slight CI issue with the tagging, which prevented the tags for the notebook images from being created for 1.8.0-rc.4. Let's quickly resolve it by merging https://github.com/kubeflow/kubeflow/pull/7386 and then cutting 1.8.0-rc.5.
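I've been experimenting with 1.8-rc2 and `kfp=2.3.0` and have run into several issues that are blockers for me and may be for others as well:
1. Unable to use `.after()` when referencing a component within a `ParallelFor`. More details: https://github.com/kubeflow/pipelines/issues/10050
2. Cannot assign variables from Kubernetes metadata to environment variables as we could with `kfp=1.8.21` and earlier backend versions. More details: https://github.com/kubeflow/pipelines/issues/10155
3. Cannot assign a dynamic node_selector or dynamic cpu/memory requests/limits as we could with `kfp=1.8.21` and earlier backend versions. More details: https://github.com/kubeflow/pipelines/issues/10154
4. Unable to pass data artifacts from a parent node outside a `ParallelFor` to a child node within a `ParallelFor`. More details: https://github.com/kubeflow/pipelines/issues/10149

Happy to provide more info if any of these are not clear!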
Hi @TristanGreathouse, thanks for the feedback and for filing those issues. It sounds like an SDK issue, but I'd suggest you try RC4 to make sure you have the latest version of Pipelines (2.0.2).
Soft ping to @chensun as he is the WG lead.
Thank you @TristanGreathouse for the detailed issues.
This issue will provide high-level updates on the Kubeflow 1.8 release.
TODO:
cc: @kubeflow/release-team @jbottum