kubeflow / community

Information about the Kubeflow community including proposals and governance information.

Kale donation to Kubeflow #730

Open StefanoFioravanzo opened 4 months ago

StefanoFioravanzo commented 4 months ago

Hello Kubeflow community! Do you remember Kale? It is the low-code/no-code Jupyter extension that simplifies the Data Science experience of orchestrating ML experiments from notebooks. This includes defining and deploying Kubeflow pipelines and triggering Katib hyperparameter tuning experiments with just a few clicks and code cell annotations.

If you are not familiar with it, you can read the introductory blog post or watch this video.


The project has been stale for some time and some of you may remember that Arrikto was supposed to upstream a more advanced version. Unfortunately those plans didn't pan out, Arrikto was later acquired, and the project is now locked down.

I do have administrative access to the original repository github.com/kubeflow-kale/kale and I would love to donate it to the Kubeflow community to bring it back to life.

A high-level, low-code interface that quickly gets you from notebook experimentation to pipelines and other distributed training facilities is a true selling point for Kubeflow. I personally saw how appealing this approach was to end users, and several other enterprise distributions beyond Arrikto capitalized on this capability.

FUTURE WORK

I believe this approach is still relevant today and that it can add great value to Kubeflow. Jupyter notebooks are still the centerpiece for countless ML practitioners, and Kale can be Kubeflow's playground for providing a high-level coding user experience.

I can definitely envision how we could evolve the current annotation-based programming model to better integrate with other Kubeflow components and explore ways to expose state-of-the-art capabilities such as GPT-based workflows, in-app tutorials, and integrations with upcoming Kubeflow components (e.g. model registry, Spark, etc.).

WHAT'S NEEDED

The codebase is a little bit old, so I would need help with the following:

CALL TO ACTION

USERS: Were you (or are you still!) a Kale user who wants this project to be maintained by the Kubeflow community? Are you an interested user who would love a low-code solution for Kubeflow? Then please react with 👍

MAINTAINERS: Are you a code maintainer, documentation specialist, architect, or literally anything that can be helpful to the development of the project? Then please react with 🚀 and comment how you can help!

Your feedback and reactions to this issue are instrumental to validate that indeed there is still interest and to set the right direction for this project. Please comment with your opinions and ideas.

With ❤️ and 🥬 Thank you all!

elikatsis commented 4 months ago

@StefanoFioravanzo this is amazing news! I recalled a ton of great memories around Kale and our collaboration while reading this.

I'll definitely be watching Kale closely and will try to contribute on the development front.

Good luck!

milosjava commented 4 months ago

@StefanoFioravanzo I believe this project can be very useful for data scientists who are familiar with Jupyter and want to use KF. I am very interested in participating.

rochaporto commented 4 months ago

Kale was one of the most popular features for new users on our internal Kubeflow platform. We had issues maintaining the component, in particular during Kubeflow upgrades, but would love to see it properly integrated into the Kubeflow ecosystem! We would definitely consider adding it back to our system.

thesuperzapper commented 4 months ago

@StefanoFioravanzo This is certainly an interesting idea.

In my mind there are a few requirements that we need to meet before accepting such a donation:

  1. Ensure that Kale is still relevant to users, and that it will provide value.
  2. Find at least two maintainers (preferably from different organizations), who will be responsible for maintaining it.
  3. Decide which Kubeflow Working Group (WG) will own Kale. I can see arguments for any of the following:
    • Kubeflow Notebooks WG (@kubeflow/wg-notebooks-leads) - as they control the Jupyter deployments of Kubeflow, and Kale is a Jupyter plugin
    • Kubeflow Pipelines WG (@kubeflow/pipelines) - as Kale is ostensibly an extension to create Kubeflow Pipelines with a UI
    • Create a new WG - this is always a challenge, but if there are lots of willing maintainers for Kale, it might make sense.

I also want to note that there is a similar project to Kale called Elyra (elyra-ai/elyra) which provides a UI-like experience for Kubeflow Pipelines and Airflow, and is also owned by the Linux Foundation, under LF AI & Data. However, Elyra is also largely unmaintained (last release in March 2023).

Elyra was spearheaded by IBM, so I wonder if Red Hat would be interested in picking up the torch on it (given they are now owned by IBM and have taken an interest in Kubeflow).

Perhaps we could merge Elyra and Kale into a single project under Kubeflow?

ederign commented 4 months ago

@thesuperzapper, we still use Elyra at ODH, but as you mentioned, the project is mostly unmaintained, and we are discussing our options going forward. Certainly, Kale will be one of the options that we are going to consider. I'm checking internally, and I'll send an update here as soon as possible.

chasecadet commented 4 months ago

@StefanoFioravanzo this is FANTASTIC. I know at Arrikto Kale was THE demo, and I believe not only HPE Ezmeral but also Canonical's distribution uses Kale in some capacity. As we start to advocate for our users, I think bringing out low-code or UX-focused tools will really accelerate Kubeflow adoption. Even having that as an initial touchpoint into the system and a demo at the Kubeflow booth at conferences would be a huge catalyst. The first step might be to broadcast this to users and gauge excitement around it, as well as figure out next steps. How can we gather data on this? Shooting from the hip here, you could write a blog post about reviving Kale and run a survey around it. Evil Tux would broadcast it and we could use the official Kubeflow blog too. Thoughts?

ajinkya933 commented 4 months ago

I would love Kale to be part of this donation.

aronchick commented 4 months ago

I am a HUGE fan of Kale. I created something similar - which I would be happy to "donate" as well, if people want! https://sameproject.ml/

This approach is critical - I'm really excited!

saemaromoon commented 4 months ago

I've been working with customized Kale for my data science team for a couple of years now, so I'm excited to hear that the project is gaining traction again! I'd love to contribute to the project and explore ways I can help.

terrytangyuan commented 4 months ago

@StefanoFioravanzo Could you check with Arrikto/HPE's legal team on this? Even though you have admin access to the project, the project was developed and sponsored by Arrikto and it's listed under the AUTHORS file in Kale's repo so it would be good to get a formal approval.

andreeamun commented 4 months ago

@StefanoFioravanzo, this is great news! Kale got great traction and user adoption in its early days, so once the administrative activities are sorted, revamping it and enabling users to use it will probably be needed.

Also, since security is one of the project's priorities, I would investigate whether we could build the base images on a maintained OS (maybe Ubuntu, based on the upstream discussions) and activate the scanners for CVEs. As you already mentioned, documentation is also crucial to the success of the component, especially since it lowers the barrier to entry for Kubeflow users.

andreyvelich commented 4 months ago

@StefanoFioravanzo Thank you for driving this, this is great!

We can see that JupyterLab extensions are evolving very rapidly, e.g:

As a result, JupyterLab is becoming the primary place (e.g. virtual IDE) for Data Scientists to do end-to-end ML at scale.
cc @Zsailer @bigsur0 @lresende

I would love to see more and more Kubeflow components natively integrated into JupyterLab as extensions, which Kale can help with. As others mentioned, it gives Data Scientists a lot of value by making it easier to interact with Kubeflow interfaces from the Notebook and JupyterLab.

@StefanoFioravanzo As a first step, I think it would be nice if you could present the Kale capabilities in one of the Kubeflow Community calls to show its value.

yhwang commented 4 months ago

@StefanoFioravanzo Here from IBM, this would be a great feature to integrate with Kubeflow Pipelines. No code/Low code pipeline editing could significantly expedite the pipeline composing/development. Appreciate that you're driving this.

omarsumadi commented 4 months ago

@StefanoFioravanzo Here from Capital One. We explored using Kale in the past and have considered using Elyra. We would like to see Kale continued and other Notebooks-based UX improvements re-introduced in Kubeflow.

ederign commented 4 months ago

I believe it is a general understanding that our community needs a low-code/no-code Jupyter (IDE) extension that simplifies the Data Science experience of orchestrating ML experiments from notebooks.

In this thread, people discussed a few alternatives:

My suggestion for deciding which solution to adopt is to get concrete data to assess the current state of those alternatives properly.

I'm currently working on the proposal/mapping for Kubeflow UI architecture, but I can start this research as my next task.

What do you think?

mshaikh786 commented 4 months ago

We at KAUST would really benefit from this addition to the Kubeflow ecosystem

StefanoFioravanzo commented 3 months ago

Thank you so much for the supporting words and commitment to bring this project back. The response is overwhelming, and I cannot wait to set things in motion.

I hear those of you comparing Kale to other projects. I think this is an opportunity for the Kubeflow community to define a broader strategy for the development of JupyterLab-based Data Science centric IDE tools and plugins, using Kale as the starting point. My next action item is to distill these comments and opinions into a doc so that we can all review and agree on a way forward.

The next few weeks may be a little bit slow for me, as I'll be on holiday for a couple of weeks in August. I'll be back soon with updates.

Thank you again to everyone who commented and reacted!

omarsumadi commented 1 month ago

@StefanoFioravanzo Do you want to restart this through a Product Management route by sending out some surveys about Kale?

I have some engineers at Capital One who are finally ready to contribute to an initiative on getting Kale back into Kubeflow. I'm happy to make a start on a document to synthesize the opinions, generate survey questions, and have an engineer do some spike work on getting Kale back into the repo structure here on KF.

Thanks, Omar

StefanoFioravanzo commented 1 month ago

@omarsumadi That's great to hear! Your timing is perfect. I took a break from anything related to technology for a few weeks and now I was just getting back to this :)

> Do you want to restart this through a Product Management route by sending out some surveys about Kale?

I definitely want to take a considered approach to defining a strategy that benefits both the community and the users. I want to explore surveys and other user research approaches, although I feel like the very first step is to donate the code to the community and bring it back online. There is indeed engineering work to be done to update the code's dependencies and align with the most recent KFP version.

There are people who have intimate knowledge about the current codebase that can help you guys get started. I believe we can form a small tactical team with the only goal of updating the code.