voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

[FR] Add support for Kubeflow notebooks #1901

Open AlexandreBrown opened 2 years ago

AlexandreBrown commented 2 years ago

Proposal Summary

I propose adding support for the open-source ML platform Kubeflow.
Kubeflow is a popular, open platform that covers the end-to-end ML workflow.
It provides notebooks and can be installed in the cloud (e.g. AWS, Google Cloud) or on premises; it runs on Kubernetes, so it runs anywhere Kubernetes runs.
When we run a Kubeflow notebook, we can use JupyterLab or other IDEs; my team and I use JupyterLab. It would be great if we could launch the FiftyOne App from a Kubeflow notebook.

Motivation

What areas of FiftyOne does this feature affect?

Details

The idea is to offer an experience similar to the one available on Google Colab, but for Kubeflow notebooks.
In short, the ideal solution should make it possible to launch the FiftyOne App from a Kubeflow notebook.

Approaches brainstorm

There are a lot of different approaches we can take.

Other approaches probably exist as well; I am not a Kubernetes expert.

Willingness to contribute

The FiftyOne Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

brimoor commented 2 years ago

Thanks for the feature request! @benjaminpkane is a busy guy, but he's the lead developer on the FiftyOne App, so he'd be the best point of contact on this when he has some bandwidth.

We had to make a small tweak to make the App work in Google Colab, and I suspect a similar smallish tweak would be possible to support Kubeflow notebooks.

AlexandreBrown commented 2 years ago

Hello @benjaminpkane, just circling back here. Let me know what you think about this feature request and its feasibility.
Thanks

benjaminpkane commented 2 years ago

Thanks for the interest @AlexandreBrown. I will take a look this week.

josepholaide commented 2 years ago

Hi @benjaminpkane, I'd love to contribute to this. I'm currently going through the source code.

        dataset (None): an optional :class:`fiftyone.core.dataset.Dataset` or
            :class:`fiftyone.core.view.DatasetView` to load
        view (None): an optional :class:`fiftyone.core.view.DatasetView` to
            load
        port (None): the port number to serve the App. If None,
            ``fiftyone.config.default_app_port`` is used
        address (None): the address to serve the App. If None,
            ``fiftyone.config.default_app_address`` is used
        remote (False): whether this is a remote session, and opening the App
            should not be attempted
        desktop (None): whether to launch the App in the browser (False) or as
            a desktop App (True). If None, ``fiftyone.config.desktop_app`` is
            used. Not applicable to notebook contexts
        height (None): an optional height, in pixels, at which to render App
            instances in notebook cells. Only applicable in notebook contexts
        auto (True): whether to automatically show a new App window
            whenever the state of the session is updated. Only applicable
            in notebook contexts
        config (None): an optional :class:`fiftyone.core.config.AppConfig` to
            control fine-grained default App settings

Does this mean that if I am able to expose FiftyOne's port to my Kubernetes cluster, the session would launch?
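A minimal sketch of what such a launch might look like, using only the `port` and `address` parameters from the docstring above. The helper names `launch_kwargs` and `launch` are hypothetical, and the values (port 5151, address `0.0.0.0` to listen on all interfaces of the pod) are assumptions, not a tested Kubeflow configuration. Whether exposing the port is sufficient is exactly the open question here.

```python
def launch_kwargs(port=5151, address="0.0.0.0"):
    """Builds keyword arguments for ``fiftyone.launch_app()``.

    ``port`` and ``address`` are parameters from the docstring quoted above;
    the default values chosen here are illustrative assumptions.
    """
    return {"port": port, "address": address}


def launch(dataset=None):
    """Launches the App with the session server bound to the given address."""
    # Imported lazily so that launch_kwargs() can be used without FiftyOne installed
    import fiftyone as fo

    return fo.launch_app(dataset, **launch_kwargs())
```

The port would still need to be reachable from the browser, for example through a Kubernetes Service or port-forward, which this sketch does not cover.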

benjaminpkane commented 2 years ago

@josepholaide nice! Yes, the starting point here is how networking will work. It may be more than just the port. Let me look at this tomorrow with fresh eyes, and I will provide more details.

josepholaide commented 2 years ago

Thank you, I will be expecting your feedback.

benjaminpkane commented 2 years ago

Ok, pardon the delay. I spent some time trying to set up Kubeflow for my own curiosity, but I'm not finished with that, so I'll leave an outline of what it means to support a notebook environment in general.

Python

Notebook environments, like any environment, are controlled by sessions. In notebooks, though, the session must know what URL to display in output cells. If the environment follows the IPython API, things are fairly straightforward.

Anyway, the important function here is fiftyone.core.session.notebooks.display(). The context, e.g. IPYTHON or COLAB, is checked, and a proper URL is constructed that points to the session server.

Noting your first question, remote notebooks require exposing the FiftyOne session server in addition to the Jupyter server over the network, so a Kubeflow environment will likely require extra networking as well.
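To make the URL-construction idea concrete, here is a rough sketch of how a Kubeflow context could be detected and mapped to a URL. This is not FiftyOne's implementation: the `KUBEFLOW` context and both function names are hypothetical, and the sketch assumes that Kubeflow notebook pods set the `NB_PREFIX` environment variable to the notebook's base path and that `jupyter-server-proxy` is installed to forward `/proxy/<port>/` to the session server.

```python
import os

IPYTHON = "ipython"
KUBEFLOW = "kubeflow"  # hypothetical context, not part of FiftyOne


def detect_context(environ=None):
    """Guesses the notebook context from environment variables."""
    environ = os.environ if environ is None else environ
    # Assumption: Kubeflow notebook pods set NB_PREFIX to the notebook base path
    return KUBEFLOW if environ.get("NB_PREFIX") else IPYTHON


def session_url(context, port, address="localhost", environ=None):
    """Builds the URL that an output cell would display for the session server."""
    environ = os.environ if environ is None else environ
    if context == KUBEFLOW:
        prefix = environ.get("NB_PREFIX", "").rstrip("/")
        # Assumption: jupyter-server-proxy forwards /proxy/<port>/ to the pod-local port
        return f"{prefix}/proxy/{port}/"

    # Plain IPython/Jupyter: the session server is reachable directly
    return f"http://{address}:{port}/"
```

Routing through the Jupyter server's own path would avoid exposing a second port, at the cost of depending on a proxy extension being present in the notebook image.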

App

The other important part of the equation is making sure the App knows how to call the server if there is any non-standard networking. The two important functions here are getAPI() and setFetchFunction.

One other detail is that a memory history is currently used in notebook contexts instead of a browser history, which helps avoid path issues if the notebook runs through a routed proxy, e.g. Databricks. See here

That's where to get started. Full support also requires screenshots, which involves replacing cells through the IPython display handle object (no need to worry about that now). If Kubeflow uses proper Jupyter notebooks, then it shouldn't be an issue. Let me know if you have any more questions!

AlexandreBrown commented 2 years ago

@benjaminpkane Regarding the Kubeflow setup, I can help you with that.
A good starting point to get up and running quickly on AWS is: https://awslabs.github.io/kubeflow-manifests/docs/deployment/vanilla/guide/
Once someone comes up with a solution or prototype, it would also be interesting to check whether it works with a production deployment, where the URL won't be localhost but an actual domain (e.g. using a load balancer and Cognito on AWS: https://awslabs.github.io/kubeflow-manifests/docs/deployment/cognito/guide-automated/ )

Let me know if you need more help with that. I have gone through these setups many times, so I'm willing to help if needed; we can always chat on Slack as well.

josepholaide commented 2 years ago

I'm currently working through the outlined steps, @benjaminpkane.

To set up Kubeflow in minutes, you can try the free trial of Kubeflow as a Service. It lasts 14 days. https://www.arrikto.com/kubeflow-as-a-service/

Also, I am using JupyterLab. Is that fine for setting up FiftyOne?

josepholaide commented 2 years ago

@benjaminpkane, following up on the Kubeflow deployment: I can assist with the deployment setup.

benjaminpkane commented 2 years ago

JupyterLab is fine. I'm happy to set up Kubeflow with Arrikto if/when there is something to test.