kubeflow / examples

A repository to host extended examples and tutorials
Apache License 2.0

Katacoda scenario on github summarization example; friction log #89

Closed lluunn closed 5 years ago

lluunn commented 6 years ago

Running through the scenario

step 1: no issue

step2:

step3:

step4:

cc @jlewi @ankushagarwal

BenHall commented 6 years ago

Thanks for the feedback.

Step 2: GCP: Thanks, we'll get the format updated. However, we're reluctant to provide a live read/write access token to a storage bucket, as it could be open to abuse. Ideally the example would support Kubernetes persistent volumes so users can decide where the data comes from. This would also allow us to train on a smaller subset to demonstrate the workflow.

Updated the path, kubelog, typo and path. I don't think the repo existed when we created this so thank you for highlighting.

Step 3: In terms of performance, we'll pre-cache the images once we are happy with which version to use. This will speed up the pod creation as at the moment the images need to be pulled.

Step 4: Thanks! We used kubectl patch to work around this, but we're pleased it's been added as a ksonnet parameter.

jlewi commented 6 years ago

Thanks Ben.

It would probably also be a good idea on the Katacoda side to specify a commit when cloning the GitHub example so it doesn't break unexpectedly, e.g. #84 just moved the ksonnet app.
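Pinning a clone to a commit could look roughly like the sketch below. The helper name `clone_at_commit` is hypothetical and the commit hash in the usage comment is a placeholder, not a real revision; only standard `git clone` and `git checkout` are used.

```shell
#!/bin/sh
# Sketch: clone a repository and pin the working tree to one known commit.
# clone_at_commit is a hypothetical helper name used for illustration.
set -eu

# Usage: clone_at_commit REPO_URL COMMIT DEST_DIR
clone_at_commit() {
    git clone --quiet "$1" "$3"
    # Check out the pinned commit (detached HEAD), so later upstream
    # changes such as moved directories do not affect the scenario.
    git -C "$3" checkout --quiet "$2"
}

# Example call (replace <commit-sha> with a revision known to work):
# clone_at_commit https://github.com/kubeflow/examples.git <commit-sha> examples
```

The pinned hash would need to be bumped by hand whenever the scenario is re-tested against a newer revision of the example.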

There are a couple of issues on our end that I think need to be fixed:

#90 Need to check in vendor directory.

Filed #91 to eliminate the use of GCS.

BenHall commented 6 years ago

@jlewi I'm just building a new environment. Which image should I be using for the training and serving? I see the following in the repo:

github-issue-summarization-serving-demo
issue-summarization
issue-summarization-model
issue-summarization-seldon
issue-summarization-ui
issue-summarization-ui-test

My guess was issue-summarization-seldon for serving but the creation date is March 12, 2018.

The tf-job-issue-summarization image looks to have been deleted?

jlewi commented 6 years ago

@BenHall What does image mean in this context? A Docker image or a ksonnet component?

@texasmichelle can probably answer better than me what the right component is.

BenHall commented 6 years ago

Sorry, these are the Docker images.

On Wed, 25 Apr 2018, 3:34 am Jeremy Lewi, notifications@github.com wrote:

@BenHall https://github.com/BenHall What is image in this context? Docker image or ksonnet component?

@texasmichelle https://github.com/texasmichelle can probably answer better than me what the right component is.

jlewi commented 6 years ago

I think you will want the following docker images

Can you explain how the docker images fit into your environment? Is this just to cache them for faster downloads? It looks like in step 2 you tell them to clone our example repository. So I'd expect the images used would be determined by the code in the example.

I just noticed that you are forking kubeflow/kubeflow and have users point to katacoda/kubeflow. Can you explain why you are doing this?

Is the scenario somewhere on GitHub that I can look at? I'd like to understand how this all works to see if there's anything we can do to make it easier to maintain.

For reference I'm basing my answer on what is running in our dev instance.

/cc @texasmichelle

BenHall commented 6 years ago

We're seeing the liveness probe failing https://github.com/kubeflow/examples/issues/106

BenHall commented 6 years ago

I just noticed that you are forking kubeflow/kubeflow and have users point to katacoda/kubeflow. Can you explain why you are doing this?

To ensure that we're working against a known release with known versions of the Docker Images. We've also stripped down the repository to make it faster to clone.

All the scenario content is at https://github.com/katacoda-scenarios/kubeflow-scenarios. We're more than happy for this to be moved to https://github.com/kubeflow/katacoda-scenarios so the community can create additional content :)

jlewi commented 6 years ago

Those are good reasons. My intent was only to learn how to better support Katacoda. I don't have a strong opinion about where the scenarios should live.

jlewi commented 6 years ago

@BenHall The changes to support running on PVC are committed. I haven't had a chance yet to go through the complete example E2E to make sure it works.

/cc @texasmichelle

BenHall commented 6 years ago

Thanks @jlewi! This looks to fix serving. Is the plan to do the same for training too?

jlewi commented 6 years ago

@BenHall Training is already done via #98.

So here's roughly how it should go:

Training:

  1. Create a PVC to store the data
     ks apply ${ENV} -c data-pvc
  2. Download the data to the PVC via K8s job
     ks apply ${ENV} -c data-downloader
  3. Submit training job using the PVC
     ks apply ${ENV} -c tfjob-pvc

Serving:

  1. Deploy the model
     ks apply ${ENV} -c issue-summarization-model-serving
  2. (Optional) Deploy the webapp
     ks apply ${ENV} -c ui
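
The sequence above could be scripted roughly as follows. This is only a sketch: `KS_ENV` and `DRY_RUN` are hypothetical variable names, not part of ksonnet or the example, and by default the script only prints the `ks apply` commands rather than running them. The component names are the ones listed above.

```shell
#!/bin/sh
# Sketch: apply the ksonnet components for training and serving in order.
# KS_ENV and DRY_RUN are illustrative assumptions, not part of the example.
set -eu

KS_ENV="${KS_ENV:-default}"   # name of the ksonnet environment to target
DRY_RUN="${DRY_RUN:-1}"       # 1 = print commands only, 0 = run ks

apply_component() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ks apply ${KS_ENV} -c $1"
    else
        ks apply "${KS_ENV}" -c "$1"
    fi
}

apply_all() {
    # Training: PVC, data download job, then the training job.
    # Serving: model server, then the optional web UI.
    # Caveat: a real run likely has to wait for the download job to
    # finish before submitting training; that wait is not shown here.
    for component in data-pvc data-downloader tfjob-pvc \
        issue-summarization-model-serving ui; do
        apply_component "$component"
    done
}

apply_all
```

Running it with `DRY_RUN=0` and `KS_ENV` set to an existing ksonnet environment would execute the commands instead of printing them.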

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.