kubeflow / examples

A repository to host extended examples and tutorials
Apache License 2.0

[GH Issue Summarization] Deploy on dev.kubeflow.org #62

Closed jlewi closed 6 years ago

jlewi commented 6 years ago

We should deploy the webserver and model on our dev instance of Kubeflow (dev.kubeflow.org) and provide a public URL for accessing the app.

jlewi commented 6 years ago

/assign @ankushagarwal

ankushagarwal commented 6 years ago

It is deployed at https://dev.kubeflow.org/issue-summarization/

Enter an issue body in the text box to get a machine-generated summary.

jlewi commented 6 years ago

This is pretty great.

For the couple of examples I tried, the summary was pretty inaccurate. How was this model trained? Should we train on more examples?

Some other ideas:

  • It would be nice if users could just enter a link to an issue and the web app could fetch it automatically.

/cc @hamelsmu

hamelsmu commented 6 years ago

If I’m following the code correctly, I think you are training on a very small sample. I would recommend training on the full dataset instead for this to work.
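The idea above of letting users paste an issue link could be sketched roughly as follows. This is only an illustration, not code from the deployed app: the function names `issue_api_url` and `fetch_issue_body` and the `User-Agent` string are made up for the example. It assumes the public GitHub REST endpoint `GET /repos/{owner}/{repo}/issues/{number}`, which returns the issue as JSON with a `body` field.

```python
import json
import re
import urllib.request

# Matches https://github.com/<owner>/<repo>/issues/<number>, optionally
# followed by a trailing slash, query string, or fragment.
ISSUE_URL_RE = re.compile(
    r"^https?://github\.com/([^/\s]+)/([^/\s]+)/issues/(\d+)(?:[/?#].*)?$"
)


def issue_api_url(html_url: str) -> str:
    """Translate an issue page URL into the matching GitHub REST API URL."""
    match = ISSUE_URL_RE.match(html_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub issue URL: {html_url!r}")
    owner, repo, number = match.groups()
    return f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"


def fetch_issue_body(html_url: str, timeout: float = 10.0) -> str:
    """Fetch the issue and return its body text (requires network access)."""
    request = urllib.request.Request(
        issue_api_url(html_url),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "issue-summarization-demo",  # hypothetical name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        issue = json.load(response)
    # The API returns null when an issue has no body text.
    return issue.get("body") or ""
```

The web app could then pass the result of `fetch_issue_body(...)` to the model in place of the pasted text. Unauthenticated API requests are rate limited, so a real deployment would probably want to send a token.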

ankushagarwal commented 6 years ago

I trained on the entire dataset. I can try training for a few more epochs.

ankushagarwal commented 6 years ago

@hamelsmu : Do you have a trained model .h5 file? I could use that for our demo.

jlewi commented 6 years ago

@ankushagarwal Can we create a public endpoint as well?

jlewi commented 6 years ago

@ankushagarwal Can you launch TensorBoard for the model so we can see what the metrics are?

hamelsmu commented 6 years ago

@ankushagarwal you will need three files to instantiate the full suite you need for inference:

  1. text transformer for bodies (to feed data into the encoder):
  2. text transformer for titles (contains metadata you need to go from int -> text):
  3. the model file (.h5)

I am generating these artifacts for you now (by training the model again from scratch) and will post a new comment with links to all three components.
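To illustrate why two separate text transformers are needed, here is a toy sketch. `ToyVocab` is a made-up stand-in, not the actual preprocessing used to train the model, and the reserved ids are an assumption of the example. It only shows the two directions involved: body text becomes integers for the encoder, and predicted integers become title text.

```python
class ToyVocab:
    """Toy stand-in for a text transformer: maps tokens to ints and back."""

    PAD, UNK = 0, 1  # reserved ids (an assumption of this sketch)

    def __init__(self, texts):
        tokens = sorted({tok for text in texts for tok in text.lower().split()})
        # Real tokens start at 2 because 0 and 1 are reserved.
        self.token_to_id = {tok: i + 2 for i, tok in enumerate(tokens)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def encode(self, text):
        """text -> list of ints (what the encoder consumes)."""
        return [self.token_to_id.get(tok, self.UNK) for tok in text.lower().split()]

    def decode(self, ids):
        """list of ints -> text (what turns model output into a title)."""
        return " ".join(self.id_to_token.get(i, "<unk>") for i in ids if i != self.PAD)


# Bodies and titles get separate vocabularies, hence two transformer files.
body_vocab = ToyVocab(["the web server crashes on startup"])
title_vocab = ToyVocab(["server crashes on startup"])

encoder_input = body_vocab.encode("the server crashes")
# Pretend the model predicted these title ids, followed by padding.
predicted_ids = title_vocab.encode("server crashes") + [ToyVocab.PAD]
print(title_vocab.decode(predicted_ids))  # prints "server crashes"
```

Without the title vocabulary, the integers coming out of the decoder could not be mapped back to words, which is why the model file alone is not enough for inference.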

hamelsmu commented 6 years ago

@ankushagarwal can you whitelist me and my colleagues on https://dev.kubeflow.org/issue-summarization/?

jlewi commented 6 years ago

@hamelsmu I whitelisted the folks listed above.

hamelsmu commented 6 years ago

thanks @jlewi and @ankushagarwal !! I am pretty excited about this, and my team will be, too. Let me know when the public endpoint is available. This is super cool!

@ankushagarwal I tried 10 random issues and the summaries seemed okay to me, but I went ahead and re-ran the model just in case and am sharing the files with you as promised. Can you share the specific issues that are not being summarized well?

Here are the files:

  1. body transformer: https://storage.googleapis.com/hamel_githubissues/body_pp.dpkl
  2. title transformer: https://storage.googleapis.com/hamel_githubissues/title_pp.dpkl
  3. model: https://storage.googleapis.com/hamel_githubissues/seq2seq_model_tutorial.h5

I trained this model on 2 million issues sampled from this dataset: https://storage.googleapis.com/hamel_githubissues/github-issues.zip

Thanks so much for doing this! It's really cool! Also, tagging @dansbecker as I'm collaborating with him on this same thing for kaggle-learn.
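For anyone wiring these files into a server, a minimal sketch of downloading and loading them might look like this. The helper names (`artifact_url`, `download_artifacts`, `load_artifacts`) and the `artifacts` directory are made up for the example. It also assumes that the `.dpkl` files are dill pickles and that `dill` and `keras` are installed, none of which is confirmed in this thread.

```python
import os
import urllib.request

BASE_URL = "https://storage.googleapis.com/hamel_githubissues"
ARTIFACTS = {
    "body_transformer": "body_pp.dpkl",
    "title_transformer": "title_pp.dpkl",
    "model": "seq2seq_model_tutorial.h5",
}


def artifact_url(name):
    """Return the download URL for one of the three artifacts."""
    return f"{BASE_URL}/{ARTIFACTS[name]}"


def download_artifacts(dest_dir="artifacts"):
    """Download any missing artifact into dest_dir; return name -> local path."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = {}
    for name, filename in ARTIFACTS.items():
        path = os.path.join(dest_dir, filename)
        if not os.path.exists(path):  # skip files fetched on a previous run
            urllib.request.urlretrieve(artifact_url(name), path)
        paths[name] = path
    return paths


def load_artifacts(paths):
    """Load both transformers and the model (assumes dill pickles and Keras)."""
    # Imported lazily so the download helpers work without these packages.
    import dill
    from keras.models import load_model

    with open(paths["body_transformer"], "rb") as f:
        body_pp = dill.load(f)
    with open(paths["title_transformer"], "rb") as f:
        title_pp = dill.load(f)
    return body_pp, title_pp, load_model(paths["model"])
```

Typical use would be `body_pp, title_pp, model = load_artifacts(download_artifacts())`. Unpickling runs arbitrary code and needs the pickled objects' class definitions to be importable, so this should only be done with files from a trusted source and with the same libraries that were used for training.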

ankushagarwal commented 6 years ago

I have polished the UI and added features to populate random issues automatically for testing. These issues are a random sampling of https://storage.googleapis.com/hamel_githubissues/github-issues.zip

@hamelsmu: Thanks for training again. I will update the deployed model with these.

I'll also try to create a public URL for this by EOD.

ankushagarwal commented 6 years ago

Created a public URL: http://35.190.4.92/

jlewi commented 6 years ago

I created the DNS record http://gh-demo.kubeflow.org/

hamelsmu commented 6 years ago

Is it ok for me to tweet about this?

jlewi commented 6 years ago

We would love that!

@aronchick can retweet it from the Kubeflow account.

jlewi commented 6 years ago

Closing this issue because we've now deployed it.