kubeflow / code-intelligence

ML-Powered Developer Tools, using Kubeflow
https://medium.com/kubeflow/reducing-maintainer-toil-on-kubeflow-with-github-actions-and-machine-learning-f8568374daa1?source=friends_link&sk=ac77444f00c230e7d787edbfb0081918
MIT License

[label bot] Embeddings Service should use GraphQL API to fetch issue data #126

Open jlewi opened 4 years ago

jlewi commented 4 years ago

Right now the embedding code is using BeautifulSoup to fetch and extract title and body from a GitHub issue. https://github.com/kubeflow/code-intelligence/blob/9bbdce34fc0d81bfb9a63493941763771d2a0746/py/code_intelligence/embeddings.py#L36

I'm noticing that this leads to slight discrepancies in how whitespace is encoded in the resulting body compared to the data we get via the GraphQL API and/or BigQuery.

As an example, consider the issue: https://github.com/kubeflow/katib/issues/1062

Here's the body returned using GraphQL

kind feature\r\n\r\nKatib should have functionality to save Suggestion state somewhere besides Suggestion pod. \r\nSome users would like to resume Experiments, but they don't want to have always running Suggestion deployment. For example we can use PV.\r\n\r\nWe can use `ResumeExperiment` flag from here: https://github.com/kubeflow/katib/issues/1061 to specify resuming experiment mechanism.\r\n\r\n/cc @johnugeorge @gaocegege @hougangliu @richardsliu \r\n

Here's the value returned by get_issue_text

"/kind feature\nKatib should have functionality to save Suggestion state somewhere besides Suggestion pod.\nSome users would like to resume Experiments, but they don't want to have always running Suggestion deployment. For example we can use PV.\nWe can use ResumeExperiment flag from here: #1061 to specify resuming experiment mechanism.\n/cc @johnugeorge @gaocegege @hougangliu @richardsliu

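So the whitespace is encoded slightly differently: the GraphQL body keeps \r\n line endings, blank lines between paragraphs, and trailing spaces, while get_issue_text returns single \n separators.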
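To make the difference concrete, here is a minimal sketch of a normalizer that would make the two bodies comparable on whitespace alone. The function name normalize_whitespace is hypothetical and is not part of the code-intelligence repo. It only handles line endings, trailing spaces, and blank lines; it does not touch the other differences visible above, such as the backticks around ResumeExperiment or the issue URL being rendered as #1061.

```python
def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: reduce a body to non-empty, right-stripped lines joined by "\n"."""
    # Convert "\r\n" (and any stray "\r") line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove trailing spaces and tabs from every line.
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop empty lines so that paragraph breaks and single newlines compare equal.
    return "\n".join(line for line in lines if line)


# Excerpts taken from the two bodies quoted above.
graphql_excerpt = "besides Suggestion pod. \r\nSome users would like to resume Experiments"
scraped_excerpt = "besides Suggestion pod.\nSome users would like to resume Experiments"

same = normalize_whitespace(graphql_excerpt) == normalize_whitespace(scraped_excerpt)
```

Applying something like this to both sources before embedding would be one way to remove the discrepancy, at the cost of discarding paragraph structure.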

Ideally this shouldn't matter: even if the whitespace differences change the embeddings, the network should arguably still learn to be invariant to these types of perturbations.
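For reference, a rough sketch of what fetching the title and body through the GitHub GraphQL API could look like. The names build_issue_payload and fetch_issue_text are hypothetical and this is not the existing get_issue_text implementation. It assumes a GitHub token with read access is available and uses only the standard library.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Query for the raw (markdown) title and body of a single issue.
ISSUE_QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    issue(number: $number) {
      title
      body
    }
  }
}
"""


def build_issue_payload(owner: str, repo: str, number: int) -> dict:
    """Hypothetical helper: build the JSON payload for the GraphQL request."""
    return {
        "query": ISSUE_QUERY,
        "variables": {"owner": owner, "name": repo, "number": number},
    }


def fetch_issue_text(owner: str, repo: str, number: int, token: str) -> dict:
    """Hypothetical helper: return {"title": ..., "body": ...} for one issue."""
    data = json.dumps(build_issue_payload(owner, repo, number)).encode("utf-8")
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=data,  # supplying data makes this a POST request
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # GraphQL reports failures in an "errors" field of the response body.
    if "errors" in result:
        raise RuntimeError(f"GraphQL query failed: {result['errors']}")
    issue = result["data"]["repository"]["issue"]
    return {"title": issue["title"], "body": issue["body"]}
```

A call such as fetch_issue_text("kubeflow", "katib", 1062, token) would then return the body in the same encoding as the GraphQL example above, so the embedding service and the training data would see identical text.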

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.69

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.