sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com

(Backlog) Intent analysis #59272

Open cbart opened 9 months ago

cbart commented 9 months ago

An admin wants to understand what their developers are using Cody Chat for. Examples include explaining code, translating code, searching, and non-code questions.

Recommended priority is P2, as this request was dropped by PANW in favor of all their other requests and we don't have a clear idea of how it would be used.

aramaraju commented 8 months ago

@arafatkatze Clarified this request for you :)

arafatkatze commented 7 months ago

NOTE: @chwarwick and I had a good conversation about this issue, and I see a few different ways to approach it:

  1. Add a small model to the extension itself so that intent classification runs in-house, inside the extension. The downside is that this has to work for every extension, such as VS Code and JetBrains. There are some TF embedding models that could help here by comparing messages with cosine similarity.
  2. Add a model on the SG instance that runs in-house and doesn't send anything out of bounds. This would be pretty helpful and would work for the different extensions too.

Right now we have many other priorities that are more important than this issue, and it would also need a green light from the AI team, so I am putting this on hold.

NOTE:

  1. I also tried fuzzy searching, but it's pretty bad and is ultimately just a glorified keyword matcher.
  2. Sending the whole thing to an LLM is pretty expensive and not useful.
  3. Running a small embedding model, such as Universal Sentence Encoder, in the extension itself is pretty helpful.
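
To make the embedding idea concrete, here is a minimal sketch of intent classification by cosine similarity. Everything in it is an assumption for illustration: the intent labels, the example prompts, the `classify_intent` helper, and the threshold are hypothetical, and the `embed` function is a toy bag-of-words stand-in for a real sentence encoder such as Universal Sentence Encoder, not the model itself.

```python
import math
import re
from collections import Counter

# Hypothetical intent labels, each with a few example prompts (illustrative only).
INTENT_EXAMPLES = {
    "explain_code": ["explain what this function does", "what does this code do"],
    "translate_code": ["convert this python code to go", "translate this function to typescript"],
    "search": ["find where the auth handler is defined", "search for usages of this function"],
    "non_code": ["write an email to my manager", "what is the capital of france"],
}


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_intent(message, threshold=0.2):
    """Return (label, score) for the most similar example, or "unknown" below the threshold."""
    query = embed(message)
    best_label, best_score = "unknown", 0.0
    for label, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score


if __name__ == "__main__":
    messages = [
        "can you explain what this code does",
        "translate this function to typescript",
        "zzz qqq",
    ]
    # Only the aggregated labels are kept, never the message text.
    print(Counter(classify_intent(m)[0] for m in messages))
```

With a real encoder, only `embed` would change; the nearest-example lookup stays the same. Since just the label needs to be recorded for analytics, the chat text itself would not have to leave the extension or the instance.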

aramaraju commented 7 months ago

My suggestion is to let the AI team drive this effort, as they are looking at building something in this realm for intent detection (cc: @chillatom), and we can use that to handle the analytics side of the work. This is a backlog item and not for May, given this dependency and the lack of urgency.