microsoft / vscode-copilot-release

Feedback on GitHub Copilot Chat UX in Visual Studio Code.
https://marketplace.visualstudio.com/items?itemName=GitHub.copilot-chat
Creative Commons Attribution 4.0 International
311 stars, 28 forks

GPT-4 in Inline Chat #664

Open BeamoINT opened 9 months ago

BeamoINT commented 9 months ago

GitHub Copilot inline chat currently uses GPT-3.5, while the sidebar chat uses GPT-4. I really like the inline chat feature since it is convenient and easy to work with, but because it uses GPT-3.5, which is not good enough for my needs, I end up using the sidebar chat instead. It would be great if inline chat used GPT-4 so that I could have the same level of power with greater convenience in my workflow.

(GPT-4 Turbo is coming out of preview soon, and I filed another issue asking for it to be added, so by "change inline chat to GPT-4" I really mean GPT-4 Turbo. I also understand that you have a multi-model approach, so what I mean is that GPT-4 Turbo should be one of the models to switch between, with it as the primary model.)

digitarald commented 9 months ago

So far GPT-3.5 has been picked based on our benchmarks, where 3.5 is on par with GPT-4 for reliability in most inline chat and slash command test scenarios. We'll keep evaluating, though, as we get access to new models.

In the meantime, it might be worth adding a switch.

Do you have an example where GPT-4 works more reliably than 3.5?

aiday-mar commented 9 months ago

Hi @BeamoINT, as @digitarald mentioned, we are currently exploring the use of a higher version of GPT for slash commands that require more complex work. For example, we are investigating GPT-4 for the /fix command. If this rolls out, we could consider adding a setting to switch between GPT-3.5 and GPT-4 for /fix.

I am not sure whether we will be using GPT-4 Turbo, however.

Anselmoo commented 8 months ago

> GitHub copilot inline chat currently uses GPT-3.5 while the sidebar chat uses GPT-4. I really like the inline chat feature since it is convenient and easy to work with, but since it uses GPT-3.5, I choose to use the sidebar chat since 3.5 is not good enough for my needs.

In addition to this, the results for inline and sidebar chat differ strongly in accuracy and complexity. The sad part is that you may want to fix or improve code line by line, but you end up selecting or copying the complete text/code into the sidebar chat and copying the result back.

kurutah commented 6 months ago

> So far GPT-3.5 is picked based our benchmarks, where 3.5 is on par for reliability with GPT-4 for most inline chat and slash commands test scenarios. We'll keep evaluating though as we get access to new models.
>
> Meanwhile it might be worth adding a switch.
>
> Do you have an example for which GPT-4 works more reliable than 3.5?

Inline chat is almost useless for now. I don't know what benchmarks it is based on; I literally regret it every time I try to use it.

Eianex commented 5 months ago

I think it is time to switch to GPT-4 for inline code predictions (the non-Turbo version); the inline chat is just annoying and will become obsolete the moment inline code predictions start using GPT-4's power. Also, give GitHub Copilot the ability to directly edit files by itself when prompted by the user. It does not make any sense that it can read your entire workspace but cannot edit it. It is nerve-racking. Thanks for the current work, and I hope the new Blackwell hardware from Nvidia helps you guys realize GPT-4's full potential. Cheers.

nonoash commented 4 months ago

I was googling why my inline chat (not completions) was giving far worse results and found some threads. It's a nice feature for saving time versus the chat agent, but to be honest it doesn't even feel like a 3.5-vs-4 problem: the inline chat seems to be missing code context and replies to your prompt from scratch (at least that's my impression).

As an example, in a Jupyter cell I asked how to add x and y labels to a seaborn heatmap. It started importing matplotlib and so on, whereas the chat agent understood the context better and simply said to add `.set(xlabel="Predicted labels", ylabel="True labels")` to `sns.heatmap(...)`.
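For reference, the chat agent's suggestion works because `sns.heatmap` returns the matplotlib `Axes` it drew on, so labels can be set directly on it. A minimal sketch (the data and label strings here are illustrative, not from my actual notebook):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# sns.heatmap draws onto (and returns) a matplotlib Axes
data = np.random.rand(3, 3)
ax = sns.heatmap(data)

# set both axis labels in one call via Axes.set(...)
ax.set(xlabel="Predicted labels", ylabel="True labels")
```

No extra matplotlib boilerplate is needed: one call on the returned `Axes` is enough, which is exactly what the sidebar chat suggested and the inline chat missed.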

Similar: #66365

digitarald commented 4 months ago

@nonoash please file a new issue with a test case, and we can make sure to cover it in our automated benchmarks.

kurutah commented 1 month ago

Any news?

tats-u commented 1 week ago

Why doesn't Inline Chat use even GPT-4o mini today? We don't need stale GPT-3.x-based models that don't know the current Prime Minister of Japan. The training data of all GPT-4-based models, including the one behind the current Bing Chat, is much newer than that of GPT-3.x. OpenAI says GPT-3.5 Turbo is deprecated and that we should use GPT-4o mini instead.