opensouls / terminal-copilot

A smart terminal assistant that helps you find the right command.
Apache License 2.0
573 stars 43 forks

Move to gpt-3.5-turbo/gpt-4 #43

Closed 2mawi2 closed 1 year ago

2mawi2 commented 1 year ago

This experimental PR integrates GPT-3.5 Turbo to explore what is possible. I don't think we can merge it as-is: a couple of things still need to change, in particular the input messages need to be adapted to each platform on which the tool runs.

I had to make significant changes to the prompt and the way we interact with the model. I structured the conversation similarly to the leaked Bing prompts, since this is a message-based model rather than a prompt-based one. That was necessary to guarantee a stable output format from the rather talkative model.

My initial impressions of the model quality are great. The quality of predicted commands is much better than any other tool I have tested. The user can interact with the model over multiple turns to refine the command. Furthermore, the model is capable of fixing its own mistakes from the user's stderr (and yes, this currently sends sensitive data to OpenAI). The cherry on top is that it is about 10 times cheaper than the davinci model.
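To illustrate the stderr self-correction idea, here is a minimal sketch of how a failed command and its stderr could be fed back to the chat model as extra turns. This is not the PR's actual code; the function name and message wording are assumptions, and the actual API call is omitted.

```python
# Sketch: append a failed command and its stderr to the chat history so the
# model can propose a corrected command on the next completion request.
# NOTE: stderr may contain sensitive data -- this is what gets sent to OpenAI.

def build_correction_messages(history, failed_cmd, stderr_text):
    """Return a new message list with the failure appended as chat turns."""
    return history + [
        {"role": "assistant", "content": failed_cmd},
        {"role": "user",
         "content": f"That command failed with this stderr:\n{stderr_text}\n"
                    "Please suggest a corrected command."},
    ]

history = [
    {"role": "system", "content": "You translate requests into shell commands."},
    {"role": "user", "content": "list all files including hidden ones"},
]
msgs = build_correction_messages(history, "ls -a /tmp",
                                 "ls: /tmp: Permission denied")
# msgs would then be passed to the chat completions endpoint for another turn.
```

The multi-turn refinement described above works the same way: each user follow-up is simply appended to the running message list.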

I'm open to discussions if something like this would make sense or not.

JoelKronander commented 1 year ago

Very cool! I really like the ability for it to fix errors from stderr!

I am not following why we should not merge this? Seems pretty great to me?

"A couple of things might have to be changed, specifically the input messages adapted to each platform we are using the tool." --> could you elaborate a bit more?

2mawi2 commented 1 year ago

There are two problems with the current changes. First, the few-shot examples are only provided for macOS; I am unsure about the quality of the model's output on e.g. Windows. In general, I think we can improve the examples along the lines proposed in the OpenAI cookbook:

```python
messages = [
    {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
    {"role": "system", "name": "example_user", "content": "New synergies will help drive top-line growth."},
    {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
    {"role": "system", "name": "example_user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
    {"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less busy about how to do better."},
    {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
]
```

The second problem: I noticed I broke the user's chat history. I haven't checked whether it's because I am using `result = subprocess.run(cmd, ...)` instead, or because I simply forgot to add `history.save(cmd)`.
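One possible shape of the fix is sketched below. The names (`run_and_record`, the history-file path) are illustrative assumptions, not the repo's actual API; the point is only that `subprocess.run` bypasses the interactive shell, so the executed command never reaches the shell history unless it is recorded explicitly.

```python
import os
import subprocess
import tempfile

def run_and_record(cmd, history_file):
    """Run a suggested command, then append it to a history file.

    subprocess.run executes outside the user's interactive shell, so the
    command would otherwise never show up in their history.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    with open(history_file, "a") as fh:
        fh.write(cmd + "\n")
    return result

# Demo against a throwaway history file instead of the real shell history.
hist = os.path.join(tempfile.gettempdir(), "copilot_demo_history.txt")
res = run_and_record("echo hello", hist)
```

A real fix would write to the user's actual shell history (or call the repo's existing `history.save`), which differs per shell.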

2mawi2 commented 1 year ago

@JoelKronander I believe this is ready to merge now. I've addressed the previously mentioned issues concerning the differences between Bourne, Fish, and CMD shells. Additionally, I've enabled users on the waitlist to utilize the new GPT-4 model. Initial tests have been impressive. GPT-4 demonstrates improved responsiveness to system prompts (e.g., refinement of commands) and significantly enhanced prompt accuracy.
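The per-shell handling mentioned above could look something like the following sketch. This is not the merged code; the detection heuristic and example table are assumptions, shown only to illustrate choosing shell-specific few-shot examples for Bourne-family shells, Fish, and CMD.

```python
import os
import platform

def detect_shell():
    """Best-effort guess of the user's shell family (illustrative heuristic)."""
    if platform.system() == "Windows":
        return "cmd"
    shell = os.environ.get("SHELL", "")
    if shell.endswith("fish"):
        return "fish"
    return "bourne"  # sh/bash/zsh all accept Bourne-style syntax

# Hypothetical shell-specific few-shot examples fed to the model.
EXAMPLES = {
    "bourne": {"ask": "show hidden files", "cmd": "ls -a"},
    "fish":   {"ask": "set an env var",    "cmd": "set -x FOO bar"},
    "cmd":    {"ask": "show hidden files", "cmd": "dir /a:h"},
}

shell = detect_shell()
example = EXAMPLES[shell]
```

Keying the examples on shell family matters because a command that is valid in bash (e.g. `export FOO=bar`) is a syntax error in Fish.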

2mawi2 commented 1 year ago

@JoelKronander @kafischer, just a side note here. I strongly recommend that we focus on enhancing the project's modularity at this stage. We should adopt a package-by-feature approach; otherwise, maintaining the project will become increasingly difficult over time.
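For concreteness, a package-by-feature layout groups code by what it does rather than by layer. The module names below are hypothetical, not a proposal of actual file names:

```
terminal_copilot/
    command_generation/   # prompt building + chat model calls
    shell_integration/    # shell detection, history, command execution
    refinement/           # multi-turn fixes driven by stderr
```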

2mawi2 commented 1 year ago

@kafischer @JoelKronander friendly reminder for the review

JoelKronander commented 1 year ago

@2mawi2 Sorry for the late reply, LGTM!