ricklamers / gpt-code-ui

An open source implementation of OpenAI's ChatGPT Code interpreter
MIT License

Improving resilience to inaccurate code generation #42

Open alextrzyna opened 1 year ago

alextrzyna commented 1 year ago

โš ๏ธ Please check that this feature request hasn't been suggested before.

🔖 Feature description

First of all, really cool project! I found gpt-code-ui when looking for an alternative to Code Interpreter/Noteable that I could run locally.

I have noticed that gpt-code-ui is not quite as resilient to the mistakes it makes when generating code, compared to something like ChatGPT + the Noteable plugin. For example, if gpt-code-ui makes a mistaken assumption about the name of a dataframe row in the code it generates, execution fails and it gives up, whereas in the Noteable scenario ChatGPT is more likely to proactively inspect the results and attempt to fix the problem.

โœ”๏ธ Solution

Instead of just outputting the errors associated with a failed execution, proactively inspect the error and attempt a fix/re-run.
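
A minimal sketch of what such a retry loop could look like, assuming a hypothetical `ask_model(prompt) -> str` helper that wraps whatever completion endpoint the app is using:

```python
import traceback

MAX_ATTEMPTS = 3

def run_with_retries(code: str, ask_model) -> None:
    """Execute generated code; on failure, feed the traceback back to
    the model and re-run the corrected code."""
    for _ in range(MAX_ATTEMPTS):
        try:
            exec(code, {})
            return
        except Exception:
            error = traceback.format_exc()
            # ask_model(prompt) -> str is a hypothetical wrapper around
            # whatever chat/completion endpoint the app uses.
            code = ask_model(
                "The following code failed:\n"
                f"{code}\n"
                "with this error:\n"
                f"{error}\n"
                "Return only a corrected version of the code."
            )
    raise RuntimeError(f"Execution still failing after {MAX_ATTEMPTS} attempts")
```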

โ“ Alternatives

No response

๐Ÿ“ Additional Context

No response


ricklamers commented 1 year ago

Agree there is room for improvement in retrying/feeding error messages back into the model. Inviting the community to contribute PRs – it's out of scope for what I wanted to build personally.

Maybe GPT-5 is good enough not to hallucinate variable names? 😄

CiaranYoung commented 1 year ago

Although OpenAI has now made Code Interpreter available to all Plus users, the project is still very cool. One question: is it as powerful as the official plugin?

darkacorn commented 1 year ago

What kind of work would be needed to run this on, say, a local LLM with ooba (which has an OpenAI-compatible API)?
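
Possibly just pointing the client at the local endpoint would be enough. A rough sketch using the pre-1.0 `openai` Python client; the host, port, and model name are assumptions to adjust for your setup:

```python
import openai

# Assumption: text-generation-webui's OpenAI-compatible extension is
# serving locally; change host/port to match your configuration.
openai.api_base = "http://localhost:5001/v1"
openai.api_key = "sk-dummy"  # local servers generally ignore the key

response = openai.ChatCompletion.create(
    model="local-model",  # usually ignored by the local backend
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response["choices"][0]["message"]["content"])
```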

dasmy commented 1 year ago

Working on this one. @ricklamers: I am preparing a pull request with a rough idea in https://github.com/dasmy/gpt-code-ui/tree/dev/conversation_history. Then we can discuss if and how my approach fits into the overall picture.

bitsnaps commented 1 year ago

@dasmy I have two ideas in mind:

  1. Detect whether the code failed to execute (through the OS exit code, thrown exceptions, etc.).
  2. Auto-detect and fix the issue, similar to the official Code Interpreter implementation. The concept involves piping the output back to ChatGPT with a simple prompt to identify the problem and attempt to resolve it. This approach may yield better results but could require more resources. Ideally, we should let the user choose one of these options. (A rough sketch of both ideas follows below.)
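
A rough sketch of both ideas, assuming a subprocess-based runner; the helper names and the repair prompt are illustrative, not the project's actual API:

```python
import subprocess
import sys

def run_generated_code(code: str) -> tuple[bool, str]:
    """Idea 1: run the code in a subprocess and treat a nonzero OS exit
    code (e.g. from an uncaught exception) as a failed execution."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.returncode == 0, result.stdout + result.stderr

def build_repair_prompt(code: str, output: str) -> str:
    """Idea 2: pipe the failing code and its output back to the model
    with a simple prompt asking it to identify and fix the problem."""
    return (
        "The following Python code failed to execute:\n"
        f"{code}\n"
        "It produced this output:\n"
        f"{output}\n"
        "Identify the problem and return only a corrected version of the code."
    )
```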