smol-ai / developer

the first library to let you embed a developer agent in your own app!
https://twitter.com/SmolModels
MIT License
11.83k stars 1.03k forks

Feature: `-- plan` flag #12

Open swyxio opened 1 year ago

swyxio commented 1 year ago

this is an easy one

swyxio commented 1 year ago

[two attached images]

swyxio commented 1 year ago

The discovery that only GPT-4 can self-improve, while weaker models cannot, is very intriguing. It indicates that a new type of emergent ability (i.e., improving upon natural language feedback) may exist only when the model is "mature" (large and well-aligned) enough. https://twitter.com/Francis_YAO_/status/1670618013089820674

Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair -- in which the model debugs and fixes mistakes in its own code -- has recently become a popular way to boost performance in these settings. However, only very limited studies on how and when self-repair works effectively exist in the literature, and one might wonder to what extent a model is really capable of providing accurate feedback on why the code is wrong when that code was generated by the same model. In this paper, we analyze GPT-3.5 and GPT-4's ability to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges.
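For anyone picking this up, here is a rough sketch of the self-repair loop the abstract describes: generate code, run it against tests, ask the model why it failed, then ask for a fix. This is not the paper's setup and not code from this repo. `Model`, `run_tests`, `self_repair`, the `solve(x)` convention, and the prompt wording are all made up for illustration, and the "model" in the demo is a scripted stub rather than a real LLM call.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface: takes a prompt string, returns text.
Model = Callable[[str], str]


def run_tests(code: str, tests: List[Tuple[int, int]]) -> Optional[str]:
    """Run `code` (assumed to define solve(x)) and return an error message, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # only safe for trusted code; a real tool would sandbox this
        for arg, expected in tests:
            got = namespace["solve"](arg)
            if got != expected:
                return f"solve({arg!r}) returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_repair(model: Model, task: str, tests: List[Tuple[int, int]],
                max_rounds: int = 3) -> Optional[str]:
    """Generate code, then alternate feedback and repair until the tests pass or rounds run out."""
    code = model(task)
    for _ in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code
        # Step 1: the same model explains the failure in natural language.
        feedback = model(f"Explain why this code fails.\nTask: {task}\nCode:\n{code}\nError: {error}")
        # Step 2: the model rewrites the code using its own feedback.
        code = model(f"Fix the code.\nTask: {task}\nCode:\n{code}\nFeedback: {feedback}")
    return code if run_tests(code, tests) is None else None


def scripted_model(responses: List[str]) -> Model:
    """Stand-in for an LLM: replays canned responses in order, ignoring the prompt."""
    replies = iter(responses)
    return lambda prompt: next(replies)


demo_model = scripted_model([
    "def solve(x):\n    return x + x",             # first attempt: doubles instead of squaring
    "The code doubles x instead of squaring it.",  # feedback
    "def solve(x):\n    return x * x",             # repaired attempt
])
fixed = self_repair(demo_model, "Write solve(x) returning x squared.", [(2, 4), (3, 9)])
print(fixed)
```

The feedback step is kept separate from the repair step on purpose, since the abstract's open question is whether a model can accurately explain why its own code is wrong.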