irgolic / AutoPR

Run AI-powered workflows over your codebase
https://discord.gg/ykk7Znt3K6
MIT License

GPT-3.5-turbo compatibility #65

Closed: irgolic closed this issue 1 year ago

irgolic commented 1 year ago

I'm confident AutoPR can be optimized to work well with gpt-3.5-turbo.

Background

As it stands, the rail-v1 pull request agent works fine with gpt-3.5-turbo.

However, the codegen agent does not. rail-v1 codegen may work, but it suffers from the same problems it has with gpt-4: it produces unreliable diffs with ambiguous errors.

My hope is to use auto-v1 codegen, which creates unambiguous changes.

Action selection

How auto-v1 codegen works

Here is an example of a successful action selection call:

Prompt

````
You are about to make a decision on what to do next.

This is the issue that was opened:
```#20 Create a dice rolling bot

irgolic: In the `dice_rolling_bot` folder, develop a Discord bot that allows users to roll virtual dice by typing a command (e.g., "!roll 1d6"). The bot should support various dice types (e.g., d4, d6, d8, d10, d12, d20) and multiple dice rolls at once. Include a README.md file with information on the bot's features, setup instructions, and usage guidelines.

```

This is the pull request we're creating:
```Title: Feature/dice-rolling-bot: Add Dice Rolling Bot Functionality

This PR resolves issue #20, adding a dice rolling bot to the `dice_rolling_bot` folder. This Discord bot allows users to roll virtual dice by typing a command, such as "!roll 1d6". The bot supports various dice types (d4, d6, d8, d10, d12, d20) and multiple dice rolls at once. A README.md file has been included with information on the bot's features, setup instructions, and usage guidelines.

Commits:
1. Create dice_rolling_bot.py
2. Add README.md for dice rolling bot

Please review the changes and let me know if any modifications are required.

1. Commit: Create dice_rolling_bot.py
Files: dice_rolling_bot/dice_rolling_bot.py:L1-L23
Changes:
- Created a new file `dice_rolling_bot.py`
- Imported required libraries (random and discord)
- Implemented the main functionality for the bot to roll dice based on user input
2. Commit: Add README.md for dice rolling bot
Files: dice_rolling_bot/README.md:L1-L32
Changes:
- Created a new file `README.md`
- Added setup instructions for the bot
- Provided usage guidelines and examples for the dice rolling bot
```

This is the commit we're writing:
```Create dice_rolling_bot.py

- Created a new file `dice_rolling_bot.py`
- Imported required libraries (random and discord)
- Implemented the main functionality for the bot to roll dice based on user input```

This is the codebase subset we decided to look at: ``````

These are the actions we've taken so far: ``````

In our pursuit of implementing this commit, please make a decision on what to do next.
If a new file should be created, respond with `new_file` and the file path and description.
If one of the files in the codebase subset should be edited, respond with `edit_file` and the file path, description, and line numbers of the hunk to edit. When picking the line numbers, keep the hunk as short as possible while fulfilling the description. If you don't know which line numbers to pick, leave them blank.
If you're done, respond with `finished` and the commit message describing the past actions.

Given below is XML that describes the information to extract from this document and the tags to extract it into.

```xml

```

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `` => `{'foo': 'example one'}`
- `` => `{"bar": ['STRING ONE', 'STRING TWO', etc.]}`
- `` => `{'baz': {'foo': 'Some String', 'index': 1}}`
````
Raw Response { "action": "new_file", "new_file": { "filepath": "dice_rolling_bot/dice_rolling_bot.py", "description": "Created a new file `dice_rolling_bot.py` with required libraries (random and discord) and implemented the main functionality for the bot to roll dice based on user input" } }
Parsed Response { "action": "new_file", "new_file": { "filepath": "dice_rolling_bot/dice_rolling_bot.py", "description": "Created a new file `dice_rolling_bot.py` with required libraries (random and discord) and implemented the main functionality for the bot to roll dice based on user input" }, "edit_file": null, "commit_message": null }

What's preventing gpt-3.5-turbo from working properly?

In the action selection stage, gpt-3.5-turbo often returns the action selection JSON in the wrong format. Sometimes it wraps the JSON in a triple-backtick block (now handled by https://github.com/ShreyaR/guardrails/pull/118). More commonly, it flattens the nested dict, returning something like this:

Raw Response { "action": "new_file", "filepath": "dice_rolling_bot/dice_rolling_bot.py", "description": "Created a new file `dice_rolling_bot.py` with required libraries (random and discord) and implemented the main functionality for the bot to roll dice based on user input" }

In principle, this error is unambiguous and can be corrected for. See some discussion on this topic in https://github.com/ShreyaR/guardrails/pull/123. Ideally the fix would be submitted to guardrails, but if that turns out to be too complicated, it's fine for AutoPR to work around it with a custom guardrails validator or by parsing the raw string as JSON on failure.
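A minimal sketch of the "parse the raw string as JSON on failure" fallback, in Python. It assumes the three action names and the nested shape shown in the Parsed Response above (arguments nested under a key named after the action, `commit_message` at the top level). `normalize_action` is a hypothetical helper, not part of AutoPR or guardrails.

```python
import json
import re

# Action names taken from the auto-v1 action selection prompt.
ACTIONS = {"new_file", "edit_file", "finished"}

# Matches a reply wrapped in a triple-backtick block, optionally tagged "json".
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL | re.IGNORECASE)


def normalize_action(raw: str) -> dict:
    """Parse an action selection reply, repairing two common gpt-3.5-turbo slips:
    a triple-backtick wrapper, and a flattened dict such as
    {"action": "new_file", "filepath": ...} instead of
    {"action": "new_file", "new_file": {"filepath": ...}}.
    """
    match = _FENCE_RE.match(raw)
    if match:
        raw = match.group(1)
    data = json.loads(raw)

    action = data.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    # Re-nest flattened arguments under the key named after the action.
    if action in ("new_file", "edit_file") and not isinstance(data.get(action), dict):
        args = {k: v for k, v in data.items() if k != "action"}
        data = {"action": action, action: args}

    # Fill absent fields with None, mirroring the Parsed Response shape.
    for key in ("new_file", "edit_file", "commit_message"):
        data.setdefault(key, None)
    return data
```

Applied to the flattened Raw Response above, this yields the same structure as the successful Parsed Response.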

Following a conversation in a Discord thread, a better solution might be to abstract out an "action selection engine" with different implementations for different models:

  • JSON with guardrails (as auto-v1 does now)
  • `!` Discord command syntax (e.g., `!newfile <filename> <description>`)
  • Pythonic syntax (e.g., `def new_file(filename: str, description: str)`)
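A rough sketch of what such an engine interface might look like, in Python. The class names, the `!newfile` argument order, and the choice to normalize every engine's output to the `filepath`/`description` keys used by the JSON action are assumptions for illustration. Using `shlex` lets quoted descriptions contain spaces, and `ast.literal_eval` keeps the pythonic variant from executing model output.

```python
import ast
import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str  # e.g. "new_file", "edit_file", "finished"
    args: dict = field(default_factory=dict)


class ActionSelectionEngine(ABC):
    """Hypothetical interface: each model gets the output syntax it handles best."""

    @abstractmethod
    def parse(self, response: str) -> Action:
        ...


class CommandEngine(ActionSelectionEngine):
    """Parses Discord-style commands, e.g. !newfile path/to/file.py "description"."""

    # Hypothetical mapping from command word to (action name, positional parameters).
    COMMANDS = {
        "newfile": ("new_file", ["filepath", "description"]),
        "finished": ("finished", ["commit_message"]),
    }

    def parse(self, response: str) -> Action:
        tokens = shlex.split(response.strip())
        if not tokens or not tokens[0].startswith("!"):
            raise ValueError("expected a !command")
        spec = self.COMMANDS.get(tokens[0][1:])
        if spec is None:
            raise ValueError(f"unknown command: {tokens[0]}")
        name, params = spec
        if len(tokens) - 1 != len(params):
            raise ValueError(f"{tokens[0]} expects {len(params)} arguments")
        return Action(name, dict(zip(params, tokens[1:])))


class PythonicEngine(ActionSelectionEngine):
    """Parses a single keyword-argument call, e.g. new_file(filepath="a.py", description="...")."""

    def parse(self, response: str) -> Action:
        call = ast.parse(response.strip(), mode="eval").body
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError("expected a single function call")
        if call.args:
            raise ValueError("expected keyword arguments only")
        # literal_eval accepts only literals, so nothing the model wrote is executed.
        args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        return Action(call.func.id, args)
```

Because both engines return the same `Action`, the rest of the codegen loop would not need to know which syntax the model was prompted with.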

File creation and editing

How auto-v1 codegen works

Here's an example of a successful new file generation:

Prompt

````
Human: Hey, we've got a new file to create.

This is the issue that was opened:
```
#20 Create a dice rolling bot

irgolic: In the `dice_rolling_bot` folder, develop a Discord bot that allows users to roll virtual dice by typing a command (e.g., "!roll 1d6"). The bot should support various dice types (e.g., d4, d6, d8, d10, d12, d20) and multiple dice rolls at once. Include a README.md file with information on the bot's features, setup instructions, and usage guidelines.
```

This is the pull request we're creating:
```
Title: Feature/dice-rolling-bot: Add Dice Rolling Bot Functionality

This PR resolves issue #20, adding a dice rolling bot to the `dice_rolling_bot` folder. This Discord bot allows users to roll virtual dice by typing a command, such as "!roll 1d6". The bot supports various dice types (d4, d6, d8, d10, d12, d20) and multiple dice rolls at once. A README.md file has been included with information on the bot's features, setup instructions, and usage guidelines.

Commits:
1. Create dice_rolling_bot.py
2. Add README.md for dice rolling bot

Please review the changes and let me know if any modifications are required.

1. Commit: Create dice_rolling_bot.py
Files: dice_rolling_bot/dice_rolling_bot.py:L1-L23
Changes:
- Created a new file `dice_rolling_bot.py`
- Imported required libraries (random and discord)
- Implemented the main functionality for the bot to roll dice based on user input
2. Commit: Add README.md for dice rolling bot
Files: dice_rolling_bot/README.md:L1-L32
Changes:
- Created a new file `README.md`
- Added setup instructions for the bot
- Provided usage guidelines and examples for the dice rolling bot
```

This is the commit we're writing:
```
Create dice_rolling_bot.py

- Created a new file `dice_rolling_bot.py`
- Imported required libraries (random and discord)
- Implemented the main functionality for the bot to roll dice based on user input
```

This is the codebase subset we decided to look at:
```
```

This is the plan for the file we're creating:
```
Created a new file `dice_rolling_bot.py` with required libraries (random and discord) and implemented the main functionality for the bot to roll dice based on user input
```

Please send me the contents of the file.

RESPONSE FORMAT INSTRUCTIONS
----------------------------

When responding to me, please use the following format. Make sure you return both the code enclosed in backticks and the JSON immediately after.

```
```
{
    "outcome": string  # A description of the outcome of the attempt to rewrite the file hunk according to the problem statement.
}
````
Raw Response

````
```python
import random
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.typing = False
intents.presences = False

bot = commands.Bot(command_prefix="!", intents=intents)

@bot.event
async def on_ready():
    print(f"{bot.user} has connected to Discord!")

@bot.command(name="roll", help="Rolls a die. Format: !roll d")
async def roll_dice(ctx, dice: str):
    num_of_dice, sides_of_die = map(int, dice.split("d"))
    rolls = [random.randint(1, sides_of_die) for _ in range(num_of_dice)]
    await ctx.send(', '.join(map(str, rolls)))

bot.run("your_bot_token_here")
```
{ "outcome": "Created dice_rolling_bot.py with the necessary imports and implemented the main functionality for the bot to roll dice based on user input." }
````
Result { "contents": "import random\nimport discord\nfrom discord.ext import commands\n\nintents = discord.Intents.default()\nintents.typing = False\nintents.presences = False\n\nbot = commands.Bot(command_prefix=\"!\", intents=intents)\n\n@bot.event\nasync def on_ready():\n    print(f\"{bot.user} has connected to Discord!\")\n\n@bot.command(name=\"roll\", help=\"Rolls a die. Format: !roll d\")\nasync def roll_dice(ctx, dice: str):\n    num_of_dice, sides_of_die = map(int, dice.split(\"d\"))\n    rolls = [random.randint(1, sides_of_die) for _ in range(num_of_dice)]\n    await ctx.send(', '.join(map(str, rolls)))\n\nbot.run(\"your_bot_token_here\")", "outcome": "Created dice_rolling_bot.py with the necessary imports and implemented the main functionality for the bot to roll dice based on user input." }

Notice that the LLM reflects on its change in the "outcome" JSON key. This is fed back into subsequent actions to inform what's been done so far.
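A hypothetical sketch of that feedback loop in Python: split a reply into its code and its `outcome`, then render the accumulated outcomes for the "These are the actions we've taken so far" section of the next prompt. The helper names and the numbered-list history format are assumptions; AutoPR's actual formatting may differ.

```python
from __future__ import annotations

import json
import re

# A fenced block (optional language tag) followed by a trailing JSON object.
_RESPONSE_RE = re.compile(r"```[\w-]*\n(.*?)```\s*(\{.*\})\s*$", re.DOTALL)


def split_response(raw: str) -> tuple[str, str]:
    """Return (contents, outcome) from a reply shaped like ```code``` {"outcome": ...}."""
    match = _RESPONSE_RE.search(raw)
    if match is None:
        raise ValueError("reply does not match the code + JSON format")
    contents, metadata = match.group(1), json.loads(match.group(2))
    return contents.rstrip("\n"), metadata["outcome"]


def format_history(outcomes: list[str]) -> str:
    """Render past outcomes as a numbered list for the next prompt (hypothetical format)."""
    return "\n".join(f"{i}. {outcome}" for i, outcome in enumerate(outcomes, start=1))
```

Each call's outcome would be appended to a running list, and `format_history` fills the actions-so-far section of the following action selection prompt.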

Here's an example of a successful file edit:

Prompt

````
Human: Hey, we've got a new code hunk to diff.

This is the issue that was opened:
```
#29 Create a movie recommendation app

irgolic: In the `movie_recommendation` folder, develop a web app that provides personalized movie recommendations based on user preferences, such as genre, actors, and directors. The app should display movie posters, summaries, and trailers. Include a README.md file with an explanation of the app's features and installation instructions.
```

This is the pull request we're creating:
```
Title: #42 Movie Recommendation Web App

Closes #29

This pull request implements a movie recommendation web app that provides personalized movie recommendations based on user preferences, such as genre, actors, and directors. The app displays movie posters, summaries, and trailers. A README.md file with the app's features and installation instructions is also included.

Please review the changes and let me know if any modifications are needed.

1. Commit: Added initial app files - movie_recommendation/app.py, movie_recommendation/templates/base.html
Files: movie_recommendation/app.py, movie_recommendation/templates/base.html
Changes:
* movie_recommendation/app.py - Add the initial flask app file and import required libraries.
* movie_recommendation/templates/base.html - Create the base html file for the web app.
2. Commit: Implemented movie recommendation logic - movie_recommendation/movie_recommender.py
Files: movie_recommendation/movie_recommender.py
Changes:
* movie_recommendation/movie_recommender.py - Add the logic for generating personalized movie recommendations based on user preferences.
3. Commit: Created movie recommendation form - movie_recommendation/templates/recommendation_form.html
Files: movie_recommendation/templates/recommendation_form.html
Changes:
* movie_recommendation/templates/recommendation_form.html - Add a form to collect user preferences for movie recommendations.
4. Commit: Display recommended movies - movie_recommendation/templates/recommended_movies.html
Files: movie_recommendation/templates/recommended_movies.html
Changes:
* movie_recommendation/templates/recommended_movies.html - Add a template to display the recommended movies with posters, summaries, and trailers.
5. Commit: Add routes for recommendation form and results - movie_recommendation/app.py
Files: movie_recommendation/app.py
Changes:
* movie_recommendation/app.py - Add routes for the recommendation form and displaying the results.
6. Commit: Include README.md with app features and installation instructions - README.md
Files: README.md
Changes:
* README.md - Add a detailed explanation of the app's features and installation instructions.
7. Commit: Refactor movie recommendation logic - movie_recommendation/movie_recommender.py
Files: movie_recommendation/movie_recommender.py
Changes:
* movie_recommendation/movie_recommender.py - Refactor the movie recommendation logic to improve performance and readability.
```

This is the commit we're writing:
```
Add routes for recommendation form and results - movie_recommendation/app.py

* movie_recommendation/app.py - Add routes for the recommendation form and displaying the results.
```

This is the codebase subset we decided to look at:
```
>>> File: movie_recommendation/app.py

1 | from flask import Flask, render_template, request, redirect, url_for
2 | import movie_recommendation.movie_recommender as mr
3 |
4 | app = Flask(__name__)
5 |
6 | @app.route('/')
7 | def home():
8 |     return render_template('base.html')
9 |
10 | if __name__ == "__main__":
11 |     app.run(debug=True)
```

This is the hunk we're rewriting:
```
3 |
4 | app = Flask(__name__)
5 |
6 * @app.route('/')
7 * def home():
8 *     return render_template('base.html')
9 |
10 | if __name__ == "__main__":
11 |     app.run(debug=True)
```

This is the plan for how we want to rewrite the hunk:
```
Add routes for the recommendation form and displaying the results.
```

Please rewrite the hunk to match the plan, but do not include any lines prefixed with | in the result.

RULES:
- ONLY rewrite the lines prefixed with *,
- submit only the lines without the * prefix,
- do not preserve the relative leading indentation of the lines (start the hunk's indentation at 0).

RESPONSE FORMAT INSTRUCTIONS
----------------------------

When responding to me, please use the following format. Make sure you return both the code enclosed in backticks and the JSON immediately after.

```
```
{
    "outcome": string  # A description of the outcome of the attempt to rewrite the file hunk according to the problem statement.
}
````
Raw Response

````
```
@app.route('/')
def home():
    return render_template('recommendation_form.html')

@app.route('/recommend', methods=['POST'])
def recommend():
    user_preferences = request.form
    recommended_movies = mr.get_recommendations(user_preferences)
    return render_template('recommended_movies.html', movies=recommended_movies)
```
{ "outcome": "Hunk rewritten with the addition of routes for the recommendation form and displaying the results." }
````
Result { "contents": "@app.route('/')\ndef home():\n    return render_template('recommendation_form.html')\n\n@app.route('/recommend', methods=['POST'])\ndef recommend():\n    user_preferences = request.form\n    recommended_movies = mr.get_recommendations(user_preferences)\n    return render_template('recommended_movies.html', movies=recommended_movies)", "outcome": "Hunk rewritten with the addition of routes for the recommendation form and displaying the results." }

Notice how the lines of the hunk to be rewritten are prefixed with `*`, surrounded by a few lines of context prefixed with `|`.
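A sketch of how such a hunk view could be rendered from a file's lines, in Python. The `render_hunk` name, the 3-line default context, and the `"{n} {marker} {line}"` layout are assumptions inferred from the prompt above, not AutoPR's actual code.

```python
from __future__ import annotations


def render_hunk(lines: list[str], start: int, end: int, context: int = 3) -> str:
    """Number the file's lines, marking lines start..end (1-indexed, inclusive) with '*'
    and up to `context` lines on either side with '|'."""
    first = max(1, start - context)
    last = min(len(lines), end + context)
    out = []
    for n in range(first, last + 1):
        marker = "*" if start <= n <= end else "|"
        # rstrip so blank source lines don't leave trailing whitespace.
        out.append(f"{n} {marker} {lines[n - 1]}".rstrip())
    return "\n".join(out)
```

For `movie_recommendation/app.py` with the hunk at lines 6-8, this produces lines 3 through 11 with the same `*`/`|` split as the prompt.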

What's preventing gpt-3.5-turbo from working properly?

It's really bad at generating something in the format of:

````
```
<code>
```
{
  <metadata>
}
````

It commonly drops the closing backticks after the code, leaves `<code>` empty, or repeats the description of the change instead of writing actual code. Omitting the trailing metadata and adding a few relevant few-shot examples would probably improve performance a lot.
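If the trailing metadata is kept, a more forgiving parser can at least absorb a dropped closing fence and reject an empty block. A minimal Python sketch, assuming the metadata is always a JSON object starting with an `"outcome"` key; `extract_code` is a hypothetical helper.

```python
import re

# Opening fence with an optional language tag, e.g. ``` or ```python.
_OPEN_FENCE_RE = re.compile(r"```[\w-]*[ \t]*\n?")
# A trailing {"outcome": ...} object at the end of the reply.
_TRAILING_META_RE = re.compile(r'\{\s*"outcome"\s*:.*\}\s*$', re.DOTALL)


def extract_code(raw: str) -> str:
    """Extract the code from a reply that should be one fenced block plus metadata."""
    opening = _OPEN_FENCE_RE.search(raw)
    if opening is None:
        raise ValueError("no opening ``` fence")
    body = raw[opening.end():]
    closing = body.find("```")
    if closing != -1:
        body = body[:closing]  # well-formed reply: stop at the closing fence
    else:
        # Closing fence was dropped: cut off a trailing {"outcome": ...} object, if any.
        body = _TRAILING_META_RE.sub("", body)
    if not body.strip():
        raise ValueError("code block is empty")
    return body.rstrip("\n")
```

The third failure mode, a description of the change in place of code, can't be caught this way and would need a separate check.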

It's also really bad at following the instruction to rewrite only the lines prefixed with `*`. It tends to return the full file, or to include the context lines.
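One possible post-processing mitigation, sketched in Python under the assumption that the model echoes the context lines verbatim (up to indentation) directly around its rewritten hunk: trim the longest run of `|` context it repeated at each end. `strip_echoed_context` is hypothetical, and it does not fully handle the case where the whole file is returned.

```python
from __future__ import annotations


def _same(a: str, b: str) -> bool:
    # The prompt asks for the hunk dedented to indentation 0, so ignore surrounding whitespace.
    return a.strip() == b.strip()


def strip_echoed_context(reply: list[str], before: list[str], after: list[str]) -> list[str]:
    """Trim '|' context lines that the model repeated around its rewritten hunk.

    `before` and `after` are the context lines shown above and below the '*' lines.
    """
    # Drop the longest suffix of `before` that the reply starts with.
    for k in range(len(before), 0, -1):
        if len(reply) >= k and all(_same(x, y) for x, y in zip(reply[:k], before[-k:])):
            reply = reply[k:]
            break
    # Drop the longest prefix of `after` that the reply ends with.
    for k in range(len(after), 0, -1):
        if len(reply) >= k and all(_same(x, y) for x, y in zip(reply[-k:], after[:k])):
            reply = reply[:-k]
            break
    return reply
```

A risk of this approach is that a rewritten line which happens to equal a context line at the edge of the hunk would also be dropped.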

An example exhibiting both of these problems:

Prompt

````
Human: Hey, we've got a new code hunk to diff.

This is the issue that was opened:
```
#31 Design a virtual art gallery

irgolic: In the `virtual_art_gallery` folder, create a web-based virtual art gallery that showcases a variety of artwork, such as paintings, sculptures, and photography. Allow users to browse through the gallery and view detailed information about each piece. Add a README.md file explaining the gallery's purpose, features, and how to set it up.

[irgolic/AutoPR#65](https://github.com/irgolic/AutoPR/issues/65)
```

This is the pull request we're creating:
```
Title: Design a virtual art gallery - Issue #31

- Added a new virtual art gallery that showcases various artwork
- Created an index.html file in the `virtual_art_gallery` folder to display the gallery and implemented CSS styles
- Added `artwork.json` file in the `virtual_art_gallery` folder to store artwork data
- Implemented artwork data parsing and rendering with `app.js` in the `virtual_art_gallery` folder
- Added a README.md file to explain the gallery's purpose, features, and how to set it up

1. Commit: Create virtual_art_gallery folder and index.html file
Files: virtual_art_gallery/index.html
Changes:
- Added a new virtual art gallery that showcases various artwork
2. Commit: Add CSS styles to index.html
Files: virtual_art_gallery/index.html, virtual_art_gallery/style.css
Changes:
- Created an index.html file in the `virtual_art_gallery` folder to display the gallery and implemented CSS styles
3. Commit: Create artwork data file
Files: virtual_art_gallery/artwork.json
Changes:
- Added `artwork.json` file in the `virtual_art_gallery` folder to store artwork data
4. Commit: Implement artwork data parsing and rendering
Files: virtual_art_gallery/index.html, virtual_art_gallery/app.js
Changes:
- Implemented artwork data parsing and rendering with `app.js` in the `virtual_art_gallery` folder
5. Commit: Add README.md
Files: README.md
Changes:
- Added a README.md file to explain the gallery's purpose, features, and how to set it up
```

This is the commit we're writing:
```
Create virtual_art_gallery folder and index.html file

- Added a new virtual art gallery that showcases various artwork
```

This is the codebase subset we decided to look at:
```
>>> File: virtual_art_gallery/index.html

1 |
```

This is the hunk we're rewriting:
```
1 |
```

This is the plan for how we want to rewrite the hunk:
```
Add new artwork to the virtual gallery
```

Please rewrite the hunk to match the plan, but do not include any lines prefixed with | in the result.

RULES:
- ONLY rewrite the lines prefixed with *,
- submit only the lines without the * prefix,
- do not preserve the relative leading indentation of the lines (start the hunk's indentation at 0).

RESPONSE FORMAT INSTRUCTIONS
----------------------------

When responding to me, please use the following format. Make sure you return both the code enclosed in backticks and the JSON immediately after.

```
```
{
    "outcome": string  # A description of the outcome of the attempt to rewrite the file hunk according to the problem statement.
}
````
Raw Response ``` { "outcome": "Success" }

What can we do about it?

Here are some ideas:

cballou commented 1 year ago

Just wanted to add my failure case, as I don't yet have access to 4.0. I switched to gpt-3.5-turbo and received an error on my monorepo:

[Screenshot of the error, taken 2023-05-04 at 8:25:42 AM]

I think in my case, given the complexity of the repository I'm trying to analyze, I'll just need to wait for 4.0... I'm asking it to specifically drill down into a sub-directory of the repo, target a specific version of a framework, and try to uncover missing/incomplete code.

My guess is that the overarching request was too broad (targeting an entire project rather than, say, one specific controller or action), which in turn means too many tokens.