gpt-engineer-org / gpt-engineer

Platform to experiment with the AI Software Engineer. Terminal based. NOTE: Very different from https://gptengineer.app
MIT License

Add step to review all code, file by file, and rewrite any parts that miss necessary functionality #121

Closed AntonOsika closed 1 year ago

AntonOsika commented 1 year ago

This should resolve issues like #119
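For concreteness, here is a rough sketch of what such a step could look like. It follows the (ai, dbs) step signature from steps.py and reuses its helpers (setup_sys_prompt, to_files, curr_fn); the function name and the "review" preprompt key are hypothetical, not existing code:

def review_code(ai, dbs):
    """Review the generated code file by file and rewrite parts that miss functionality."""
    # Hypothetical sketch only -- assumes a "review" preprompt and the helpers from steps.py.
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fuser(dbs.workspace["all_output.txt"]),
        ai.fsystem(dbs.preprompts["review"]),  # assumed preprompt key
    ]
    messages = ai.next(
        messages,
        "Go through the code file by file. For every file, point out any missing "
        "functionality and return the full corrected file.",
        step_name=curr_fn(),
    )
    to_files(messages[-1].content.strip(), dbs.workspace)
    return messages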

jebarpg commented 1 year ago

I have been looking into trying to incorporate Tree of Thoughts into GPT-Engineer, following the example in this video: https://www.youtube.com/watch?v=4hg-giBgzdg&ab_channel=TheAiSolopreneure. If anyone has any suggestions or ideas for getting this up and running, I'm all ears... I'm still trying to learn the code base a bit deeper to have a better understanding of where changes would be most effective. @AntonOsika

AntonOsika commented 1 year ago

Hey

Thanks for reaching out, Jeb.

I'm looking for someone to take the lead on a project.

Specifically: How do we measure ourselves and know that we are making sure we do steady progress at all.

Would you be interested?


jebarpg commented 1 year ago

I would be. I'm not sure I understand you fully, but what I take from the statement "How do we measure ourselves and know that we are making sure we do steady [progress]" is: how do we measure the progression of the repo as issues and PRs are created and resolved? The second part of the sentence is a little ambiguous as I read it; that is to say, I'm not sure what "steady" means to you. A few hunches sparked when you say 'steady': Do we maintain a consistent schedule of release dates, and/or a 'quota' of issue/PR resolutions per given time period, say every week or two? Do we have proper test coverage of the repo? Proper documentation of the code base? Proper templates for issues and PRs? GitHub badges in the README.md reflecting current version numbers, package dependencies, and social media contacts? A code review process set up for all weekly PRs or releases?

Are these some of the things you are thinking about keeping on top of and staying 'steady' with?

BTW, I have more metrics we could measure; if this message resonates with you, I'll add the additional lists of metrics.

jebarpg commented 1 year ago

I believe we should implement a CLI for gpt-engineer. I am still researching and putting a plan together, so this message is a bit premature, but I already know what I would like to extend in the code base to make it compatible with a CLI interface. I would also like to build a Gradio web interface or something similar to give the project a UI. I started working on a Visual Studio Code extension yesterday and got some working pieces put together to handle the initial API key settings and model selection right inside Visual Studio Code.
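To make this concrete, here is a rough sketch of what a thin Typer-based entry point might look like; the flag names and defaults are just illustrative, not a settled interface:

# Illustrative CLI sketch only: expose project path, model, and steps config as flags.
# It assumes the existing Config / STEPS objects from gpt_engineer.steps.
import typer

app = typer.Typer()

@app.command()
def run(
    project_path: str = typer.Argument(".", help="Folder containing the prompt file"),
    model: str = typer.Option("gpt-4", help="Model to use"),
    steps_config: str = typer.Option("default", help="Which STEPS config to run"),
):
    typer.echo(f"Running '{steps_config}' steps on {project_path} with {model}")
    # here we would construct AI and DBs and loop over STEPS[Config(steps_config)]

if __name__ == "__main__":
    app()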

Additionally, like I said in another post, I think it would be greatly beneficial to add the tree-of-thought prompts into the philosophies... ToT prompting has been shown to give around a 74% boost in accuracy and better generated responses. This, in addition to improving the chat response format and adding extra quality-assurance STEPs, I think will take this project to the next level.
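To illustrate what folding ToT into the philosophies could mean in practice, here is a small sketch; the ToT wording and the preprompts path are placeholders, not an agreed change:

# Illustrative sketch: append a Tree-of-Thoughts block to the existing philosophy
# preprompt instead of replacing it. The wording below is a placeholder.
from pathlib import Path

TOT_BLOCK = (
    "\n\nWhen solving a problem, propose several candidate approaches, "
    "evaluate their pros and cons, and only then implement the best one."
)

def augment_philosophy(preprompts_dir: str) -> None:
    """Append the ToT guidance to the 'philosophy' preprompt, keeping the original text."""
    path = Path(preprompts_dir) / "philosophy"
    original = path.read_text()
    if TOT_BLOCK not in original:  # avoid appending twice on repeated runs
        path.write_text(original + TOT_BLOCK)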

Also, I was looking through your list of examples and the array of comment groups of STEPS, and I do believe there will be some issues with having these as they are (not that they aren't well thought out); I just think they will need to become a feature that is extendable and optional from the starting point. I don't currently have a good proposal to elaborate on right this moment, but it is on my list of things to think over in the TODO.txt file on my desktop. I'm thinking about it in conjunction with the CLI, so I'm still working out the details on paper.
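As a strawman only (nothing below exists in the code base), "extendable and optional" could look something like a small step registry that a CLI flag feeds into:

# Strawman sketch: a registry so optional steps (extra QA passes, ToT passes, ...)
# can be switched on without editing the STEPS dict in place. Names are illustrative.
from typing import Callable, Dict, List

OPTIONAL_STEPS: Dict[str, Callable] = {}

def register_step(name: str):
    """Decorator that records a step under a user-facing name."""
    def wrapper(fn: Callable) -> Callable:
        OPTIONAL_STEPS[name] = fn
        return fn
    return wrapper

def build_pipeline(base: List[Callable], extras: List[str]) -> List[Callable]:
    """Append any requested optional steps (e.g. from a CLI flag) to a base config."""
    return base + [OPTIONAL_STEPS[name] for name in extras if name in OPTIONAL_STEPS]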

t0pd09590dp0t commented 1 year ago

One of the first things I set about doing was upgrading the preprompts. I got GPT-4 to use plugins to analyze several ToT research papers and repos. Then, it took what insights it gleaned and rewrote all the preprompts in a way that incorporates said insights.
Here is a link to my interaction with it: https://chat.openai.com/share/8342213e-6c3b-4f62-a9a6-bf8152342552

And here's a sample. It's a ToT version of fix_code:

{{{ You are a super smart developer. You have been tasked with fixing a program and making it work according to the best of your knowledge. There might be placeholders in the code you have to fill in.

  1. Identify the issues in the code.
  2. Propose at least three solutions for each issue.
  3. Evaluate the pros and cons of each solution.
  4. Choose the best solution for each issue and explain why you chose it.
  5. Implement the chosen solutions and provide the fixed code.

Remember, you are part of a team of experts. Each expert writes down one step of their thinking, shares it with the group, and then moves on to the next step. If any expert realizes they are wrong at any point, they leave. This approach is intended to emulate consensus and diversity of thought.

Please return the full new code in the same format.

Don't forget:

You are a super smart developer. You have been tasked with fixing a program and making it work according to the best of your knowledge. There might be placeholders in the code you have to fill in. Fill in all placeholders with actual functioning code. Don't leave anything out and do not use any type of shorthand or placeholder or anything like that. Fully implement all code after corrections. Leave nothing out. You provide fully functioning, well formatted code with few comments, that works and has no bugs. Please return the full new code in the same format. And thanks again! }}}

They say: “If it ain't broke, don't fix it,” so I just “augmented” the preprompts with PeTey G's ToT stuff and left the original instructions there, too. It may be said, therefore, that over here in Thought Tree City even my preprompts have preprompts. Heck, some of my prompts even have pre-preprompts. (omg, that was fun to type, lol) So yeah, I didn't make a pull request because I figured: "It's just prompts, so errbody prolly already did what I did and promptly pimped their prompts and preprompts, and I'm the one late for prom."

brainspacer commented 1 year ago

One of the first things I set about doing was upgrading the preprompts. I got GPT-4 to use plugins to analyze several ToT research papers and repos. Then, it took what insights it gleaned and rewrote all the preprompts in a way that incorporates said insights. Here is a link to my interaction with it: https://chat.openai.com/share/8342213e-6c3b-4f62-a9a6-bf8152342552 ... ... ...

  1. Did you have much success with getting gpt-engineer to produce the code more fully using ToT? I tried a number of things without success, including upgrading the prompts with ToT and reducing the scope of my application specification, in the hope gpt-engineer would at least fully code my subset specification. I'm using GPT-4. It made no difference to what gpt-engineer produced, and gpt-engineer says the 32k text length was exceeded (see the token-count sketch after this list).

  2. As an aside, I learned a lot from your dialogue with the AI - in respect of how to use it in a way to really help research. Here's a cigar for you ====~ (I've actually passed it on to a few professor friends of mine who will raise this within their university in respect of teaching students how to use these tools to help them conduct research.)
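Regarding the "32k text length exceeded" message in point 1: one way to check whether the prompt itself is already blowing the context window is to count tokens before sending anything. A rough sketch using tiktoken; the helper and the limit are illustrative, not part of gpt-engineer:

# Illustrative helper: estimate how many tokens a prompt will consume before sending
# it, so we can warn early instead of hitting the model's context limit.
import tiktoken

def count_tokens(text: str, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

prompt = open("prompt").read()  # the project's main prompt file
limit = 32_000                  # assumed budget for a 32k-context model
print(f"{count_tokens(prompt)} tokens used of ~{limit}")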

t0pd09590dp0t commented 1 year ago

This is my copy of steps.py. I get full code because I added an option called full that utilizes the maximum number of steps. I did this by making a guide (included) of what steps do what, and what steps must go before them to have the proper inputs available. Then, I added a step called double_fix_code that is able to fix code it has already fixed or double-fixed. I have GPT-4 and have been getting full code (I think). ChatGPT says it's ready to run, but I've been working and haven't gotten around to running pytest.

import inspect
import re
import subprocess

from enum import Enum
from typing import List, Union

from langchain.schema import AIMessage, HumanMessage, SystemMessage
from termcolor import colored

from gpt_engineer.ai import AI
from gpt_engineer.chat_to_files import to_files
from gpt_engineer.db import DBs
from gpt_engineer.learning import human_input

Message = Union[AIMessage, HumanMessage, SystemMessage]

def setup_sys_prompt(dbs: DBs) -> str:
    print("-" * 50 + "setup_sys_prompt" + "-" * 50)
    return (
        dbs.preprompts["generate"] + "\nUseful to know:\n" + dbs.preprompts["philosophy"]
    )

def get_prompt(dbs: DBs) -> str:
    """Get prompt from input DBs."""
    # Check that prompt is provided
    if "prompt" not in dbs.input and "main_prompt" not in dbs.input:
        raise ValueError("Please provide 'prompt' or 'main_prompt' in input DBs")

    # Prefer 'prompt' over 'main_prompt'
    if "prompt" in dbs.input:
        return dbs.input["prompt"]
    else:
        return dbs.input["main_prompt"]

def curr_fn() -> str:
    """Get the name of the current function"""
    print("-" * 50 + "curr_fn" + "-" * 50)
    return inspect.stack()[1].function

# All steps below have the signature Step

def simple_gen(ai: AI, dbs: DBs) -> List[Message]:
    """Run the AI on the main prompt and save the results"""
    messages = ai.start(setup_sys_prompt(dbs), get_prompt(dbs), step_name=curr_fn())
    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "simple_gen" + "-" * 50)
    return messages

def clarify(ai: AI, dbs: DBs) -> List[Message]:
    """
    Ask the user if they want to clarify anything and save the results to the workspace
    """
    messages: List[Message] = [ai.fsystem(dbs.preprompts["clarify"])]
    user_input = get_prompt(dbs)
    while True:
        messages = ai.next(messages, user_input, step_name=curr_fn())
        msg = messages[-1].content.strip()

        if msg == "Nothing more to clarify.":
            break

        if msg.lower().startswith("no"):
            print("Nothing more to clarify.")
            break

        print()
        user_input = input('(answer in text, or "n" to move on)\n')
        print()

        if not user_input or user_input == "n":
            print("(letting gpt-engineer make its own assumptions)")
            print()
            messages = ai.next(
                messages,
                "Make your own assumptions and state them explicitly before starting",
                step_name=curr_fn(),
            )
            print()
            return messages

        user_input += (
            "\n\n"
            "Is anything else unclear? If yes, only answer in the form:\n"
            "{remaining unclear areas} remaining questions.\n"
            "{Next question}\n"
            'If everything is sufficiently clear, only answer "Nothing more to clarify.".'
        )

    print("-" * 50 + "clarify" + "-" * 50)
    print()
    return messages

def gen_spec(ai: AI, dbs: DBs) -> List[Message]:
    """
    Generate a spec from the main prompt + clarifications and save the results to the workspace
    """
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fsystem(f"Instructions: {dbs.input['prompt']}"),
    ]

    messages = ai.next(messages, dbs.preprompts["spec"], step_name=curr_fn())

    dbs.memory["specification"] = messages[-1].content.strip()
    print("-" * 50 + "gen_spec" + "-" * 50)
    return messages

def respec(ai: AI, dbs: DBs) -> List[Message]:
    messages = AI.deserialize_messages(dbs.logs[gen_spec.__name__])
    messages += [ai.fsystem(dbs.preprompts["respec"])]

    messages = ai.next(messages, step_name=curr_fn())
    messages = ai.next(
        messages,
        (
            "Based on the conversation so far, please reiterate the specification for "
            "the program. "
            "If there are things that can be improved, please incorporate the "
            "improvements. "
            "If you are satisfied with the specification, just write out the "
            "specification word by word again."
        ),
        step_name=curr_fn(),
    )
    print("-" * 50 + "respec" + "-" * 50)
    dbs.memory["specification"] = messages[-1].content.strip()
    return messages

def gen_unit_tests(ai: AI, dbs: DBs) -> List[dict]:
    """
    Generate unit tests based on the specification, that should work.
    """
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fuser(f"Specification:\n\n{dbs.memory['specification']}"),
    ]

    messages = ai.next(messages, dbs.preprompts["unit_tests"], step_name=curr_fn())

    dbs.memory["unit_tests"] = messages[-1].content.strip()
    to_files(dbs.memory["unit_tests"], dbs.workspace)
    print("-" * 50 + "gen_unit_tests" + "-" * 50)
    return messages

def gen_clarified_code(ai: AI, dbs: DBs) -> List[dict]:
    """Takes clarification and generates code"""
    messages = AI.deserialize_messages(dbs.logs[clarify.__name__])

    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
    ] + messages[1:]
    messages = ai.next(messages, dbs.preprompts["use_qa"], step_name=curr_fn())

    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "gen_clarified_code" + "-" * 50)
    return messages

def gen_code(ai: AI, dbs: DBs) -> List[dict]:
    # get the messages from previous step
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fuser(f"Specification:\n\n{dbs.memory['specification']}"),
        ai.fuser(f"Unit tests:\n\n{dbs.memory['unit_tests']}"),
    ]
    messages = ai.next(messages, dbs.preprompts["use_qa"], step_name=curr_fn())
    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "gen_code" + "-" * 50)
    return messages

def execute_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
    command = dbs.workspace["run.sh"]
    print("-" * 50 + "execute_entrypoint" + "-" * 50)
    print("Do you want to execute this code?")
    print()
    print(command)
    print()
    print('If yes, press enter. Otherwise, type "no"')
    print()
    if input() not in ["", "y", "yes"]:
        print("Ok, not executing the code.")
        return []
    print("Executing the code...")
    print()
    # NOTE: the remainder of this function was cut off in the paste above; the
    # completion of the colored() note and the subprocess call below are a
    # minimal best-guess reconstruction so the step still runs.
    print(
        colored(
            "Note: If it does not work as expected, consider running the code"
            " in another way than above.",
            "green",
        )
    )
    print()
    subprocess.run("bash run.sh", shell=True, cwd=dbs.workspace.path)
    return []

def gen_entrypoint(ai: AI, dbs: DBs) -> List[dict]:
    messages = ai.start(
        system=(
            "You will get information about a codebase that is currently on disk in "
            "the current folder.\n"
            "From this you will answer with code blocks that includes all the necessary "
            "unix terminal commands to "
            "a) install dependencies "
            "b) run all necessary parts of the codebase (in parallel if necessary).\n"
            "Do not install globally. Do not use sudo.\n"
            "Do not explain the code, just give the commands.\n"
            "Do not use placeholders, use example values (like . for a folder argument) "
            "if necessary.\n"
        ),
        user="Information about the codebase:\n\n" + dbs.workspace["all_output.txt"],
        step_name=curr_fn(),
    )
    print()
    print("-" * 50 + "gen_entrypoint" + "-" * 50)
    regex = r"```\S*\n(.+?)```"  # extract fenced code blocks from the reply
    matches = re.finditer(regex, messages[-1].content.strip(), re.DOTALL)
    dbs.workspace["run.sh"] = "\n".join(match.group(1) for match in matches)
    return messages

def use_feedback(ai: AI, dbs: DBs):
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fassistant(dbs.workspace["all_output.txt"]),
        ai.fsystem(dbs.preprompts["use_feedback"]),
    ]
    messages = ai.next(messages, dbs.input["feedback"], step_name=curr_fn())
    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "use_feedback" + "-" * 50)
    return messages

def fix_code(ai: AI, dbs: DBs):
    messages = AI.deserialize_messages(dbs.logs[gen_code.__name__])
    code_output = messages[-1].content.strip()
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fuser(code_output),
        ai.fsystem(dbs.preprompts["fix_code"]),
    ]
    messages = ai.next(
        messages, "Please fix any errors in the code above.", step_name=curr_fn()
    )
    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "fix_code" + "-" * 50)
    return messages

def double_fix_code(ai: AI, dbs: DBs):
    # get the messages from previous step
    messages = AI.deserialize_messages(dbs.logs[gen_code.__name__])
    code_output = messages[-1].content.strip()

    # First fix
    messages = [
        ai.fsystem(setup_sys_prompt(dbs)),
        ai.fuser(f"Instructions: {dbs.input['prompt']}"),
        ai.fuser(code_output),
        ai.fsystem(dbs.preprompts["fix_code"]),
    ]
    messages = ai.next(
        messages,
        "Please fix any errors in the code above. Add extensive error handling "
        "and debug logging if none is present.",
        step_name=curr_fn(),
    )
    fixed_code = messages[-1].content.strip()
    print("-" * 50 + "dbl_fix_code.part1" + "-" * 50)

    # Second fix
    messages.append(ai.fsystem("Continue improving the code."))
    messages = ai.next(
        messages,
        "Please allow yourself to look deeper into the code above; with subtlety "
        "and guile, seek insight. Then, fill in ALL pseudo-code or placeholders of "
        "any kind with best-guess full implementations. If no improvements are "
        "possible, say so.",
        step_name=curr_fn(),
    )

    to_files(messages[-1].content.strip(), dbs.workspace)
    print("-" * 50 + "double_fix_code" + "-" * 50)
    return messages

def human_review(ai: AI, dbs: DBs):
    review = human_input()
    dbs.memory["review"] = review.to_json()  # type: ignore
    print("-" * 50 + "human_review" + "-" * 50)
    return []

The steps.py file in the gpt-engineer repository is responsible for defining the steps that the AI will take to generate code. Each step is a function that takes an AI instance and a DBs instance as arguments, and returns a list of messages. The messages are used to communicate with the AI and guide its actions.

#SIMPLE_GEN: This step runs the AI on the main prompt and saves the results. The AI is started with a system prompt and the main prompt, and the resulting messages are saved to the workspace.  It doesn't depend on any other steps.  simple_gen uses the preprompt setup_sys_prompt(dbs)

#CLARIFY: This step asks the user if they want to clarify anything and saves the results to the workspace. The AI is started with a system prompt and the user is asked if they want to clarify anything. The conversation continues until the user has nothing more to clarify.  It doesn't depend on any other steps.  clarify uses the preprompt dbs.preprompts["qa"].

#GEN_SPEC: This step generates a spec from the main prompt + clarifications and saves the results to the workspace. The AI is started with a system prompt, the main prompt, and the clarifications, and the resulting specification is saved to the workspace.  It depends on the clarify step as it uses the clarifications provided by the user.  gen_spec uses the preprompt setup_sys_prompt(dbs) and also accesses the main prompt using dbs.input['prompt']

#RESPEC: This step asks the AI to reiterate (and, where possible, improve) the specification and saves the result to memory. The AI is given the gen_spec conversation plus the respec preprompt and asked to restate the specification. It depends on the gen_spec step as it uses the specification generated by the AI. respec uses the preprompt dbs.preprompts["respec"].

#GEN_UNIT_TESTS: This step generates unit tests based on the specification, that should work. The AI is started with a system prompt, the main prompt, the specification, and a prompt to generate unit tests. The resulting unit tests are saved to the workspace. It depends on the gen_spec or respec step as it uses the specification to generate the unit tests.  gen_unit_tests uses the preprompt setup_sys_prompt(dbs) and also accesses the main prompt and specification.

#GEN_CLARIFIED_CODE: This step takes clarification and generates code. The AI is started with a system prompt, the main prompt, and the clarifications, and the resulting code is saved to the workspace.  It depends on the clarify step as it uses the clarifications provided by the user.  gen_clarified_code uses the preprompt dbs.preprompts["use_qa"]

#GEN_CODE: This step generates code based on the specification and unit tests. The AI is started with a system prompt, the main prompt, the specification, and the unit tests, and the resulting code is saved to the workspace. It depends on the gen_spec or respec step for the specification and the gen_unit_tests step for the unit tests. gen_code uses the preprompt dbs.preprompts["use_qa"]

#EXECUTE_ENTRYPOINT: This step runs the generated entry point (run.sh) after asking the user for confirmation; it does not call the AI itself. It depends on the gen_entrypoint step for the entry point and on the gen_code or gen_clarified_code step for the code being run.

#GEN_ENTRYPOINT: This step generates an entry point for the code. The AI is started with a system prompt, the main prompt, the specification, the unit tests, and the code, and the resulting entry point is saved to the workspace.  It depends on the gen_code or gen_clarified_code step as it uses the generated code to create an entry point.  gen_entrypoint doesn't directly use a preprompt but uses the output from the workspace.

#USE_FEEDBACK: This step uses feedback to improve the code. The AI is started with a system prompt, the main prompt, the specification, the unit tests, the code, and the feedback, and the resulting improved code is saved to the workspace.  It depends on the gen_code or gen_clarified_code step for the initial code and requires feedback from the user or another source.  use_feedback uses the preprompt dbs.preprompts["use_feedback"]

#FIX_CODE: This step fixes any errors in the code. The AI is started with a system prompt, the main prompt, the specification, the unit tests, the code, and a prompt to fix any errors in the code. The resulting fixed code is saved to the workspace.  It depends on the gen_code or gen_clarified_code step for the initial code and potentially the execute_entrypoint step to identify any runtime errors.  fix_code uses the preprompt dbs.preprompts["fix_code"]

#DOUBLE_FIX_CODE: Implemented by Claude AI. It sends the fix_code output back through the DBs, making a double fix pass possible that does the same thing again. It really works.

#HUMAN_REVIEW: This step lets a human review the result: it collects feedback via human_input() and saves the review to memory. It depends on the gen_code or gen_clarified_code step for the code to review.

#BASIC CODE GENERATION: simple_gen -> gen_entrypoint -> execute_entrypoint -> human_review. This sequence is for straightforward code generation without any clarifications or specifications.
#CLARIFICATION-BASED CODE GENERATION: clarify -> gen_clarified_code -> gen_entrypoint -> execute_entrypoint -> human_review This sequence is for generating code based on user clarifications.
#SPECIFICATION-BASED CODE GENERATION:  clarify -> gen_spec -> gen_unit_tests -> gen_code -> gen_entrypoint -> execute_entrypoint -> human_review  This sequence is for generating code based on a detailed specification and unit tests.
#FEEDBACK-BASED CODE IMPROVEMENT:  clarify -> gen_clarified_code -> use_feedback -> fix_code -> gen_entrypoint -> execute_entrypoint -> human_review  This sequence is for generating code based on clarifications and then improving it based on feedback.
#RE-SPECIFICATION SEQUENCE:  clarify -> gen_spec -> respec -> gen_unit_tests -> gen_code -> gen_entrypoint -> execute_entrypoint -> human_review  This sequence allows for re-specification after the initial specification is generated.
#UNIT TEST FIRST APPROACH:  gen_unit_tests -> clarify -> gen_spec -> gen_code -> gen_entrypoint -> execute_entrypoint -> human_review  This sequence starts with generating unit tests first, then proceeds with the usual steps.
#CLARIFICATION WITH FEEDBACK LOOP:  clarify -> gen_clarified_code -> use_feedback -> fix_code -> gen_entrypoint -> execute_entrypoint -> human_review  This sequence involves a feedback loop after generating code based on clarifications.
#FULL PROCESS:  clarify -> gen_spec -> respec -> gen_unit_tests -> gen_code -> use_feedback -> fix_code -> gen_entrypoint -> execute_entrypoint -> human_review This sequence involves almost all steps, providing a comprehensive approach to code generation and improvement.

class Config(str, Enum):
    DEFAULT = "default"
    BENCHMARK = "benchmark"
    SIMPLE = "simple"
    TDD = "tdd"
    TDD_PLUS = "tdd+"
    CLARIFY = "clarify"
    RESPEC = "respec"
    EXECUTE_ONLY = "execute_only"
    EVALUATE = "evaluate"
    USE_FEEDBACK = "use_feedback"
    BASIC_GEN = "basic_gen"
    CLARIFY_GEN = "clarify_gen"
    SPEC_GEN = "spec_gen"
    FEEDBACK_GEN = "fb_gen"
    RESPEC_GEN = "respec_gen"
    UNIT_TEST_FIRST = "unit_test_first"
    CLARIFY_FEEDBACK = "clarify_fb"
    FULL_PROCESS = "full"

# Different configs of what steps to run

STEPS = {
    Config.BASIC_GEN: [
        simple_gen,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.CLARIFY_GEN: [
        clarify,
        gen_clarified_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.SPEC_GEN: [
        clarify,
        gen_spec,
        gen_unit_tests,
        gen_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.FEEDBACK_GEN: [
        clarify,
        gen_clarified_code,
        use_feedback,
        fix_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.RESPEC_GEN: [
        clarify,
        gen_spec,
        respec,
        gen_unit_tests,
        gen_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.UNIT_TEST_FIRST: [
        gen_unit_tests,
        clarify,
        gen_spec,
        gen_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.CLARIFY_FEEDBACK: [
        clarify,
        gen_clarified_code,
        use_feedback,
        fix_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.FULL_PROCESS: [
        clarify,
        gen_spec,
        respec,
        gen_unit_tests,
        gen_code,
        fix_code,
        use_feedback,
        double_fix_code,
        double_fix_code,
        #double_fix_code,
        gen_entrypoint,
        execute_entrypoint,
        #human_review,
    ],
    Config.DEFAULT: [
        clarify,
        gen_clarified_code,
        gen_entrypoint,
        #execute_entrypoint,
        #human_review,
    ],
    Config.BENCHMARK: [simple_gen, gen_entrypoint],
    Config.SIMPLE: [simple_gen, gen_entrypoint, execute_entrypoint],
    Config.TDD: [
        gen_spec,
        gen_unit_tests,
        gen_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.TDD_PLUS: [
        gen_spec,
        gen_unit_tests,
        gen_code,
        fix_code,
        gen_entrypoint,
        #execute_entrypoint,
        #human_review,
    ],
    Config.CLARIFY: [
        clarify,
        gen_clarified_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.RESPEC: [
        gen_spec,
        respec,
        gen_unit_tests,
        gen_code,
        fix_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.USE_FEEDBACK: [use_feedback, gen_entrypoint, execute_entrypoint, human_review],
    Config.EXECUTE_ONLY: [execute_entrypoint],
    Config.EVALUATE: [execute_entrypoint, human_review],
}

# Future steps that can be added:
# run_tests_and_fix_files
# execute_entrypoint_and_fix_files_if_it_results_in_error
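For context, the sequences above get run by a small driver loop, roughly what main.py does; this is sketched from memory as an approximation, and it assumes AI.serialize_messages is the counterpart of the AI.deserialize_messages used above:

# Rough driver-loop sketch (approximation, not copied from main.py): each step gets
# the AI and DBs, and its messages are logged so later steps can deserialize them.
def run_steps(ai: AI, dbs: DBs, config: Config) -> None:
    for step in STEPS[config]:
        messages = step(ai, dbs)
        dbs.logs[step.__name__] = AI.serialize_messages(messages)  # assumed counterpart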

AntonOsika commented 1 year ago

Closing this in favor of "self healing code" via QA instead:

https://github.com/AntonOsika/gpt-engineer/blob/ae05736a762e180d533c76fe560cf549612595d4/ROADMAP.md