jacob-ai-bot / jacob

Just Another Coding Bot
https://jacb.ai
Apache License 2.0

Improve code discovery for existing files #52

Open kleneway opened 2 months ago

kleneway commented 2 months ago

Currently, JACoB uses a runtime-generated source map to identify file names, types, and other important information, which provides context about which files should be updated for a given issue. This works surprisingly well for many issues, but it falls short for more complex, multi-step bug fixes.

Last year I implemented a very simple RAG approach, and it didn't work very well. One big challenge was that a GitHub issue and a code file are semantically very different, so comparing the two directly gives poor matches. Instead, we need to convert each file into a format that more closely matches the kind of information found in an issue, so we can better identify the correct files to update.
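To make the idea concrete, here is a minimal sketch of matching an issue against issue-style descriptions of files rather than against raw code. Everything in it is an assumption for illustration: the file paths and summaries in `FILE_SUMMARIES` are made up (in practice they might be generated by an LLM), and a plain bag-of-words cosine similarity stands in for whatever embedding model would actually be used.

```python
import math
import re
from collections import Counter

# Hypothetical data: plain-language summaries written the way an issue
# would describe the behavior, keyed by made-up file paths.
FILE_SUMMARIES = {
    "src/components/LoginForm.tsx": (
        "Login form where users enter email and password; "
        "shows an error message when sign in fails."
    ),
    "src/server/api/billing.ts": (
        "Billing API routes that create invoices and charge "
        "a customer's credit card."
    ),
    "src/utils/date.ts": "Helper functions for formatting dates and time zones.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(issue_text: str, summaries: dict[str, str], top_k: int = 2):
    """Return the top_k (path, score) pairs, most similar summary first."""
    issue_vec = Counter(tokenize(issue_text))
    scored = [
        (cosine(issue_vec, Counter(tokenize(summary))), path)
        for path, summary in summaries.items()
    ]
    # Sort by descending score, breaking ties by path for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(path, score) for score, path in scored[:top_k]]


issue = "Users see no error message when login fails with a wrong password"
for path, score in rank_files(issue, FILE_SUMMARIES):
    print(f"{score:.2f}  {path}")
```

Because the summaries use the same vocabulary a person would use when reporting a problem, the login form summary shares far more terms with the issue than the other files do. Swapping the word-count vectors for real embeddings would keep the same overall shape.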

This is a critical step in making JACoB work, so we also want to use this approach in the user chat: identify the files that potentially need updating and ask the user to confirm that the file or files are correct before starting work on the issue.
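The confirmation step in the chat could look roughly like the sketch below. This is only an illustration, not JACoB's actual chat code: the function names `build_confirmation_prompt` and `parse_confirmation`, the message wording, and the reply format (a "yes" or a list of numbers) are all hypothetical.

```python
import re


def build_confirmation_prompt(candidates: list[str]) -> str:
    """Format the candidate files as a numbered list for the user to confirm."""
    if not candidates:
        return "I could not identify any files to update. Which files should I look at?"
    lines = ["I think these files need to be updated:"]
    lines += [f"{i}. {path}" for i, path in enumerate(candidates, start=1)]
    lines.append("Reply 'yes' to confirm all, or list the numbers to keep.")
    return "\n".join(lines)


def parse_confirmation(reply: str, candidates: list[str]) -> list[str]:
    """Return the confirmed files: all on 'yes', else the numbered picks."""
    text = reply.strip().lower()
    if text in {"yes", "y"}:
        return list(candidates)
    # Keep only numbers that point at a real entry in the list (1-indexed).
    picks = [int(n) for n in re.findall(r"\d+", text)]
    return [candidates[n - 1] for n in picks if 1 <= n <= len(candidates)]


candidates = ["src/components/LoginForm.tsx", "src/server/api/auth.ts"]
print(build_confirmation_prompt(candidates))
print(parse_confirmation("1", candidates))
```

Work on the issue would start only once `parse_confirmation` returns a non-empty list; an empty result means the user rejected the suggestions and should be asked which files to use.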

Here are some steps to implement the RAG system.