konveyor / kai

Konveyor AI - static code analysis driven migration to new targets via Generative AI
Apache License 2.0

Coolstore evaluation #294

Closed pranavgaikwad closed 4 weeks ago

pranavgaikwad commented 1 month ago

This is a series of experiments we ran on the Coolstore application to evaluate the responses Kai produces for different migration issues under different conditions. The responses themselves are evaluated by an LLM.
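To make the "LLM evaluates the responses" idea concrete, here is a minimal sketch of how such a grader could be wired up. It is not the code used in these experiments: the prompt wording, the 1 to 10 scale, the `SCORE:` reply format, and the names `build_judge_prompt`, `parse_score`, and `evaluate` are all assumptions made for illustration. The model is passed in as a plain callable so the sketch does not depend on any particular client library.

```python
import re
from typing import Callable, Optional


def build_judge_prompt(issue: str, original: str, updated: str) -> str:
    """Assemble a grading prompt (wording and scale are illustrative assumptions)."""
    return (
        "You are reviewing a code migration.\n"
        f"Issue to fix:\n{issue}\n\n"
        f"Original code:\n{original}\n\n"
        f"Proposed update:\n{updated}\n\n"
        "Rate how well the update fixes the issue on a scale of 1 to 10.\n"
        "Reply in the form 'SCORE: <number>' followed by a short justification."
    )


def parse_score(reply: str) -> Optional[int]:
    """Extract the score from the judge's reply; return None if missing or out of range."""
    match = re.search(r"SCORE:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None


def evaluate(llm: Callable[[str], str], issue: str, original: str, updated: str) -> Optional[int]:
    """Ask the judge model (any callable mapping prompt -> reply) to grade one response."""
    return parse_score(llm(build_judge_prompt(issue, original, updated)))
```

Returning `None` for an unparseable or out-of-range reply keeps malformed judge output from silently skewing an average.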

Conclusion

Zero-shot prompts may work for modernizing small, relatively easy, isolated examples of source code, provided the model understands the target technology. However, when larger or more complex codebases are involved, we observe that a capable model alone is not enough. We find it essential to point the LLM at specific areas of the codebase and to provide additional, specific information about the issues in order to get high-quality responses.

Our results suggest that Kai's approach of pinpointing issues with analysis information and solved examples improves the quality of responses. That said, we also identified some problems with the current implementation of this approach. We think that Kai must improve in the following areas:

  1. Relevancy of solved examples: Solved example diffs should not include lines that may confuse the LLM.
  2. Fixes in the context of the whole file: Solved example summaries should combine multiple incidents in a file into a single summary, so that it is shorter and accounts for fixes that affect other issues in the same file.
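The two improvements above could be sketched roughly as follows. This is a hedged illustration, not Kai's implementation: the `Incident` fields, the keyword-based relevance test, and the function names are hypothetical, and the diff filter assumes a unified diff for a single file.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Incident:
    # Hypothetical shape of one analysis finding; the field names are assumptions.
    file: str
    line: int
    message: str


def filter_diff_hunks(diff: str, keywords: List[str]) -> str:
    """Keep only hunks of a single-file unified diff whose changed lines mention a keyword."""
    header: List[str] = []
    hunks: List[List[str]] = []
    for line in diff.splitlines():
        if line.startswith("@@"):
            hunks.append([line])      # a new hunk starts at each "@@" marker
        elif hunks:
            hunks[-1].append(line)    # body line of the current hunk
        else:
            header.append(line)       # "---" / "+++" file header before the first hunk
    kept = [
        hunk for hunk in hunks
        if any(
            line.startswith(("+", "-")) and any(k in line for k in keywords)
            for line in hunk[1:]
        )
    ]
    return "\n".join(header + [line for hunk in kept for line in hunk])


def group_incidents_by_file(incidents: List[Incident]) -> Dict[str, List[Incident]]:
    """Group incidents per file, ordered by line, so one summary can cover the whole file."""
    grouped: Dict[str, List[Incident]] = defaultdict(list)
    for incident in incidents:
        grouped[incident.file].append(incident)
    return {f: sorted(items, key=lambda i: i.line) for f, items in grouped.items()}
```

Keyword matching is only a stand-in for a relevance check; a real filter would more likely use the line ranges reported by the analysis.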

We believe that a mix of the diff and summary approaches might work better. More experimentation is needed to figure out exactly what that mix will look like.
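As a rough illustration of how the prompt variants relate, the sketch below builds a zero-shot prompt when no solved example is given, a few-shot prompt when a diff or a summary is given, and a mixed prompt when both are. The section labels, their ordering, and the `build_prompt` name are assumptions for illustration only; they are not taken from Kai's prompt templates.

```python
from typing import Optional


def build_prompt(
    source_code: str,
    issue: str,
    example_diff: Optional[str] = None,
    example_summary: Optional[str] = None,
) -> str:
    """Build a migration prompt; with no example arguments this is the zero-shot case."""
    sections = [
        "Migrate the file below so that the reported issue is fixed.",
        f"## Issue\n{issue}",
    ]
    if example_diff:
        # Few-shot with a solved example diff
        sections.append(f"## Solved example (diff)\n{example_diff}")
    if example_summary:
        # Few-shot with a solved example summary
        sections.append(f"## Solved example (summary)\n{example_summary}")
    sections.append(f"## Source file\n{source_code}")
    return "\n\n".join(sections)
```

Keeping all variants behind one function makes it easy to run the same issue through each condition and compare the graded responses.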

Results

Here are the evaluation results for examples of two different complexities under three different approaches to forming prompts:

Easy Example

[image: evaluation results for the easy example]

Hard Example

[image: evaluation results for the hard example]

Learnings

About models

Zero Shot Prompts

Few Shot with Solved Example Diffs

Few Shot with Solved Example Summary

About evaluation

fabianvf commented 4 weeks ago

Would be awesome to add the writeup to the PR, or at the end of the notebook as a doc, as a follow-up, but it doesn't need to happen now.