Konveyor AI (kai) is Konveyor's approach to easing the modernization of application source code to a new target. It leverages LLMs, guided by static code analysis and augmented with data in Konveyor that captures how an organization solved similar problems in the past.
Pronunciation of 'kai': https://www.howtopronounce.com/kai
Our approach uses static code analysis to find the areas in source code that need to be transformed. 'kai' iterates through the analysis information and works with LLMs to generate code changes that resolve the incidents identified by analysis.
This approach does not require fine-tuning of LLMs. Instead, similar to RAG, we augment the LLM's knowledge via the prompt, leveraging external data from inside Konveyor and from analysis rules to help the LLM construct better results.
For example, analyzer-lsp rules such as these (Java EE to Quarkus rulesets) are leveraged to guide an LLM in updating a legacy Java EE application to Quarkus.
Note: for the purposes of this initial prototype we use Java EE to Quarkus as an example. That is an arbitrary choice to show the viability of this approach; the code and the approach will work with other targets that Konveyor has rules for.
Konveyor contains information related to an organization's application portfolio: a view into all of the applications an organization is managing. This view includes a history of analysis information over time, access to each application's source repositories, and metadata that tracks work in progress or completed as each application is migrated to a given technology.
When Konveyor AI wants to fix a specific issue in a given application, it mines data in Konveyor to extract two sources of information to inject into the LLM prompt.
Static Code Analysis
We include analysis metadata in the prompt to give the LLM more context, such as:
```yaml
remote-ejb-to-quarkus-00000:
  description: Remote EJBs are not supported in Quarkus
  incidents:
    - uri: file:///tmp/source-code/src/main/java/com/redhat/coolstore/service/ShippingService.java
      message:
        "Remote EJBs are not supported in Quarkus, and therefore its use must be removed and
        replaced with REST functionality. In order to do this:\n 1. Replace the @Remote
        annotation on the class with a @jakarta.ws.rs.Path(\"<endpoint>\") annotation. An endpoint
        must be added to the annotation in place of <endpoint> to specify the actual path to the
        REST service.\n 2. Remove @Stateless annotations if present. Given that REST services are
        stateless by nature, it makes it unnecessary.\n 3. For every public method on the EJB being
        converted, do the following:\n - Annotate the method with @jakarta.ws.rs.GET\n - Annotate
        the method with @jakarta.ws.rs.Path(\"<endpoint>\") and give it a proper endpoint path.
        As a rule of thumb..."
      lineNumber: 12
      variables:
        file: file:///tmp/source-code/src/main/java/com/redhat/coolstore/service/ShippingService.java
        kind: Class
        name: Stateless
        package: com.redhat.coolstore.service
  links:
    - url: https://jakarta.ee/specifications/restful-ws/
      title: Jakarta RESTful Web Services
```
Solved examples: source code diffs that show the LLM how a similar problem was seen in another of the organization's applications, and how that organization decided to fix it.
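As a rough illustration, these two data sources can be stitched into a prompt along the following lines. This is a hand-written sketch, not kai's actual prompt template; the function and field names are assumptions based on the analysis example above:

```python
# Sketch: build an LLM prompt from an analysis incident plus a solved example.
def build_prompt(incident: dict, solved_diff: str, source_snippet: str) -> str:
    """Combine static-analysis context and a solved-example diff into one prompt."""
    return "\n".join([
        "You are migrating a Java EE application to Quarkus.",
        f"Issue: {incident['message']}",
        f"File: {incident['uri']} (line {incident['lineNumber']})",
        "Here is how a similar issue was fixed in another application:",
        solved_diff,
        "Now update this code to resolve the issue:",
        source_snippet,
    ])

incident = {
    "message": "Remote EJBs are not supported in Quarkus",
    "uri": "file:///tmp/source-code/src/main/java/com/redhat/coolstore/service/ShippingService.java",
    "lineNumber": 12,
}
prompt = build_prompt(
    incident,
    "--- a/Service.java\n+++ b/Service.java",
    "@Remote\npublic class ShippingService {}",
)
print(prompt.splitlines()[1])  # Issue: Remote EJBs are not supported in Quarkus
```

The key idea is that both the analysis incident and the solved example travel with the request, so no fine-tuning is needed.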
```sh
DEMO_MODE=true make run-server
```

Export the API key for your chosen LLM provider in your shell, for example:

```sh
OPENAI_API_KEY=my-secret-api-key-value
GENAI_KEY=my-secret-api-key-value
```
The development team has been using the IBM BAM service to aid development and testing:
IBM Big AI Model (BAM) laboratory is where IBM Research designs, builds, and iterates on what’s next in foundation models. Our goal is to help accelerate the transition from research to product. Come experiment with us.
Login: https://bam.res.ibm.com/
In order to use this service, an individual needs to obtain a w3id from IBM. The kai development team is unable to help with obtaining this access.
Related client tooling:
LangChain integration: https://ibm.github.io/ibm-generative-ai/v2.2.0/rst_source/examples.extensions.langchain.html#examples-extensions-langchain
Obtain your API key from IBM BAM:
To access the service via its API, see 'Documentation' after logging into https://bam.res.ibm.com/
Ensure you have `GENAI_KEY=my-secret-api-key-value` defined in your shell if using IBM BAM, or `OPENAI_API_KEY=my-secret-api-key-value` defined in your shell if using OpenAI.

We offer configuration choices for several models via config.toml, which line up with the choices known to kai/model_provider.py.
To change which LLM you are targeting, open `config.toml` and change the `[models]` section to one of the following:
IBM served granite:

```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "ibm/granite-13b-chat-v2"
```

IBM served mistral:

```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "mistralai/mixtral-8x7b-instruct-v01"
```

IBM served codellama:

```toml
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-2-13b-chat"
```

IBM served llama3:

```toml
# Note: llama3 complains if we use more than 2048 tokens
# See: https://github.com/konveyor-ecosystem/kai/issues/172
[models]
provider = "ChatIBMGenAI"

[models.args]
model_id = "meta-llama/llama-3-70b-instruct"
parameters.max_new_tokens = 2048
```

Ollama:

```toml
[models]
provider = "ChatOllama"

[models.args]
model = "mistral"
```

OpenAI GPT 4:

```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "gpt-4"
```

OpenAI GPT 3.5:

```toml
[models]
provider = "ChatOpenAI"

[models.args]
model = "gpt-3.5-turbo"
```
Kai will also work with OpenAI-API-compatible alternatives.
Running Kai's backend involves running two processes: the PostgreSQL database and the kai server.
```sh
git clone https://github.com/konveyor-ecosystem/kai.git
cd kai
python3 -m venv env
source env/bin/activate
pip install -r ./requirements.txt
pip install -e .
```

In one terminal, start the database:

```sh
source env/bin/activate
make run-postgres
```

In a second terminal, start the server:

```sh
source env/bin/activate
make run-server
```
The `DEMO_MODE` option will cache responses and play them back on subsequent runs:

```sh
DEMO_MODE=true make run-server
```

To increase logging verbosity, run with `LOG_LEVEL=debug`:

```sh
LOG_LEVEL=debug make run-server
```
```sh
source env/bin/activate
pushd samples; ./fetch_apps.py; popd
make load-data
```
Each sample application's `main` branch has the Java EE version; the incidents found by analyzing `main` are the issues which need to be addressed before we move to Quarkus.
DEMO_MODE and cached responses

The kai server will always cache responses in the `kai/data/vcr/<application_name>/<model>` directory. In non-demo mode, these responses will be overwritten whenever a new request is made.
When the server is run with `DEMO_MODE=true`, these responses will be played back. The request will be matched on everything except authorization headers, cookies, content-length, and the request body.
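The matching rule above can be sketched as a simple predicate. This is an illustrative stand-in, not kai's actual VCR-based implementation; the request shape and header names here are assumptions:

```python
# Sketch: match an incoming request against a cached one, ignoring
# the volatile attributes that change between runs.
IGNORED_HEADERS = {"authorization", "cookie", "content-length"}

def normalize(request: dict) -> tuple:
    """Reduce a request to the fields that participate in cache matching."""
    headers = {
        name.lower(): value
        for name, value in request.get("headers", {}).items()
        if name.lower() not in IGNORED_HEADERS
    }
    # Method, URL, and the remaining headers matter; the body does not.
    return (request["method"], request["url"], tuple(sorted(headers.items())))

def matches(cached: dict, incoming: dict) -> bool:
    return normalize(cached) == normalize(incoming)

cached = {"method": "POST", "url": "/chat",
          "headers": {"Authorization": "Bearer a"}, "body": "x"}
incoming = {"method": "POST", "url": "/chat",
            "headers": {"Authorization": "Bearer b"}, "body": "y"}
print(matches(cached, incoming))  # True
```

Ignoring the body and credentials keeps playback stable even when prompts or API keys differ between runs.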
Updating cached responses

There are two ways to record new responses:

1. Run the server without `DEMO_MODE=true`; cached responses will be overwritten whenever a new request is made.
2. Delete the corresponding cached response file (`kai/data/vcr/<application_name>/<model>/<source-file-path-with-slashes-replaced-with-dashes.java.yaml>`), then rerun. When a cached response does not exist, a new one will be recorded and played back on subsequent runs.

Updating requirements.txt

When regenerating dependencies via `pip freeze &> ./requirements.txt`, note that we have a few directives addressing differences on 'darwin' systems that need to be preserved. These must be re-added manually after a 'freeze', as the freeze command is not aware of what previously existed in requirements.txt. Please consult the diff between your changes and the prior version, and preserve the extra directives for `python_version` and/or `sys_platform`.
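Such platform-specific directives use standard PEP 508 environment markers. The package names below are purely illustrative examples of the kind of lines that must be preserved after a freeze:

```text
# Example only: installed solely on macOS (re-add by hand after `pip freeze`)
pyobjc>=9.0; sys_platform == "darwin"
# Example only: installed solely on older interpreters
typing-extensions>=4.0; python_version < "3.11"
```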
```sh
cd ./samples
./fetch_apps.py
```

The sample applications are fetched into `./samples/sample_repos`.
Note: we have checked in analysis runs for all sample applications, so you do NOT need to run analysis yourself. The instructions below are ONLY for recreating the analysis data; this is NOT required.
```sh
cd samples
./fetch_apps.py               # git clone the example source code apps
cd macos
./restart_podman_machine.sh   # set up the podman VM on MacOS so it will mount the host filesystem into the VM
./get_latest_kantra_cli.sh    # fetch 'kantra', our analyzer tool, and store it in ../bin
cd ..
./analyze_apps.py             # analyze all sample apps, in both 'initial' and 'solved' states; expect ~2-3 hours
```

Analysis data will be stored in `samples/analysis_reports/{APP_NAME}/<initial|solved>/output.yaml`.
This repository uses trunk to lint and format the code:

```sh
trunk check
trunk fmt
```
This repository represents a prototype implementation as the team explores the solution space. The intent is for this work to remain in konveyor-ecosystem as the team builds knowledge in the domain and experiments with solutions. As the approach matures, we will integrate this work properly into Konveyor and seek promotion to the github.com/konveyor organization.
Refer to Konveyor's Code of Conduct here.