konveyor / enhancements

Enhancements tracking repository for Konveyor
Apache License 2.0

[LFX Fall '24 Mentorship]: Enhancing Kai with Data Querying for Fine-Tuning and Potential InstructLab Integration #191

Open JonahSussman opened 1 month ago

JonahSussman commented 1 month ago

This is an LFX mentorship project intended to run in the Fall of 2024.

This is related to https://github.com/cncf/mentoring/pull/1287

Description

Kai is a tool designed to leverage AI for application modernization by analyzing code, identifying issues, and suggesting fixes. We aim to enhance Kai by developing a robust data querying mechanism to facilitate fine-tuning processes. This enhancement will lay the groundwork for potential future integration with InstructLab, an open-source AI project enabling community contributions to Large Language Models (LLMs) by adding new skills or knowledge. The primary focus will be on creating mechanisms to query and utilize data effectively, with a stretch goal of integrating static analysis tools and implementing an agent-based workflow.

This project will significantly enhance Kai’s backend, making it more scalable and capable of providing deeper code insights while also contributing to the enrichment of LLMs through InstructLab. It will offer a rich learning experience for participating students, covering backend development, workflow management, and contributing to open-source AI projects.

In this project, you will

Current Situation

Kai currently processes code analyses through a centralized workflow. While effective, there is a need for enhanced data querying capabilities to support fine-tuning and optimization. Additionally, while the potential for integration with InstructLab is promising, the primary focus is on developing a robust data querying mechanism first. There is also an opportunity to explore agent-based workflows and static analysis tools as stretch goals.

Expected Outcome:

Recommended Skills:

Mentor(s):

Links:

konveyor-ci-bot[bot] commented 1 month ago

This issue is currently awaiting triage. If contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members.

angad-singhh commented 1 month ago

Hey @JonahSussman, this project looks very interesting to me, and the project explanation is very clear. Could you briefly describe the application process? Do we only have to apply on the LFX portal, or is there a pretest for this project? Is there anything specific I can work on to improve my application?

Thanks

Ytemiloluwa commented 1 month ago

Hello @JonahSussman and @fabianvf, my name is Temi. I am a software developer with experience in open source and enterprise environments. I am a two-time Summer of Bitcoin intern, with recent experience working on pyasic backend methods (pyasic is a Bitcoin mining library). I have extensive knowledge of data queries and databases, together with AI and ML models. My tech stack includes SQL, Python, Swift, Rust, Bash scripting, Git, Docker, and AWS.

I would like to ask whether there is a requirement to submit a pull request or a proposal on how I plan to enhance this project.

Looking forward to your feedback, and I hope to work with you both this fall.

Thanks!

sarthakg004 commented 1 month ago

Hello @JonahSussman and @fabianvf, I am Saarthak, an aspiring data scientist with a keen interest in generative AI. I am interested in this project and will be able to make meaningful contributions to it. I have previously worked on a few LLM projects, one of which was building a chatbot for SQL querying. I have good knowledge of vector databases, the RAG framework, SQL, Python, Git, quantization, and fine-tuning LLMs.

I would like to ask how I can submit my proposal for this project.

Looking forward to working with you.

Clemo97 commented 1 month ago

Hello @JonahSussman and @fabianvf. My name is Clement Lumumba. I have submitted my application and would like to start contributing. I am proficient in Python and SQL and have experience in data engineering.

Looking forward to working with you.

omkar-334 commented 1 month ago

Hello @JonahSussman and @fabianvf. I am Omkar, a CS undergraduate from India.
I am interested in this project and have experience with Python, SQL, LLMs, backend development, retrieval-augmented generation, and function calling.
I have applied on the LFX website with my cover letter and resume.
Are there any pretasks or prerequisites to complete?

Looking forward to contributing to the Kai project.
Thank you.

debrupf2946 commented 3 weeks ago

@fabianvf @JonahSussman Hello, I am Debrup Paul. I like the project and have been diving into Kai, Kantra, and InstructLab. I have set the project up locally and run the analysis on a sample app (on macOS); it runs fine. In GSoC I worked on a project where I built a knowledge graph from a code repository and docs and ran a retriever over it for better, more accurate responses and text generation. I think Kai and Kantra will help a lot. I want to spend time on and contribute to this project, and I find the idea very interesting!

We could also implement GraphRAG in the step that extracts contextual information from Solved Examples for code suggestions. It can help avoid hallucinations, and we could use smaller LLMs, which would save cost!

I have one question: what exactly is meant by data querying capabilities (pandas or SQL based)?

Thanks

LinkedIn: https://www.linkedin.com/in/debrup-paul-599158227/