
LLM Module Further Performance Improvements #14

Closed: haktancetin closed this issue 8 months ago

haktancetin commented 9 months ago

While faster than running locally, the LLM module still produces outputs more slowly than is acceptable. Further performance improvements need to be researched and implemented where possible.

The research directions currently under consideration are as follows:

This task will be considered complete once the above paths have been explored and documented.

haktancetin commented 9 months ago

Mistral-7B-v0.1 was tested on the existing Text Generation Inference (TGI) server architecture. This model was chosen for its explicit ability to output data in JSON format, which would be very useful for saving AI-generated recipes.

The testing has so far been unsuccessful due to two factors:

  1. Text Generation Inference does not support the Mistral architecture.
  2. Colab Free instances lack the necessary RAM to download the model's requirements.

Solutions to these issues were investigated and will be documented after further testing.

haktancetin commented 9 months ago

Further research led to the discovery of LM Studio, an easy-to-use desktop application for experimenting with local and open-source LLMs. It can also run a local inference server, which is of great interest to this project.
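
For reference, LM Studio's local server exposes an OpenAI-compatible API (by default at http://localhost:1234/v1; the port and the model identifier below are assumptions that depend on the local configuration). A minimal sketch of querying it from Python with the openai client, assuming a Mistral-7B-instruct model is already loaded:

```python
from openai import OpenAI

# LM Studio's local inference server speaks the OpenAI chat-completions
# protocol. Port 1234 is LM Studio's default; the model identifier is a
# placeholder and must match whatever model is loaded in LM Studio.
# The api_key is ignored by LM Studio but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="mistral-7b-instruct-v0.1",  # placeholder identifier
    messages=[
        {"role": "system", "content": "You are CookBuddy, a cooking assistant."},
        {"role": "user", "content": "Suggest a quick dinner using chicken and rice."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```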

LM Studio was used to run Mistral-7B-instruct-v0.1 on a personal laptop with 32 GB of RAM and an RTX 2070 Super GPU. Running this model was considered near-impossible under the previous Colab-driven architecture, but LM Studio's optimization features made it feasible.

The LLM was then given a complex system prompt detailing its functionality and its output formats for both recipe generation and general question answering. Both tasks were tested, and a high-quality response was produced in roughly 2 minutes, a stark contrast to the >20-minute wait times of the previous architecture.
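
The exact system prompt used in testing is not reproduced here; the sketch below is only a hypothetical illustration of the two-mode structure described above, and the JSON field names are assumptions.

```python
# Hypothetical sketch of the two-mode system prompt; field names and
# wording are illustrative, not the prompt actually used in testing.
SYSTEM_PROMPT = """\
You are CookBuddy, a cooking assistant with two modes.

1. Recipe generation: when the user asks for a recipe, reply ONLY with a
   JSON object of the form
   {"title": str, "servings": int, "ingredients": [str, ...], "steps": [str, ...]}
   so the recipe can be saved directly by the application.

2. General question answering: for any other cooking question, reply with
   a short plain-text answer.
"""
```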

In conclusion, it is strongly believed that exposing LM Studio's local inference server through an Ngrok tunnel will result in a stable, fast, and free LLM module hosted locally on a personal computer.
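
A minimal sketch of the proposed setup, assuming the pyngrok wrapper (rather than the ngrok CLI) and LM Studio's default port 1234:

```python
from pyngrok import ngrok

# Expose LM Studio's local inference server (assumed to listen on port
# 1234) through an Ngrok tunnel so a remotely hosted client can reach it.
tunnel = ngrok.connect(1234, "http")
print("Public inference endpoint:", tunnel.public_url)

# A remote client would then target f"{tunnel.public_url}/v1/chat/completions"
# instead of http://localhost:1234/v1/chat/completions.
```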

haktancetin commented 8 months ago

The streamlit-app-haktan branch was updated by removing the Colab module and connecting the Streamlit app to an LM Studio server. As expected, overall performance and coherence improved greatly. However, when the chatbot is accessed through the Ngrok tunnel, it cannot reach the inference server. This problem will be discussed in more detail in its own issue.
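
For context, a minimal sketch of the kind of wiring involved (not the actual streamlit-app-haktan code); the endpoint URL and model identifier are assumptions:

```python
import requests
import streamlit as st

# Assumed endpoint: the LM Studio server, reached either directly on the
# local network or via the Ngrok tunnel's public URL.
LLM_ENDPOINT = "http://localhost:1234/v1/chat/completions"

st.title("CookBuddy")
prompt = st.chat_input("Ask for a recipe or a cooking question")

if prompt:
    st.chat_message("user").write(prompt)
    resp = requests.post(
        LLM_ENDPOINT,
        json={
            "model": "mistral-7b-instruct-v0.1",  # placeholder identifier
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    st.chat_message("assistant").write(answer)
```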