Closed: haktancetin closed this issue 8 months ago
Mistral-7B-v0.1 was tested on the existing TGI server architecture. This model was chosen for its explicit ability to output data in JSON format, which would be very useful for saving AI-generated recipes.
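Structured JSON output is attractive because a generated recipe can be validated and saved directly. A minimal sketch of that idea follows; the recipe schema (`title`, `ingredients`, `steps`) is an illustrative assumption, not the project's actual format:

```python
import json

# Example of what a JSON-formatted model reply might look like.
# The schema (title / ingredients / steps) is assumed for illustration.
raw_reply = """
{
  "title": "Tomato Soup",
  "ingredients": ["4 tomatoes", "1 onion", "2 cups stock"],
  "steps": ["Saute the onion", "Add tomatoes and stock", "Simmer 20 minutes"]
}
"""

def parse_recipe(text):
    """Parse and minimally validate a recipe reply; return None if invalid."""
    try:
        recipe = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not all(key in recipe for key in ("title", "ingredients", "steps")):
        return None
    return recipe

recipe = parse_recipe(raw_reply)
if recipe is not None:
    # A valid recipe can now be persisted, e.g. to a file or database.
    print(recipe["title"])  # Tomato Soup
```

Replies that are not valid JSON, or that lack the expected fields, simply return `None` and can be retried.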
Testing is currently unsuccessful due to two factors:
Solutions to these issues were investigated, and will be explained upon further testing.
Upon further research, the LM Studio application was discovered. LM Studio is an easy-to-use desktop application for experimenting with local and open-source LLMs. It also allows for the creation of a local inference server, which is of great interest to this project.
LM Studio was used to run Mistral-7B-instruct-v0.1 on a personal laptop with 32 GB RAM and an RTX 2070 Super GPU. This combination was thought to be near-impossible on the previous Colab-driven architecture; however, LM Studio's optimization features made it feasible.
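LM Studio's local inference server exposes an OpenAI-compatible HTTP API, by default on `localhost:1234`. A rough sketch of querying it with only the standard library; the model name, system prompt, and temperature below are placeholder assumptions:

```python
import json
import urllib.request

# LM Studio serves an OpenAI-compatible endpoint on localhost:1234 by default.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt, system="You are a recipe assistant."):
    """Build the JSON payload for a chat-completion request."""
    return {
        "model": "local-model",  # placeholder; LM Studio uses the loaded model
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }

def ask(prompt):
    """Send a prompt to the local server and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LM_STUDIO_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any client library that accepts a custom base URL could be pointed at the same endpoint instead.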
The LLM was then given a complex system prompt detailing its functionality and output formats for both recipe generation and general question answering. Both tasks were tested, and high-quality responses were achieved in ~2 minutes, a stark contrast to the >20-minute wait times on the previous architecture.
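The system prompt itself is not reproduced here, but its general shape, one prompt instructing the model to use distinct output formats for the two tasks, can be sketched as follows. The wording, schema, and routing helper are all assumptions for illustration:

```python
import json

# Illustrative system prompt; the project's actual prompt is more detailed.
SYSTEM_PROMPT = (
    "You are a cooking assistant. "
    "When asked to generate a recipe, reply ONLY with a JSON object "
    'containing "title", "ingredients", and "steps". '
    "For general questions, reply in plain prose."
)

def classify_reply(reply):
    """Decide which output format the model used for a given reply."""
    try:
        obj = json.loads(reply)
        if isinstance(obj, dict) and "title" in obj:
            return "recipe"
    except json.JSONDecodeError:
        pass
    return "answer"
```

A routing check like this lets the app save JSON replies as recipes while displaying prose replies directly in the chat.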
In conclusion, it is strongly believed that creating an Ngrok tunnel on LM Studio's local inference server will result in a stable, fast and free LLM module that can be hosted locally on a personal computer.
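The tunneling idea itself is a one-liner: assuming LM Studio is serving on its default port 1234 (this may differ per setup), ngrok can expose it publicly:

```shell
# Expose LM Studio's local inference server (default port 1234) via ngrok.
# Requires an ngrok account and a configured authtoken beforehand.
ngrok http 1234
```

ngrok then prints a public forwarding URL that remote clients can use in place of `localhost:1234`.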
The streamlit-app-haktan branch was updated by deleting the Colab module and connecting the Streamlit app to an LM Studio server. As expected, overall performance and coherency improved greatly. However, accessing the chatbot through the Ngrok tunnel fails to reach the inference server. This problem will be discussed in further detail in its own issue.
While faster than the previous architecture, the LLM module still produces outputs more slowly than might be acceptable. Further performance improvements need to be researched and implemented where possible.
Currently possible vectors of research are as follows:
This task will be regarded as completed when the above paths have been explored and documented.