[Not ready yet]
To run the LLMs locally, the app is split into micro-services:
Controller, which runs the main logic and the index.
Embedder, which runs a sentence-transformer model locally.
LLM, which runs llama.cpp (Llama 2 is supported; a quantized model, llama-2-7b.ggmlv3.q2_K.bin from Hugging Face, is set as the default, but it must be downloaded before running the app). Minimal sketches of the Embedder and LLM services follow below.
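
For illustration, here is a minimal sketch of what the Embedder service could look like. FastAPI as the web framework, all-MiniLM-L6-v2 as the model, and the endpoint names are assumptions for the sketch, not necessarily what this repo uses:

```python
# embedder_service.py -- hypothetical sketch of the Embedder micro-service.
# FastAPI, the model name, and the routes are assumptions, not necessarily
# what this repo uses.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Loaded once at startup; the first run downloads the model weights.
model = SentenceTransformer("all-MiniLM-L6-v2")

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest) -> dict:
    # Returns one embedding vector per input text.
    vectors = model.encode(req.texts).tolist()
    return {"embeddings": vectors}

@app.get("/health")
def health() -> dict:
    # Answers only once the module-level model load above has finished.
    return {"status": "ok"}
```

Run it with, e.g., uvicorn embedder_service:app --port 8001; the controller can then POST its texts to /embed.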
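
A matching sketch of the LLM service, assuming llama-cpp-python as the binding; note that recent llama-cpp-python releases only load GGUF files, so the GGML model named above needs an older release (or a converted model). The model path and route are placeholders:

```python
# llm_service.py -- hypothetical sketch of the LLM micro-service.
# Assumes llama-cpp-python; recent versions only read GGUF files, so a
# GGML model like the one above needs an older release or a conversion.
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
# Path to the pre-downloaded quantized model (placeholder).
llm = Llama(model_path="models/llama-2-7b.ggmlv3.q2_K.bin")

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 256

@app.post("/complete")
def complete(req: CompletionRequest) -> dict:
    out = llm(req.prompt, max_tokens=req.max_tokens)
    return {"text": out["choices"][0]["text"]}
```

This service can expose the same /health route as the Embedder sketch above.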
Still to fix:
Find a proper way for the controller to avoid calling the other services before they are ready; downloading the LLMs takes some time. For now, a simple sleep(10) before run() works (see the readiness sketch after this list).
Add a README file for this subproject.
Restructure the repo. I'll add a suggestion later.
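
For the readiness problem above, one common replacement for the sleep(10) is to have each service expose a health endpoint (as in the /health route sketched earlier) and have the controller poll it with retries before run(). A minimal sketch; the service URLs and timeouts are placeholders:

```python
# readiness.py -- hypothetical sketch of a readiness check to replace
# the sleep(10). Service URLs and timeouts are placeholders.
import time
import requests

def wait_until_ready(url: str, timeout: float = 120.0, interval: float = 2.0) -> None:
    # Poll the service's /health endpoint until it answers 200 OK,
    # or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{url}/health", timeout=2).status_code == 200:
                return
        except requests.RequestException:
            pass  # service not up yet; retry after a short pause
        time.sleep(interval)
    raise TimeoutError(f"{url} did not become ready within {timeout}s")

# Before run(): block until both services answer, instead of sleeping.
for service in ("http://embedder:8001", "http://llm:8002"):
    wait_until_ready(service)
```

Because the models in the sketches above are loaded at module level, the servers only start accepting requests once the download and load have finished, so a 200 from /health implies the model is ready.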