Closed rmusser01 closed 1 month ago
llama.cpp, with recommendations for Llama 3 8B and Microsoft Phi-3 (128k context) as offline models most people can run. Otherwise, Mixtral 8x22B and Llama 3 70B.
Using the chat available at https://gpt.h2o.ai/ , one can compare summarizations from different models side by side.
A user should be able to run the application and have the LLM endpoint started and queried by the script itself, without any interaction from the user. This would support batch usage.
Ideally this would be achieved through Ollama and llama.cpp, with the option exposed in the CLI.
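As a rough sketch of the llama.cpp side, the script could spawn llama.cpp's bundled HTTP server as a child process, wait for it to come up, and then query it with no user interaction. Binary name, flags, and endpoint paths here (`llama-server`, `/health`, `/v1/chat/completions`) match recent llama.cpp builds but may differ by version, so treat this as an assumption, not a final implementation:

```python
import json
import subprocess
import time
import urllib.request


def build_server_cmd(model_path, port=8080, ctx=8192):
    # Assumed binary name: "llama-server" (llama.cpp's HTTP server in
    # recent builds; older builds called it "server"). Flags may vary.
    return ["llama-server", "-m", model_path,
            "--port", str(port), "-c", str(ctx)]


def start_server(model_path, port=8080, timeout_s=60):
    # Launch the endpoint as a child process so the script, not the
    # user, owns its lifecycle -- the key requirement for batch runs.
    proc = subprocess.Popen(build_server_cmd(model_path, port))
    # Poll the server's health endpoint until the model is loaded.
    for _ in range(timeout_s):
        try:
            with urllib.request.urlopen(
                    f"http://127.0.0.1:{port}/health", timeout=2):
                return proc
        except OSError:
            time.sleep(1)
    proc.kill()
    raise RuntimeError("llama-server did not become ready")


def summarize(text, port=8080):
    # llama-server exposes an OpenAI-compatible chat completions route.
    payload = json.dumps({
        "messages": [
            {"role": "user", "content": f"Summarize:\n\n{text}"}
        ]
    }).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

An Ollama backend could slot into the same shape by swapping the command for `ollama serve` and pointing the query at Ollama's API instead; either way the CLI flag would just select which `build_server_cmd`/query pair to use.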