neviaumi / made-in-uk

Application that lists things made in the UK.

LLM use huggingface instead of gpt4all #461

Closed neviaumi closed 1 month ago

neviaumi commented 1 month ago

gpt4all isn't a popular package, indeed.

neviaumi commented 1 month ago

https://huggingface.co/docs/transformers/en/model_doc/phimoe

Also, phi3.5 has been released; we should probably adopt that as well.

neviaumi commented 1 month ago

https://ollama.com/library/zephyr

Since phi3.5 was trained with flash-attention, it would require CUDA to be configured.

I don't have an Nvidia GPU; I don't even have an ARM MacBook.

Escaping from that and using something less restricted would probably be a good idea.

neviaumi commented 1 month ago

After a bit of research,

I should go with either llama.cpp, ollama, or vLLM, as they have promising performance (gpt4all actually sits on top of llama.cpp).

ollama is preferred, as it needs less configuration and is ready to go.

With llama.cpp, on the other hand, I have to get the GGUF version of the model from somewhere and configure the server myself. The complexity of finding a GGUF build, or converting a model to GGUF, is considerable.
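To make the configuration contrast concrete, here is a minimal sketch (not from this repo; the endpoint shapes are assumptions based on the two projects' documented HTTP APIs: ollama's `/api/generate` on port 11434, and llama.cpp server's `/completion` on port 8080):

```python
import json
import urllib.request


def build_ollama_payload(model: str, prompt: str) -> dict:
    """Request body for ollama's /api/generate endpoint.

    stream=False asks for a single JSON object instead of
    newline-delimited streaming chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def build_llama_cpp_payload(prompt: str, n_predict: int = 128) -> dict:
    """Request body for llama.cpp server's /completion endpoint.

    Unlike ollama, the server must be started manually against a GGUF
    file you sourced yourself (e.g. `llama-server -m model.gguf`), so
    there is no model field in the request.
    """
    return {"prompt": prompt, "n_predict": n_predict}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload to a locally running inference server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ollama, pulling and serving the model is one step (`ollama pull zephyr`); the llama.cpp path needs the GGUF build located first, which is exactly the extra complexity described above.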

neviaumi commented 1 month ago

https://medium.com/@naman1011/ollama-vs-vllm-which-tool-handles-ai-models-better-a93345b911e6

However, ollama is not designed to handle concurrent requests.

I should pick something intended to be exposed in production.
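A sketch of what "handling concurrent requests" means in practice: fanning out prompts with a concurrency cap. `handler` here is a hypothetical stand-in (my name, not a real API) for an async HTTP call to whichever backend gets chosen:

```python
import asyncio


async def fan_out(handler, prompts, max_concurrency=4):
    """Run `handler(prompt)` for every prompt, with at most
    `max_concurrency` requests in flight at once.

    A backend built for production serving (vLLM, TGI) batches such
    concurrent requests on the GPU; a single default ollama instance
    largely queues them instead, which is the limitation noted above.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt):
        async with sem:
            return await handler(prompt)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(p) for p in prompts))
```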

https://github.com/ggerganov/llama.cpp/discussions/6730

The options that pop up there are llama.cpp, vLLM, and TGI.

Check out my llama.cpp experiments here: https://github.com/neviaumi/experimental-llm-agent

I can't get vLLM or TGI working on my computer.

An ADR should be written for this decision, as it has already involved a lot of research.