kolbytn / mindcraft


Llama3 model takes a super long time to load. #161

Open Ligerbot opened 2 weeks ago

Ligerbot commented 2 weeks ago

Whenever I give the llama3 model a prompt, it takes about 5 minutes to actually respond, and sometimes it just times out. When I use a smaller model like TinyLlama it goes faster, but TinyLlama has no idea how to use the mindcraft tools. My computer is powerful enough to run the model quickly: llama3 can answer a prompt in a couple of seconds on its own, but through mindcraft it is super laggy. My laptop specs are:

Processor : AMD Ryzen™ 9 6900HS with Radeon™ Graphics × 16
Memory : 16.0 GiB
Graphics : AMD Radeon™ Graphics / AMD Radeon™ RX 6700S

And I am running Debian 12.
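For reference, here is how I am timing the model outside of mindcraft: one long-prompt request straight to the Ollama HTTP API, so mindcraft is out of the picture. A minimal sketch, assuming a default Ollama install on localhost:11434 (the model name and the repeated filler prompt are placeholders, and it needs Node 18+ for the global fetch):

```js
// time_ollama.mjs — run with: node time_ollama.mjs
// Times a single non-streaming request against a local Ollama server.
// The long prompt roughly mimics the large system prompt mindcraft sends.
const prompt = "You are a Minecraft bot that can use tools. ".repeat(150);

const start = Date.now();
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama3", prompt, stream: false }),
});
const data = await res.json();

console.log(`elapsed: ${((Date.now() - start) / 1000).toFixed(1)}s`);
console.log(data.response.slice(0, 200)); // first part of the reply
```

If this also takes minutes, the slowdown is prompt evaluation on long inputs rather than anything mindcraft-specific; a couple-second reply to a short prompt does not tell you much about a several-thousand-token one.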

saladnoob commented 2 weeks ago

It runs slow because llama3 is a self-hosted AI; you need better hardware, with at least 32 GB of RAM, for better performance.

Ligerbot commented 2 weeks ago

But llama3 runs fast when I am not using mindcraft. It usually just responds in a few seconds.

MCrashCraft commented 2 weeks ago

The Ollama config that works best, at least for me, is:

"model": "dolphin-llama3", "embedding": "all-minilm",

This ran better than anything else I could run on Ollama, since it's self-hosted. Here are my PC specs:

CPU: AMD Ryzen 5 3600
Memory: 48 GB
GPU: Gigabyte Eagle 8 GB Nvidia GeForce RTX 3060
OS: Windows 11
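In case it helps anyone, those keys go in the bot's profile JSON. A minimal sketch, assuming a profile file like profiles/andy.json (the name is illustrative; any fields you leave out fall back to the defaults):

```json
{
  "name": "andy",
  "model": "dolphin-llama3",
  "embedding": "all-minilm"
}
```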

UniqueName12345 commented 2 weeks ago

> The Ollama config that works best, at least for me, is:
>
> "model": "dolphin-llama3", "embedding": "all-minilm",

When I try this config, I get the error "Unknown embedding: all-minilm . Using word overlap."
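A guess at a workaround, assuming mindcraft reads a string embedding value as an API name ("openai", "ollama", ...) rather than a model name, which would explain the "Unknown embedding" message: spell the embedding out as an object in the profile, and make sure the model is pulled locally first with ollama pull all-minilm.

```json
"embedding": {
  "api": "ollama",
  "model": "all-minilm"
}
```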