lmstudio-ai / .github

Server stops responding after one call to chat #59

Open tfius opened 1 month ago

tfius commented 1 month ago

The server stops responding after one API call, and the chat starts streaming `<unused31>`.... Or, on Gemma, it replies only with "GGGGGGGG"s.

yagil commented 1 month ago

Hi @tfius, can you please share the following details?

  1. The exact Hugging Face link for the model you're running, including which file specifically
  2. What is your operating system?
  3. Any GPU?

tfius commented 1 month ago

Hi, after playing with the settings a bit, I noticed that if I offload to GPU, any model I load behaves more or less the same way. This happens on Win11, with RTX 4000 and A100 cards. Note that it does not happen when only the CPU is used. It also happens on both the CUDA and OpenCL backend types. Could it be a problem with llama.cpp? (It started happening after updating to v .27.)
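For anyone trying to reproduce this, a minimal sketch that sends the same chat request twice to LM Studio's OpenAI-compatible local server may help isolate it (the base URL, default port 1234, and model name here are assumptions — adjust them to your setup; the reported behavior is that the second call hangs or returns garbage tokens when GPU offload is enabled):

```python
import json
import urllib.request

def build_chat_request(base_url="http://localhost:1234/v1", model="local-model"):
    """Build a chat-completion request for an OpenAI-compatible local server.

    The base URL, port, and model name are assumptions -- change them to
    match your LM Studio server settings and loaded model.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": False,  # non-streaming keeps the repro simple
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Send the same request twice: per the report above, the first call
    # succeeds and the second hangs (or streams garbage) with GPU offload on.
    for i in range(2):
        with urllib.request.urlopen(build_chat_request(), timeout=60) as resp:
            body = json.loads(resp.read())
            print(i, body["choices"][0]["message"]["content"])
```

Running this once with GPU offload enabled and once with CPU-only should show whether the hang is tied to offloading, as described above.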