lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

LM Studio becomes unresponsive when fed with larger datasets #58

Open Darkwing371 opened 1 month ago

Darkwing371 commented 1 month ago

Version: 0.2.27 Platform: Windows, Linux, Mac

I used the built-in server to feed LM Studio with a text file via the API: ≈300kB, ≈60.000 words. At the beginning it worked quite well, but I noticed that as soon as roughly 50.000 tokens were read in, the application window froze, turned white and did not display anything any more. Screeshot:

[Screenshot: lmstudio-window-not-responding]

However, LM Studio still seems to continue calculating.

[Screenshot: lmstudio-still-calculating]

But I have no way of knowing what it is doing or what output it produces, nor how long the calculation will take. (I have been waiting for five hours now.) Maybe it is doing what is described in https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/50.

I tested this on Linux (Ubuntu 24) and on Windows 10, and it seems to happen on Mac as well, because the report in https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/22 appears to be the same issue.

So it stalls when feeding via the server, and it stalls when using the chat window; presumably on all platforms.
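
For reference, a minimal sketch of how feeding a file through the built-in server could look, assuming the default OpenAI-compatible endpoint on port 1234; the file path, model name, and prompt are placeholders, not my exact code:

```python
# Minimal sketch: send a large text file to LM Studio's local server in one request.
# Assumes the default OpenAI-compatible endpoint on port 1234.
import requests

with open("large_text.txt", "r", encoding="utf-8") as f:
    text = f.read()  # in my case ~300 kB, ~60,000 words

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "loaded-model",  # placeholder; LM Studio uses whatever model is loaded
        "messages": [
            {"role": "user", "content": "Summarize the following text:\n\n" + text},
        ],
    },
    timeout=None,  # large inputs can take a very long time
)
print(response.json()["choices"][0]["message"]["content"])
```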

Any advice is much appreciated ... but I leave this here as a bug report.

Darkwing371 commented 1 month ago

I explored this problem further.

Quick remedy: When dealing with large texts, select a model that supports a high context size, like Mistral or Qwen (currently 32k tokens). Set the context size to the highest possible value:

[Screenshot: context-size]

Also make sure to divide your text into chunks that stay roughly below this context size, then feed these chunks to the model step by step (see the sketch below). I can't give you exact numbers; you have to experiment. But chunks with a word count of about 75% of the set context size (which is measured in tokens) are a good starting point.
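
A minimal sketch of that chunked feeding, again assuming the default local endpoint; the model name, file path, and prompt are placeholders, and the 75% figure is just the starting point mentioned above:

```python
# Rough chunking sketch: split the text into word-based chunks sized to
# ~75% of the configured context window, then feed them one at a time.
import requests

CONTEXT_SIZE = 32768                      # set to the context size you configured
CHUNK_WORDS = int(CONTEXT_SIZE * 0.75)    # word count ~75% of the context size (rule of thumb)

def chunk_words(text, size):
    """Yield successive chunks of `size` words."""
    words = text.split()
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])

with open("large_text.txt", "r", encoding="utf-8") as f:
    text = f.read()

for n, chunk in enumerate(chunk_words(text, CHUNK_WORDS), start=1):
    r = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": "loaded-model",  # placeholder
            "messages": [{"role": "user", "content": f"Part {n} of the text:\n\n{chunk}"}],
        },
    )
    print(f"chunk {n}: {r.json()['choices'][0]['message']['content'][:80]}...")
```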

What seems to be the bug here: it is possible to overfeed a model with more tokens than the set context size. It works for a while, but at some point it snaps and the model reads in (or generates?) tens of thousands of tokens in a row ... until it crashes.

There should be some kind of check beforehand to determine whether the input text will exceed the context size, and then the input could perhaps be divided into chunks automatically before being fed to the model.
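
As an illustration only, such a pre-check could look like the following; the token estimate is a rough word-count heuristic, not LM Studio's actual tokenizer:

```python
# Hypothetical pre-check of the kind suggested above: estimate the token count
# of the input and refuse (or chunk) before handing it to the model.
# The 1.3 tokens-per-word factor is only a rough heuristic for English text.

def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    return int(len(text.split()) * tokens_per_word)

def check_fits_context(text: str, context_size: int) -> None:
    estimated = estimate_tokens(text)
    if estimated > context_size:
        raise ValueError(
            f"Input is roughly {estimated} tokens but the context window is "
            f"{context_size}; split the input into chunks first."
        )

# Example: call check_fits_context(text, context_size=32768) before sending the request.
```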