jncraton / languagemodels

Explore large language models in 512MB of RAM
https://jncraton.github.io/languagemodels/
MIT License
1.18k stars 78 forks

Killed #31

Closed by mycomedico 8 months ago

mycomedico commented 8 months ago

I am running the lm.extract('prompt', document) function, and it sometimes seems to hang. Python then exits and returns to the bash terminal with the single-word message: Killed

I typed sudo dmesg and saw this message in my log:

[300974.215769] Out of memory: Killed process 336943 (python) total-vm:20061840kB, anon-rss:13408616kB, file-rss:128kB, shmem-rss:0kB, UID:1000 pgtables:27924kB oom_score_adj:0
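To make the numbers in that log line easier to read, here is a minimal sketch that pulls the process details and memory fields out of an OOM-killer line and converts them to GiB. The helper names `parse_oom_line` and `kb_to_gib` are hypothetical, and the conversion assumes the kernel's "kB" means 1024 bytes.

```python
import re

# The kernel log line quoted in the report above.
OOM_LINE = (
    "[300974.215769] Out of memory: Killed process 336943 (python) "
    "total-vm:20061840kB, anon-rss:13408616kB, file-rss:128kB, "
    "shmem-rss:0kB, UID:1000 pgtables:27924kB oom_score_adj:0"
)


def parse_oom_line(line):
    """Return the pid, process name, and kB-valued fields of an OOM-killer line."""
    proc = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if proc is None:
        raise ValueError("not an OOM-killer 'Killed process' line")
    # Collect every "name:<digits>kB" pair, e.g. anon-rss:13408616kB.
    fields = {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(proc.group(1)), "name": proc.group(2), "kb": fields}


def kb_to_gib(kb):
    """Convert kB to GiB (assumption: 1 kB here is 1024 bytes)."""
    return kb / (1024 ** 2)


info = parse_oom_line(OOM_LINE)
print(info["name"], info["pid"])
for field in ("total-vm", "anon-rss"):
    print(f"{field}: {kb_to_gib(info['kb'][field]):.1f} GiB")
```

Here anon-rss is the resident anonymous memory of the killed process, which works out to roughly 12.8 GiB for this report.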

mycomedico commented 8 months ago

I increased max_ram to 14 GB, and that seems to have solved it.
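Before settling on a value like this, it can help to check whether the machine actually has that much memory free. The sketch below is an assumption-laden illustration, not part of the library: `mem_available_gib` and `budget_fits` are hypothetical helpers, the sample text is made up, and the 1 GiB headroom is an arbitrary choice. How max_ram itself is set is not shown in this thread, so the code does not touch it.

```python
def mem_available_gib(meminfo_text):
    """Parse MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 ** 2)
    raise ValueError("MemAvailable not found")


def budget_fits(budget_gib, meminfo_text, headroom_gib=1.0):
    """True if the budget plus some headroom fits in currently available memory."""
    return budget_gib + headroom_gib <= mem_available_gib(meminfo_text)


# Hypothetical sample, not taken from the reporter's machine.
SAMPLE = "MemTotal:       16000000 kB\nMemAvailable:   12000000 kB\n"

print(budget_fits(14, SAMPLE))
print(budget_fits(8, SAMPLE))
```

On Linux, the real text can be read from /proc/meminfo and passed in place of `SAMPLE`.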

jncraton commented 8 months ago

I'm glad that you were able to get this resolved. It's possible to run out of memory during inference when using very long prompts, as memory requirements increase with prompt length. This currently results in your OS handling the out-of-memory condition as it normally would (often by killing the process hogging memory). I may explore adjusting this behavior in the future, either by limiting prompt length to avoid this condition or by catching the OOM and providing a more helpful error message.
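The two ideas mentioned here, limiting prompt length and reporting a clearer error, could look roughly like the sketch below. It is not the library's behavior: `truncate_prompt` and `run_guarded` are hypothetical helpers, whitespace-separated words stand in for real tokens, and `fake_infer` stands in for an actual model call. Note also that a kernel OOM kill arrives as SIGKILL and cannot be caught in-process, so the `MemoryError` branch only helps when an allocation fails inside Python (for example under a ulimit).

```python
def truncate_prompt(prompt, max_words=512):
    """Keep at most max_words whitespace-separated words from the start of the prompt."""
    words = prompt.split()
    if len(words) <= max_words:
        return prompt, False
    return " ".join(words[:max_words]), True


def run_guarded(infer, prompt, max_words=512):
    """Truncate the prompt, call infer, and turn MemoryError into a clearer message."""
    short, truncated = truncate_prompt(prompt, max_words)
    try:
        return infer(short), truncated
    except MemoryError as exc:
        raise RuntimeError(
            f"Inference ran out of memory on a {len(short.split())}-word prompt; "
            "try a shorter prompt or a smaller model."
        ) from exc


def fake_infer(text):
    # Stand-in for a real model call (assumption for illustration only).
    return text.upper()


result, truncated = run_guarded(fake_infer, "one two three four five", max_words=3)
print(result, truncated)
```

Truncating from the start keeps the beginning of the document; whether the start or the end matters more depends on the task.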

mycomedico commented 8 months ago

I just experienced another memory error while running a relatively short prompt and context.

[358590.438640] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-e01a3d63-8e8e-4ddf-81b6-ce6499a22afc.scope,task=python,pid=410852,uid=1000
[358590.438699] Out of memory: Killed process 410852 (python) total-vm:18305836kB, anon-rss:11485836kB, file-rss:128kB, shmem-rss:0kB, UID:1000 pgtables:24332kB oom_score_adj:0
[358591.189163] systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
[358591.807792] systemd[1]: Started crash report submission.
[358591.920102] systemd[1]: whoopsie.service: Deactivated successfully.
[358592.046801] systemd[1]: Started crash report submission.
[358592.047334] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[358592.047442] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[358592.049111] systemd[1]: systemd-journald.service: Consumed 35.871s CPU time.
[358592.049675] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[358592.049785] systemd[1]: Stopped Journal Service.
[358592.049809] systemd[1]: systemd-journald.service: Consumed 35.871s CPU time.
[358592.053277] systemd[1]: Starting Journal Service...
[358592.064364] systemd[1]: whoopsie.service: Deactivated successfully.
[358592.097330] systemd-journald[422880]: File /var/log/journal/3bc42e77837046ff91ba2b6ab7df36b9/system.journal corrupted or uncleanly shut down, renaming and replacing.
[358592.130826] systemd[1]: Started Journal Service.
[358592.136700] systemd-journald[422880]: File /var/log/journal/3bc42e77837046ff91ba2b6ab7df36b9/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.