Chris2000SP closed this issue 9 months ago.
This setting controls the number of CPU threads used for non-GPU operations, so I think it's really odd that it crashes your display server. Did you check the Xorg or Wayland log after the crash (switch to another TTY with Ctrl-Alt-F2 first) to see what happened, or does the whole GPU driver crash (no response from the display at all)?
@cebtenzzre I had a similar crash that caused GDM (GNOME) to restart. I was switching back and forth between Instruct and OpenOrca. I had radeontop open, and I overflowed the GTT (~16 GB) before overflowing the VRAM (8 GB).
The CPU fan on my Noctua cooler goes to 100% and the display freezes. I definitely have to use SysRq or hard-reset the PC. The kernel does not panic, and a VoIP session (Mumble) I had running kept working through the freeze. I could have tried to SSH into the PC from my laptop if sshd had been running, but it wasn't. I didn't check the log files, though. EDIT: @danisztls I think the VRAM filling has nothing to do with the crash, if you mean switching models in GPT4All; I tried that. For me, the crash really is caused by the threads setting when running on the GPU. Please re-evaluate that.
OK, I checked the journal:
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Jan 21 23:28:43 pc kernel: [drm:amdgpu_cs_parser_bos.isra.0 [amdgpu]] *ERROR* amdgpu_vm_validate_pt_bos() failed.
About 2 minutes later, the journal ends with lots of processes being killed, and this:
Jan 21 23:28:53 pc kernel: Out of memory: Killed process 1496 (plasmashell) total-vm:9754260kB, anon-rss:276404kB, file-rss:476kB, shmem-rss:36kB, UID:1000 pgtables:2696kB oom_score_adj:200
EDIT: I also found this:
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): qrc:/gpt4all/main.qml:860: TypeError: Passing incompatible arguments to C++ functions from JavaScript is not allowed.
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "Could not convert argument 0 at"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): "expression for globalPoint@qrc:/gpt4all/main.qml:860"
Jan 21 23:27:21 pc plasmashell[15379]: [Warning] (Sun Jan 21 23:27:21 2024): qrc:/gpt4all/main.qml:860: TypeError: Passing incompatible arguments to C++ functions from JavaScript is not allowed.
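The kernel lines in the journal excerpts above are mostly the same two amdgpu errors repeated. When posting logs like these, it can help to collapse the repeats and pull out the OOM-killer entries. Below is a minimal Python sketch, assuming the journal has been exported as plain text (for example with `journalctl -k`). `summarize_kernel_log` is a hypothetical helper, not part of journalctl or GPT4All, and it only looks at lines containing `kernel: `, so the plasmashell warnings are skipped.

```python
import re
from collections import Counter

# Matches the OOM-killer summary line, for example:
# "Out of memory: Killed process 1496 (plasmashell) total-vm:9754260kB, anon-rss:276404kB, ..."
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_kb>\d+)kB"
)


def summarize_kernel_log(text):
    """Count identical kernel messages and collect OOM kills.

    Returns (counts, oom_kills): counts maps message -> occurrences, and
    oom_kills is a list of (pid, process_name, anon_rss_kb) tuples.
    """
    counts = Counter()
    oom_kills = []
    for line in text.splitlines():
        # Drop the "Jan 21 23:28:43 pc kernel: " prefix so repeats compare equal.
        _, sep, msg = line.partition("kernel: ")
        if not sep:
            continue  # not a kernel line (e.g. plasmashell warnings)
        msg = msg.strip()
        counts[msg] += 1
        match = OOM_RE.search(msg)
        if match:
            oom_kills.append(
                (int(match["pid"]), match["name"], int(match["anon_kb"]))
            )
    return counts, oom_kills
```

Run on the kernel lines shown above, this should report each amdgpu error with its repeat count and one OOM kill for plasmashell (PID 1496).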
This really isn't a GPT4All bug: you are running out of either system RAM or GPU VRAM. Try a smaller model.
Linux does tend to freeze when it runs out of system RAM instead of killing the process, as it has pathological swapping behavior in some cases. This is more of a kernel bug than an app bug. I use some kernel patches similar to this to prevent this from happening with other programs: https://github.com/hakavlad/le9-patch
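Because the freeze happens before the kernel's OOM killer steps in, one way to see it coming from user space is to watch `MemAvailable` in `/proc/meminfo`. A minimal Python sketch follows; the helper names and the 5% threshold are assumptions for illustration. It only observes memory pressure and does not prevent thrashing the way the linked kernel patches aim to.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    return parse_meminfo(Path(path).read_text())


def memory_is_low(info, threshold_fraction=0.05):
    """Hypothetical check: True when MemAvailable < threshold_fraction * MemTotal.

    The 5% default is an arbitrary assumption, not a kernel value.
    """
    return info["MemAvailable"] < threshold_fraction * info["MemTotal"]
```

Polling `memory_is_low(read_meminfo())` every second or so while a model loads would flag the point where the system is about to start thrashing.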
It was the GTT: not RAM or VRAM as such, but an area of system RAM allocated for the GPU to use.
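To check whether the GTT or the VRAM fills up first (as radeontop showed), the amdgpu driver also reports both pools in sysfs as `mem_info_vram_used`, `mem_info_vram_total`, `mem_info_gtt_used` and `mem_info_gtt_total`, in bytes. A minimal Python sketch, assuming the Radeon is `card0` (it may be `card1` or another index on some systems); the helper names are hypothetical.

```python
from pathlib import Path

# amdgpu sysfs memory counters, all reported in bytes.
FIELDS = (
    "mem_info_vram_used",
    "mem_info_vram_total",
    "mem_info_gtt_used",
    "mem_info_gtt_total",
)


def read_amdgpu_mem(device_dir="/sys/class/drm/card0/device"):
    """Read the amdgpu memory counters; "card0" is an assumption."""
    base = Path(device_dir)
    return {name: int((base / name).read_text().strip()) for name in FIELDS}


def usage_fractions(mem):
    """Return (vram_used_fraction, gtt_used_fraction)."""
    vram = mem["mem_info_vram_used"] / mem["mem_info_vram_total"]
    gtt = mem["mem_info_gtt_used"] / mem["mem_info_gtt_total"]
    return vram, gtt
```

Polling `usage_fractions(read_amdgpu_mem())` while switching models should show whether the GTT reaches 100% before the VRAM does.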
8GB VRAM is supposed to handle a 3.8GB model. The problem might be unwanted threading.
This is #1840, unless you can also get it to happen by only changing the number of CPU threads without switching models.
System Info
Arch Linux, AMD Ryzen 5800X3D, AMD Radeon RX 6800 XT, GPT4All 2.6.1 from the AUR ("aur/gpt4all-chat")
vulkaninfo.txt
Reproduction
Using the default of 4 threads with Device set to Auto (or any thread count above 2) crashes the display server. I have to use SysRq to get out.
Expected behavior
Neither the display server nor the app should crash; an error should be shown for this setting instead.
Note: I had the same problem with ROCm and leela-zero. ROCm has since fixed it and now shows an error message. As far as I know, GPT4All uses Vulkan.