Closed Sri115 closed 1 month ago
CPU goes burrrr - https://pasteboard.co/CTJ1iAREUffs.png
Something else may be pushing the CPU this hard. It is likely the image thumbnail extraction, sharp, that is using that much power; it does the same on my machine. However, I am not certain about that. Can I see a screenshot of a task manager, e.g. top (the hard-core one), htop (the less fancy one), or btop (the fancy one), taken inside the container?
GPU transcoding - https://pasteboard.co/IsbIjF0xWPbZ.png
Regarding machine-learning not using the GPU, can I have the log file at /var/logs/immich/ml.log? It would be very helpful. My uneducated guess is that you may have missed the step in the immich config that tells the immich web server where to find the machine-learning backend.
Hi thanks for your quick followup
1) Here are some pictures from top, btop https://pasteboard.co/47Lc7bQDPXIJ.png https://pasteboard.co/DeeHGx7Kf9rZ.png
2) I am sure I set up the machine learning config as suggested. https://pasteboard.co/11b3RkVth9x0.png https://pasteboard.co/MF0GUtZiL3Vh.png
I am also attaching the ml.log here
Going through the log, I see the error "Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9". So I guess nvidia-cudnn was not installed correctly? But installing it again does not report any problem.
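For anyone who wants to pull this kind of error out of a long ml.log without reading it line by line, here is a minimal sketch. It assumes the onnxruntime message keeps the shape "Failed to load library X with error: Y: cannot open shared object file"; the function name missing_libraries is made up for illustration and is not part of immich.

```python
import re

# Assumed message shape, taken from the onnxruntime error in this thread:
#   Failed to load library <provider> with error: <missing lib>: cannot open shared object file
MISSING_LIB = re.compile(
    r"Failed to load library (\S+) with error: ([^:\s]+): cannot open shared object file"
)


def missing_libraries(log_text: str) -> set[tuple[str, str]]:
    """Return the distinct (provider library, missing dependency) pairs found in log_text."""
    return {(m.group(1), m.group(2)) for m in MISSING_LIB.finditer(log_text)}
```

Reading the file with open("/var/logs/immich/ml.log").read() and passing the text in should list each missing dependency once, however many times the error repeats.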
You are welcome. :)
Here are some pictures from top, btop
You are right about the machine learning. It is eating all of the CPUs.
I am sure I set up the machine learning config as suggested.
Your config is very correct.
I am also attaching the ml.log here
There are two types of errors in the log.
[ERROR] Can't connect to ('127.0.0.1', 3003)
[E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory
The first one does not matter: it disappeared after some time, so it is probably just a side effect of the startup sequence, with one service trying to connect before the other was listening.
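If you want to confirm that the first error really is only a startup race, a small retry loop against the address from the log is enough. This is a minimal sketch using only the Python standard library; the function name wait_for_port and the retry counts are my own choices, not anything immich provides.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 5, delay: float = 1.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds, else False."""
    for attempt in range(attempts):
        try:
            # create_connection raises OSError (e.g. connection refused) on failure
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

Calling wait_for_port("127.0.0.1", 3003) a little while after the services start should return True if the error was only transient.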
The second one complains that it cannot find the cudnn components. On an Ubuntu machine, the dynamic libraries, i.e. these components, are in /usr/lib/x86_64-linux-gnu/. Based on the last screenshot, your machine has nvidia-cudnn 8.2.4.15 installed. As a result, the libraries available in that folder end with ".8", while the immich machine-learning server wants a different version: the log asks for libcudnn.so.9.
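To see which cudnn versions are actually present, you can list the matching files in the library folder. Below is a minimal sketch; the default path is the Ubuntu one mentioned above, and the helper names find_cudnn_libs and has_major_version are hypothetical. Other distros, or a cudnn installed somewhere else, will need a different lib_dir.

```python
from pathlib import Path


def find_cudnn_libs(lib_dir: str = "/usr/lib/x86_64-linux-gnu") -> list[str]:
    """Return the sorted file names in lib_dir that look like libcudnn shared objects."""
    return sorted(p.name for p in Path(lib_dir).glob("libcudnn*.so*"))


def has_major_version(names: list[str], major: int) -> bool:
    """True if libcudnn.so.<major>, or a longer name such as libcudnn.so.<major>.x.y, is listed."""
    prefix = f"libcudnn.so.{major}"
    return any(n == prefix or n.startswith(prefix + ".") for n in names)
```

With only nvidia-cudnn 8.x installed, has_major_version(find_cudnn_libs(), 9) would be expected to return False, which matches the error in the log.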
The cause of this problem is likely that the distro package manager ships an old nvidia-cudnn, while the NVIDIA driver, immich, or a dependency of immich, e.g. onnx runtime, requires a later one.
To address this issue, I recommend uninstalling the nvidia-cudnn package that came from the distro package manager and installing the latest one from NVIDIA's official website. (At the time of writing, it is 9.3.0.) This should fix the issue. 😃
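After reinstalling, a quick way to check whether the dynamic loader can now find the library is to try loading it by name. This is a minimal sketch with Python's ctypes; can_load is a made-up helper name. Note the assumption: it uses the search path of the shell you run it in, which may differ from the environment the machine-learning service runs with (for example a different LD_LIBRARY_PATH).

```python
import ctypes


def can_load(lib_name: str) -> tuple[bool, str]:
    """Try to load a shared library by name; return (success, loader error message)."""
    try:
        ctypes.CDLL(lib_name)
        return True, ""
    except OSError as exc:
        # The message mirrors what the loader reports, e.g. "cannot open shared object file"
        return False, str(exc)
```

If can_load("libcudnn.so.9") still reports failure after the new install, the library is probably in a folder the loader does not search.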
OK, thanks for your input. Since this ticket is turning out to be more of an infrastructure problem on my side, I think it can be closed. I will try out your suggestion and provide feedback over the weekend.
Also, as a side question: jellyfin has just announced v1.112.1. Do you plan to test and keep your repo up to date with every update from jellyfin as well? That would be a massive effort on your part, but people like me would appreciate it.
To be honest, I cannot promise a lot. However, I still want to keep this project alive as long as possible, or see it become part of Immich one day. 😺
You are welcome. 😃 I always have issues with NVIDIA things as well, no worries.
Hi, I am not sure if this is how it is supposed to work, but in my case the facial recognition job uses the CPU rather than the GPU. The GPU is used for transcoding, yet the CPU runs at maximum as soon as I upload images.
CPU goes burrrr - https://pasteboard.co/CTJ1iAREUffs.png GPU transcoding - https://pasteboard.co/IsbIjF0xWPbZ.png
I am not sure what I am missing here