loeeeee / immich-in-lxc

Install Immich in LXC with optional CUDA support

Debian support #6

Closed loeeeee closed 2 weeks ago

loeeeee commented 1 month ago

Currently, the README.md mostly targets Ubuntu, so some instructions may not apply to Debian. No breaking issues seem to be present, but more testing on Debian is needed to iron out some frustrating details.

makovez commented 3 weeks ago

I confirm all of those commands work in debian 12.

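Just some notes (not sure if related to debian):

  • The password on runtime.env "DB_PASSWORD" does not seem to be used, as it gives a password error when connecting to postgres if i change the db password.

  • gpu does not seem to be used by immich for machine learning as my nvidia-smi shows no usage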

makovez commented 3 weeks ago

Just read it depends on https://github.com/arter97/immich-native but it says cuda is not supported. Why in your readme you say it is?

loeeeee commented 3 weeks ago

So, in the README.md, I wrote,

> This guide is heavily inspired by another guide Immich Native, and the install script & service files are modified from the ones in that repo. KUDO to its author, arter97!

By "modified from the ones in that repo", I mean that I changed something in that script to make it work. You can find the changes in the script; it is just a build flag.

Also, the script does not depend on Immich Native -- using this repo does not require downloading immich-native.

I hope this answers your confusion. 😄

loeeeee commented 3 weeks ago

> I confirm all of those commands work in debian 12.

Thanks a lot for testing this out.

> • The password on runtime.env "DB_PASSWORD" does not seem to be used, as it gives a password error when connecting to postgres if i change the db password.

This is weird. Please open a new issue with a brief description. DB_PASSWORD should be passed to Immich without any modification by the install or execution script. In other words, it is sort of not controlled by this script.
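To rule out a simple mismatch before opening that issue, one could print the value the file actually holds and compare it with the password set in Postgres. The sketch below is only an illustration: it assumes runtime.env uses plain `KEY=VALUE` lines, and `read_env_file` is a hypothetical helper, not part of this repo or of Immich.

```python
def read_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped.

    Assumption: no multi-line values or shell expansion in the file.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Usage would be something like `read_env_file(open(path).read()).get("DB_PASSWORD")`, where `path` points at the runtime.env of the install in question.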

> • gpu does not seem to be used by immich for machine learning as my nvidia-smi shows no usage

There is another very similar issue, https://github.com/loeeeee/immich-in-lxc/issues/7, reporting this. Can you check out the solution there? 😄

makovez commented 3 weeks ago

thanks for your reply @loeeeee

I will open a new issue for that

Regarding gpu issue, does not seem to be related. I have followed all the steps at your immich config and the log does not show any error regarding any missing lib. I have installed cudnn and cuda toolkit from official nvidia site install instructions, because those packages (nvidia-cudnn libcublaslt12 libcublas12) are not in debian repo.

Here are the ml.log logs https://0x0.st/XyRp.log

loeeeee commented 3 weeks ago

Sorry, this is a busy week for me. I will look into this at weekend.

loeeeee commented 2 weeks ago

In the error.log, there is one line that caught my eye. It says CUDA out of memory, which is a bit weird, as the default model used by Immich used about 1100 MB of GPU memory in my test. Maybe some other process is starving this one.

[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'Conv_113' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char, const char, ERRTYPE, const char, const char, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=immich ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);

Another log entry also suggests some kind of out-of-memory issue.

[E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'Conv_5' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 195084288

Just a random guess: are there any huge image files in the library? Or does your GPU have only very limited memory available?
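One way to check the second guess is to compare the free GPU memory with the roughly 1100 MB the default model needed in my test. A minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are hypothetical, and MB and MiB are treated as interchangeable for this rough check.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (MiB, one GPU per line) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return gpus

def has_headroom(gpu, needed_mib):
    """True if the GPU has at least needed_mib MiB free."""
    return gpu["free_mib"] >= needed_mib

def query_gpu_memory():
    """Ask nvidia-smi for used and total memory of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

For example, `all(has_headroom(gpu, 1100) for gpu in query_gpu_memory())` gives a quick yes or no. It only reflects the moment of the query, so a short-lived spike from another process would not show up.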

I found someone with a similar issue on the Immich GitHub issues page.

> I have installed cudnn and cuda toolkit from official nvidia site install instructions, because those packages (nvidia-cudnn libcublaslt12 libcublas12) are not in debian repo.

Good choice!

makovez commented 2 weeks ago

@loeeeee I have seen that, but that's an old message; when I restart it, it does not repeat. My GPU has 12 GB of memory, it's an RTX 3060. Btw, I am not totally sure that it doesn't use the GPU... I just guess so, since when I keep running nvidia-smi, the memory usage does not change even when playing a video. I'm not sure if there is a better way to be sure it's being used.

root@immich:/home/immich/immich-in-lxc# nvidia-smi 
Tue Sep  3 12:05:38 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:21:00.0 Off |                  N/A |
|  0%   57C    P2             37W /  170W |    1380MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
root@immich:/home/immich/immich-in-lxc#

loeeeee commented 2 weeks ago

HAHAHAHA. You fell into the same pitfall as I did!!!!!! 🤣

You can see the GPU usage from the console of the Proxmox host, but not from inside the LXC. I have this issue as well. It is using the GPU; otherwise, nvidia-smi would print something saying there are no processes.
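To check this from the Proxmox host without reading the table by eye, the compute process list can be queried in CSV form and filtered. This is just a sketch: the `machine-learning` marker is an assumption based on the process path I see on my machine, and both helper names are hypothetical.

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into dicts."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        mem = mem.strip()
        apps.append({
            "pid": int(pid),
            "name": name.strip(),
            # Memory may be reported as a non-numeric placeholder.
            "used_mib": int(mem) if mem.isdigit() else None,
        })
    return apps

def find_by_marker(apps, marker="machine-learning"):
    """Keep processes whose name contains the marker (assumed Immich ML path)."""
    return [app for app in apps if marker in app["name"]]

def query_compute_apps():
    """Run on the host: list every process that holds a compute context."""
    result = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_apps(result.stdout)
```

Running `find_by_marker(query_compute_apps())` on the host while a Smart Search job is active should return a non-empty list if the machine-learning service is on the GPU. An empty list when the service is idle does not prove much, since the process may only appear while a job is running.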

makovez commented 2 weeks ago

@loeeeee I just checked from the host and saw 2 processes running, but they are definitely not related to immich, because when I stop the immich services they are still there. They are probably related to other containers.

root@monster:~# nvidia-smi
Tue Sep  3 14:55:43 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:21:00.0 Off |                  N/A |
|  0%   52C    P2             37W /  170W |    1380MiB /  12288MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      7946      C   /usr/local/bin/python3                        206MiB |
|    0   N/A  N/A    170797      C   python3.8                                    1166MiB |
+-----------------------------------------------------------------------------------------+

makovez commented 2 weeks ago

Do you see any process in the host related to immich ? Maybe it just doesn't show

loeeeee commented 2 weeks ago

Yes, I do see the process. @makovez

Tue Sep  3 21:37:40 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A2000 12GB          On  |   00000000:81:00.0 Off |                  Off |
| 30%   51C    P2             26W /   70W |     566MiB /  12282MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   1380055      C   ...pp/machine-learning/venv/bin/python        560MiB |
+-----------------------------------------------------------------------------------------+

This is after I reran the smart search on the Debian test machine.

Currently, I am looking into using TensorRT instead of CUDA as the ONNX runtime backend to see if that would help.
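As a rough illustration of what switching the backend means: ONNX Runtime accepts an ordered list of execution providers and falls back down the list. The sketch below only picks that order from whatever the installed build reports; `choose_providers` is a hypothetical helper, and this is not how Immich itself selects its provider.

```python
# Preference order: TensorRT first, then CUDA, then plain CPU.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

def choose_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    return [name for name in preferred if name in available]

def providers_for_this_machine():
    """Query the installed onnxruntime build (imported lazily)."""
    import onnxruntime as ort
    return choose_providers(ort.get_available_providers())
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession`. If `TensorrtExecutionProvider` is missing from the result, the installed build or the TensorRT libraries are the first thing to check.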

makovez commented 2 weeks ago

Ok, so I just saw that while transcoding the process was not running, and then it showed up when I was using smart search. But when transcoding I don't see changes in memory usage.

loeeeee commented 2 weeks ago

You can manually force a redo of machine-learning in Job > Smart Search > All. Then it should run for a while. However, I assume the out of memory error still exists, which will crash the program very fast. @makovez

makovez commented 2 weeks ago

wdym ? Smart search is working and i can see it uses GPU. What I am saying is that transcoding when playing video does not seem to use gpu @loeeeee

loeeeee commented 2 weeks ago

ohhhh. I did not get you. @makovez

Transcoded video is cached; no on-the-fly transcoding happens in Immich as far as I know. In Immich, if a video is not transcoded, it does not seem to be playable.

Also, just in case you missed it: besides installing the Jellyfin ffmpeg, there is also a setting that needs to be changed to enable HW-accelerated transcoding.

Additionally, for LXC with CUDA support enabled, one needs to go to Administration > Settings > Video Transcoding Settings > Hardware Acceleration > Acceleration API and select NVENC to explicitly use the GPU to do the transcoding.
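Before flipping that setting, it may help to confirm that the ffmpeg build in use actually ships NVENC encoders. A minimal sketch, assuming the binary is reachable as `ffmpeg` (pass the path of the Jellyfin build instead if it lives elsewhere); the helper names are hypothetical.

```python
import subprocess

def nvenc_encoders(encoders_output):
    """Extract encoder names ending in '_nvenc' from `ffmpeg -encoders` output."""
    found = []
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: ' V....D h264_nvenc  <description>'
        if len(parts) >= 2 and parts[1].endswith("_nvenc"):
            found.append(parts[1])
    return found

def list_nvenc_encoders(ffmpeg_path="ffmpeg"):
    """Run ffmpeg and return the NVENC encoders it reports."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return nvenc_encoders(result.stdout)
```

An empty result means that build cannot use NVENC regardless of what is selected in the settings. A non-empty result only shows the encoders were compiled in, not that the GPU is reachable from inside the container.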

makovez commented 2 weeks ago

are u sure is cached ? in /admin/jobs-status there is a process to transcode all videos for compatibility with more devices but i havent run it. Btw they just released a new version 1.112.1 lol so fast

loeeeee commented 2 weeks ago

Yea, pretty sure. Those jobs are timed and automated, so there is no need to run them manually. They start on their own when a new video is uploaded. Or one could run them manually to redo the transcoding.

I am testing out the new version. Immich devs are working really hard.

loeeeee commented 2 weeks ago

Seems like most Debian issues are sorted out, closing the issue.