pulsejet / memories

Fast, modern and advanced photo management suite. Runs as a Nextcloud app.
https://memories.gallery
GNU Affero General Public License v3.0

NVENC stopped working after ffmpeg update. #428

Closed: relink2013 closed this issue 1 year ago

relink2013 commented 1 year ago

I'm running Nextcloud 25.0.3 on Ubuntu Server 22.04 with a Quadro P400.

I finally had NVENC working perfectly in Memories until a system update upgraded the NVIDIA driver from 520 to 525; after that, transcoding stopped working. I recompiled the latest version of ffmpeg and can now use NVENC in ffmpeg again, but Memories still refuses to transcode, and I'm not sure why, since /tmp/go-vod.log is no longer being created. I tried creating a directory just for the go-vod temp files, and the log still isn't created.

Here is the Memories section of config.php:

'memories.exiftool' => '/var/www/nextcloud/apps/memories/exiftool-bin/exiftool-amd64-glibc',
'memories.ffmpeg_path' => '/ffmpeg-nvenc/ffmpeg',
'memories.ffprobe_path' => '/ffmpeg-nvenc/ffprobe',
'memories.transcoder' => '/var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64',
'memories.tmp_path' => '/go-vod',
'memories.no_transcode' => false,
'memories.qsv' => false,
'memories.nvenc' => true,

Permissions for the files in /var/www/nextcloud/apps/memories/exiftool-bin

drwxr-xr-x 6 www-data www-data 4.0K Feb 17 18:32 exiftool
-rw-r--r-- 1 www-data www-data 6.9M Feb 17 18:32 exiftool-aarch64-glibc
-rw-r--r-- 1 www-data www-data 9.2M Feb 17 18:32 exiftool-aarch64-musl
-rwxr-xr-x 1 www-data www-data 7.2M Feb 17 18:32 exiftool-amd64-glibc
-rw-r--r-- 1 www-data www-data 8.7M Feb 17 18:32 exiftool-amd64-musl
-rw-r--r-- 1 www-data www-data 4.5M Feb 17 18:32 go-vod-aarch64
-rwxr-xr-x 1 www-data www-data 4.7M Feb 17 18:32 go-vod-amd64

Permissions for /ffmpeg-nvenc

-rwxr-xr-x 1 www-data www-data 23M Feb 18 02:41 ffmpeg
-rwxr-xr-x 1 www-data www-data 23M Feb 18 02:41 ffplay
-rwxr-xr-x 1 www-data www-data 23M Feb 18 02:41 ffprobe

Error from browser console

videojs: ERROR: (CODE:4 MEDIA_ERR_SRC_NOT_SUPPORTED) The media could not be loaded, either because the server or network failed or because the format is not supported.

I should note that I also have the Recognize app running in GPU mode, and it's still working just fine. I can verify it's utilizing the GPU using nvidia-smi and nvtop.

Edit: I almost forgot: apparently some defaults changed in the latest build of ffmpeg, and I now have to specify -b_ref_mode 0 in the ffmpeg command; otherwise I get the error "B frames as references are not supported". That may have something to do with the issue. Is there any way I can customize the command that is sent to ffmpeg?

relink2013 commented 1 year ago

I created a separate tmp directory and reinstalled Memories, and I'm finally getting a log file as well as seeing ffmpeg processes in nvtop. However, transcoding still fails, and the log file is filled with errors.

Full log here.

relink2013 commented 1 year ago

I feel like I'm just chasing my tail at this point and missing something obvious.

Sometimes a log gets created, sometimes it doesn't. The log that was created was named tmp.log instead of go-vod.log, and I have no idea why. Initially go-vod and ffmpeg weren't being launched at all, then all of a sudden they were (which is how I got the logs in the previous post), and now they aren't being launched again.

I decided to try removing Memories and reinstalling it, checked all the permissions, and initially the processes launched, but now they won't again. I have tried rebooting and still nothing; I changed tmp directories and still nothing. At this point I even disabled NVENC to see if CPU transcoding at least worked, and still got nothing. I have been troubleshooting this for almost 2 full days and I'm not sure what else to try...

pulsejet commented 1 year ago

A couple of things to note/try:

  1. The path of the log files changed to /tmp/go-vod/<instance-id>.log in one of the recent updates, unless you set a temp path in the config (I'd suggest reverting this if you did; otherwise you'll have more issues).
  2. You may want to try deleting /tmp/go-vod in case it has the wrong permissions (again, just reset any config.php option for the temp dir).
  3. The transcoding issue could be related to the NVIDIA driver. I see some OOM there; maybe try running the ffmpeg commands from the log directly to see if they work?
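The log-location rule above, combined with what others report later in this thread (a configured temp path of /var/www/html/tmp produced /var/www/html/tmp.log), can be summarized in a small Go sketch. This is inferred from the comments, not taken from the Memories or go-vod source; the function name goVodLogPath and the "<tmp_path>.log" rule are assumptions.

```go
package main

import (
	"fmt"
	"path"
)

// goVodLogPath is a hypothetical helper mirroring the behaviour described in
// this thread: with no custom temp path, logs go to /tmp/go-vod/<instance-id>.log;
// with memories.tmp_path set, users observed a sibling file named <tmp_path>.log.
func goVodLogPath(tmpPath, instanceID string) string {
	if tmpPath == "" {
		return path.Join("/tmp/go-vod", instanceID+".log")
	}
	// path.Clean drops a trailing slash, so "/var/www/html/tmp/" becomes "/var/www/html/tmp".
	return path.Clean(tmpPath) + ".log"
}

func main() {
	// "abc123" is a made-up instance ID for illustration.
	fmt.Println(goVodLogPath("", "abc123"))
	fmt.Println(goVodLogPath("/var/www/html/tmp", "abc123"))
}
```

This is why a custom tmp_path can make the log appear to be "missing": it is written next to the configured directory rather than inside /tmp/go-vod.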
relink2013 commented 1 year ago

OK, I tried starting as fresh as possible, but I'm still getting "Transcoding Failed" and no log is being created.

  1. Disabled and then removed Memories
  2. Removed all memories.* lines from config.php
  3. Reinstalled Memories from Apps
  4. Ran occ memories:video-setup, Yes to enable transcoding, No to VAAPI.
  5. Verified that the memories.* lines were recreated in config.php
  6. Verified the path to ffmpeg was correct (/usr/local/bin/ffmpeg)
  7. Added 'memories.nvenc' => true, to config.php

I also verified the permissions for:

/var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64

-rwxr-xr-x 1 www-data www-data 4.7M Feb 21 02:05 go-vod-amd64

/tmp/go-vod

drwxr-xr-x 2 www-data www-data  4.0K Feb 21 02:20 go-vod

Of course, I made sure to run pkill go-vod after setting everything up, as well as to restart NGINX and PHP. I have also tried rebooting the entire server and clearing the browser cache for my domain.

Edit: To clarify, the /tmp/go-vod directory was only created because I manually started go-vod-amd64 from the CLI using sudo -u www-data /var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64. After rebooting, the /tmp/go-vod folder is gone, and it has not been recreated since.

shperrung commented 1 year ago

Hi! Yesterday I created a Docker image with apache-nextcloud:25.0.3 and ffmpeg 5.1.2 built with CUDA support. As expected, I got "Transcoding Failed". /tmp/go-vod is absent inside the container. In addition, this appears during video playback: "Failed to get Exif data. Metadata may be lost!"

My host: Debian 11. Image in container: also Debian 11.

ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --disable-debug --disable-doc --disable-ffplay --enable-cuda --enable-cuvid --enable-fontconfig --enable-gpl --enable-libaom --enable-libaribb24 --enable-libass --enable-libbluray --enable-libfdk_aac --enable-libfreetype --enable-libkvazaar --enable-libmp3lame --enable-libnpp --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libsrt --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxvid --enable-libzmq --enable-nonfree --enable-nvenc --enable-openssl --enable-postproc --enable-shared --enable-small --enable-version3 --extra-cflags='-I/opt/ffmpeg/include -I/opt/ffmpeg/include/ffnvcodec -I/usr/local/cuda/include/' --extra-ldflags='-L/opt/ffmpeg/lib -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib32/' --extra-libs=-ldl --extra-libs=-lpthread --prefix=/opt/ffmpeg
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100

The whole path /var/www/html/custom_apps/memories/exiftool-bin/ is owned by www-data:www-data. go-vod-amd64, exiftool-amd64-glibc and exiftool-amd64-musl are executable and owned by www-data:www-data.

config.php has the correct paths to go-vod-amd64 and exiftool-amd64-glibc.

The VOD server starts manually with this command:
root@omv6:/mnt/nextcloud/nc/custom_apps/memories/exiftool-bin# sudo docker exec -it -u www-data nextcloud-app-1 ./custom_apps/memories/exiftool-bin/go-vod-amd64
2023/02/22 11:10:30 Starting VOD server

relink2013 commented 1 year ago

@shperrung I really wish I knew what version of ffmpeg I had working. I compiled it about a month ago from master, and when I recently re-compiled I thought nothing of deleting the old one.

I have tried figuring it out from the VideoLAN git repo and GitHub but the version numbers are all over the place so I’m not sure what version it actually would have been.

Edit: OK, so apparently builds from the master branch, which the devs consider stable, are actually nightly builds. I’m going to try going back a few weeks and see what happens.

shperrung commented 1 year ago

@relink2013 I just tested ffmpeg 4.2.8 (released 4 months ago) and ffmpeg 4.2.7 (released 9 months ago). I created hard links to /usr/bin/ffmpeg from /usr/local/bin/ffmpeg, but I get the same "Transcoding Failed".

darkobas commented 1 year ago

I have the same issue: it stopped working and starting after an update.

How do you run it so that it works? sudo -u www-data /var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64 creates the tmp folder, but that's it: no log and no transcoding.

relink2013 commented 1 year ago

I have the same issue: it stopped working and starting after an update.

How do you run it so that it works? sudo -u www-data /var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64 creates the tmp folder, but that's it: no log and no transcoding.

This is currently exactly where I'm at too.

shperrung commented 1 year ago

@relink2013 @darkobas Digging through the logs in the ./tmp folder, I found this:

2023/02/23 03:51:51 yl1c1gk9wc90-720p: /usr/local/bin/ffmpeg -loglevel warning -hwaccel cuda -hwaccel_output_format cuda -autorotate 0 -i /mnt/raid/photo/photostore/2022/12/20221214_142420_46F5E493.mp4 -copyts -vf format=nv12|cuda,hwupload,scale_cuda=w=1280:h=720:force_original_aspect_ratio=decrease:passthrough=0 -maxrate 4326666 -bufsize 8653332 -c:v h264_nvenc -profile:v high -preset p6 -tune ll -temporal-aq 1 -rc vbr -rc-lookahead 30 -cq 24 -c:a aac -ac 1 -b:a 192k -avoid_negative_ts disabled -f hls -hls_time 3 -force_key_frames expr:gte(t,n_forced*3) -hls_segment_type mpegts -start_number 0 -hls_segment_filename /var/www/html/tmp/yl1c1gk9wc90-1001461337/720p-%06d.ts -
2023/02/23 03:51:51 ffmpeg-error: [AVFilterGraph @ 0x561479119cc0] No such filter: 'scale_cuda'
2023/02/23 03:51:51 ffmpeg-error: Error reinitializing filters!
2023/02/23 03:51:51 ffmpeg-error: Failed to inject frame into filter network: Invalid argument
2023/02/23 03:51:51 ffmpeg-error: Error while processing the decoded data for stream #0:0
2023/02/23 03:51:51 ffmpeg-error: [aac @ 0x5614779f6d40] 2 frames left in the queue on closing
2023/02/23 03:51:55 yl1c1gk9wc90: new manager for /mnt/raid/photo/photostore/2018/08/20180821_073712_2128E9C1.mp4
2023/02/23 03:51:55 yl1c1gk9wc90: destroying manager
2023/02/23 03:51:55 yl1c1gk9wc90-max: stopping stream
2023/02/23 03:51:55 yl1c1gk9wc90-360p: stopping stream
2023/02/23 03:51:55 yl1c1gk9wc90-480p: stopping stream
2023/02/23 03:51:55 yl1c1gk9wc90-720p: stopping stream
2023/02/23 03:51:55 yl1c1gk9wc90-720p: stopping stream

The problem is in ffmpeg. Something is wrong here:
2023/02/23 03:51:51 ffmpeg-error: [AVFilterGraph @ 0x561479119cc0] No such filter: 'scale_cuda'
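The decisive line is ffmpeg's "No such filter" error. Here is a small Go sketch that scans a go-vod log and reports which filters ffmpeg could not find. The regular expression is written against the log lines above, and the function name missingFilters is hypothetical, not part of go-vod.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// noSuchFilter matches ffmpeg errors such as:
//   ffmpeg-error: [AVFilterGraph @ 0x561479119cc0] No such filter: 'scale_cuda'
var noSuchFilter = regexp.MustCompile(`No such filter: '([^']+)'`)

// missingFilters returns each filter name reported missing, once,
// in order of first appearance.
func missingFilters(log string) []string {
	seen := map[string]bool{}
	var out []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := noSuchFilter.FindStringSubmatch(sc.Text()); m != nil && !seen[m[1]] {
			seen[m[1]] = true
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	log := `2023/02/23 03:51:51 ffmpeg-error: [AVFilterGraph @ 0x561479119cc0] No such filter: 'scale_cuda'
2023/02/23 03:51:51 ffmpeg-error: Error reinitializing filters!`
	fmt.Println(missingFilters(log)) // [scale_cuda]
}
```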

Here is some information about the difference between scale_cuda and scale_npp. I'll try to rebuild ffmpeg in the image with --enable-cuda-llvm.

https://www.reddit.com/r/ffmpeg/comments/euiwtv/comment/fjwojvb/

shperrung commented 1 year ago

@darkobas

How do you run it so that it works? sudo -u www-data /var/www/nextcloud/apps/memories/exiftool-bin/go-vod-amd64 creates the tmp folder, but that's it: no log and no transcoding.

I created another tmp folder (inside the container!) at /var/www/html/tmp and set it in config.php. Then I restarted the Nextcloud stack and played a video. Of course I got "Transcoding Failed", but it created the log file /var/www/html/tmp.log with details of the attempts to use NVIDIA.

shperrung commented 1 year ago

@darkobas @relink2013 I may have found the problem and a solution. The missing logs and the frozen go-vod process may be caused by the error on the first attempt to play a transcoded video, as I wrote above. The ffmpeg error shows that the scale_cuda filter is missing. Another filter, scale_npp, can do the same scaling, but Memories does not use it in its ffmpeg command. So I figured we just need to add the scale_cuda filter to ffmpeg, which --enable-cuda-sdk does. See my log and try building your own image with ffmpeg. I'm using nextcloud:25.0.3-apache, but you can change that in row #20 of the Dockerfile. Copy the Dockerfile (remove .txt) into any folder and run docker build -t ncff2503 . there. All required dependencies are compiled from source before the ffmpeg build. Today I ran into problems when the builder tried to download sources from https://download.videolan.org/pub/videolan/... so I changed the addresses to available mirrors, and the build completed successfully. Try it; I hope it helps.

oct00g34j6d9.log Dockerfile.txt

relink2013 commented 1 year ago

I've been following along with @shperrung and continuing to experiment on my own and I have still not gotten any further.

I do have a suggestion, though: @pulsejet, why not take an existing container such as jrottenberg/ffmpeg:4.2-nvidia, put go-vod in it, release it as an "official companion container", and let users specify the container IP:PORT in config.php or in Settings in the GUI? This way we can hopefully avoid these issues in the future, users don't have to compile ffmpeg themselves, and users who run on Docker or on low-power hardware like a Raspberry Pi can leverage an NVIDIA GPU on another system without needing to jump through hoops.

There is actually a similar solution being discussed over in the Recognize app's GitHub: take all the NVIDIA stuff and put it in a separate Docker container. I personally think this would be the best route for both Memories and Recognize, since the official NC devs have zero interest in including the NVIDIA drivers or libs anyway.

shperrung commented 1 year ago

@relink2013 Since Nextcloud positions itself as "office software", they will not build a solution for expanding "media" capabilities, because Jellyfin, Plex, Ombi, Serviio and others do it better. Unfortunately, the jrottenberg/ffmpeg:4.2-nvidia container doesn't have scale_cuda by default; it is built on the official NVIDIA image nvidia/cuda:11.4.1-devel-ubuntu20.04. It doesn't have daemons or services that would allow it to be used as part of a docker-compose stack. Another "exotic" option for connecting jrottenberg/ffmpeg:4.2-nvidia and nextcloud:25.0.3-apache is to expose the Docker socket /var/run/docker.sock to the container, but every guide warns that this is unsafe and suitable only for development, not production. I have never seen ffmpeg with NVIDIA support available in package repositories; every time, I had to build it and install the dependencies manually.

The easiest solution is to add scale_npp syntax to Memories in addition to scale_cuda, as specified there:

ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=1280:720 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_npp=1280:720 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

This will work with most pre-built and "standard" builds of ffmpeg (nvidia/cuda).

shperrung commented 1 year ago

@pulsejet Hi! Would it be possible to add a special procedure for NVIDIA, where Memories initially checks for the presence of the scale_npp and scale_cuda filters (ffmpeg -filters | grep scale)? If one is absent (for example scale_cuda), it could use the alternative syntax with scale_npp. scale_npp needs the NPP libraries from the CUDA toolkit to be installed on the machine/image, while scale_cuda does not. Therefore it would be better to have ffmpeg exec commands for both NVIDIA scalers, because there is no "standard ffmpeg-nvidia" package with "always available" features, as there is for VAAPI. Each builder does it their own way and often gets errors due to missing filters or codecs.

Upd 27 Feb: I reinstalled Nextcloud from backup and rebuilt the Docker image with ffmpeg compiled with --enable-cuda-sdk (scale_cuda) and --enable-libnpp (scale_npp). HW transcoding with scale_cuda is confirmed in the log, and there is no problem starting the go-vod process. Thanks a lot for this awesome and long-awaited app for Nextcloud.
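The detection proposed here can be sketched in Go: run ffmpeg -filters once, collect the filter names, and prefer scale_cuda, falling back to scale_npp. The parsing assumes the usual ffmpeg -filters layout (a flags column, the filter name, then an I/O column such as V->V); the function names and the preference order are assumptions, not what Memories or go-vod implement.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// filterNames parses `ffmpeg -filters` output. Filter lines look roughly like
//   " ... scale_npp         V->V       <description>"
// i.e. a flags column, the filter name, the I/O type, then a description.
// Lines whose third field has no "->" (headers, legend) are skipped.
func filterNames(out string) map[string]bool {
	names := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && strings.Contains(f[2], "->") {
			names[f[1]] = true
		}
	}
	return names
}

// pickScaler returns the first available NVIDIA scaler, or "" if neither exists.
func pickScaler(names map[string]bool) string {
	for _, s := range []string{"scale_cuda", "scale_npp"} {
		if names[s] {
			return s
		}
	}
	return ""
}

func main() {
	out, err := exec.Command("ffmpeg", "-hide_banner", "-filters").Output()
	if err != nil {
		fmt.Println("could not run ffmpeg:", err)
		return
	}
	fmt.Println("scaler:", pickScaler(filterNames(string(out))))
}
```

Running the check once at startup (rather than per request) would keep the cost negligible; an empty result could fall back to CPU transcoding.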

pulsejet commented 1 year ago

I just want to add a note here: I'd really like to fix all these issues with GPU decoding; unfortunately I don't have a GPU so there's no way I can test anything. If you find/fix any bugs please do open a PR, or drop a note to add to the wiki.

The reason I have refrained from requiring a second container for decoding is that not everyone runs Docker, so this cannot be a requirement (it has to be an "option", i.e. more maintenance overhead). If someone is willing to maintain the transcoding parts, that will be great.

relink2013 commented 1 year ago

I've been at this for a week now and I just can't seem to get anywhere, no logs, nothing...

If someone is willing to maintain the transcoding parts, that will be great.

I have no idea whether I have the skill to accomplish it, but I've been very tempted to at least try recently. I've been wondering whether there is a way to modify the command that is sent to ffmpeg. I've been toying with the idea of running ffmpeg on a different system, but I'm really not sure where to start on something like that.

pulsejet commented 1 year ago

The way it works is Memories runs the go-vod daemon, which does all the transcoding. Whenever the frontend requests (part of) a file, it just proxies the request (after access control checking) to the daemon, which starts and manages the ffmpeg processes. Some pointers:

  1. How go-vod is started: https://github.com/pulsejet/memories/blob/4cdb575d6330207b0f9ac151ba52f7b1de1028ea/lib/Controller/VideoController.php#L226
  2. Requests proxied to go-vod here: https://github.com/pulsejet/memories/blob/4cdb575d6330207b0f9ac151ba52f7b1de1028ea/lib/Controller/VideoController.php#L320
  3. go-vod repository where the ffmpeg arguments are constructed: https://github.com/pulsejet/go-vod/blob/35b4b3a8b2c21ed032114e990a03db460f5b5e15/stream.go#L360
darkobas2 commented 1 year ago

What version of Nextcloud are you guys on? For the love of god I can't get go-vod to start by itself, and when I start it myself I still get a 403 in the browser.

pulsejet commented 1 year ago

What version of Nextcloud are you guys on? For the love of god I can't get go-vod to start by itself, and when I start it myself I still get a 403 in the browser.

The Nextcloud version shouldn't matter (24+). You should try checking the logs to see what is happening. Is go-vod receiving any requests (anything in the logs when you manually run it?). Also please do open a separate issue for this since it isn't related to NVENC.

darkobas2 commented 1 year ago

Which logs? No log gets created, as Apache reports a 403, so the log folder is empty. And yes, I run it as user www-data.

pulsejet commented 1 year ago

@darkobas2 please file a separate issue as indicated earlier. Do also include any information you see in the JS console and the network tab (for that request), as well as anything that might have shown up in the Nextcloud logs.

shperrung commented 1 year ago

Which logs? No log gets created, as Apache reports a 403, so the log folder is empty. And yes, I run it as user www-data.

@darkobas2, you are running it correctly as www-data. It should write a log into /tmp/go-vod, but only when transcoding succeeds. Try removing the row 'memories.nvenc' => true, from config.php, then restart the Nextcloud docker-compose stack and try to play a video. In that case you will get logs in /tmp/go-vod, showing software transcoding with the libx264 codec and the crop filter. In my opinion, the missing logs are caused by an error in ffmpeg. As I wrote above, the main problem with NVIDIA HW transcoding is that Memories uses only one resizer, scale_cuda, while our ffmpeg is built with the other resize filter, scale_npp. I'm not sure that asking @pulsejet to change the filter is a good idea: the root cause is NVIDIA's decision to maintain both resizers, not any inability of the products to handle them. Here @pulsejet gave us pointers to where we can improve things. Look especially here:

3. go-vod repository where the ffmpeg arguments are constructed: https://github.com/pulsejet/go-vod/blob/35b4b3a8b2c21ed032114e990a03db460f5b5e15/stream.go#L360

Here you can just replace scale_cuda with scale_npp, and you will get HW transcoding on NVIDIA and logs about it.
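A configurable scaler could look like the Go sketch below, which builds the -vf chain seen in the 720p log line earlier in this thread with either scaler. Assumptions: scale_npp accepts w, h and force_original_aspect_ratio like scale_cuda; passthrough=0 is kept only for scale_cuda because this thread does not confirm that scale_npp supports it; and whether scale_npp works with the same format=nv12|cuda,hwupload prefix is untested here. The function name nvencScaleFilter is hypothetical, and this is not the actual go-vod code.

```go
package main

import "fmt"

// nvencScaleFilter builds a -vf chain like the one go-vod logged above:
//   format=nv12|cuda,hwupload,scale_cuda=w=1280:h=720:force_original_aspect_ratio=decrease:passthrough=0
// with the scaler name made configurable. passthrough=0 is appended only for scale_cuda.
func nvencScaleFilter(scaler string, w, h int) string {
	opts := fmt.Sprintf("w=%d:h=%d:force_original_aspect_ratio=decrease", w, h)
	if scaler == "scale_cuda" {
		opts += ":passthrough=0"
	}
	return "format=nv12|cuda,hwupload," + scaler + "=" + opts
}

func main() {
	fmt.Println(nvencScaleFilter("scale_cuda", 1280, 720))
	fmt.Println(nvencScaleFilter("scale_npp", 1280, 720))
}
```

Paired with a filter-detection step, the scaler argument could come from whichever filter the installed ffmpeg actually provides.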

If you don't want to or can't fix it, use my method and just improve ffmpeg in your image. With a Docker installation you must build your own image anyway, because there is no "standard" Nextcloud image with pre-built ffmpeg (nvidia-cuda). I've heard about Nextcloud AIO, but its latest release is built with ffmpeg without the CUDA/NVIDIA stuff.

Dockerfile.1.txt

How to build the image:

  1. Download Dockerfile.1.txt, remove ".1.txt", and save it in any folder as just "Dockerfile".
  2. Correct row #20 with the name of the base Nextcloud image you use. I'm currently using nextcloud:25.0.3-apache. You can change it to nextcloud:25.0.3-fpm, nextcloud:25.0.3-alpine... It doesn't matter.
  3. Run sudo docker build -t ncff2503 . in a terminal from the folder containing the Dockerfile, and watch the process (20-35 min).
  4. Docker pulls the "standard" image nextcloud:25.0.3-apache and copies into it the ffmpeg-related libraries and binaries from a separate image, nvidia/cuda:11.4.1-devel-ubuntu20.04, where ffmpeg is built.
  5. As a result, you will have a new image in your system named "ncff2503". Recreate your Nextcloud stack using the name of your new image.

There is one peculiarity with that image: it depends on the base "standard" image nextcloud:25.0.3-apache, so you can't delete that image. If you want to, you can export your own image ncff2503, delete all images created during the build, and then import ncff2503 back into the host system. Then run the Docker stack again. I do the export/import in Portainer.

pulsejet commented 1 year ago

Here you can just replace scale_cuda with scale_npp, and you will get HW transcoding on NVIDIA and logs about it.

I'm fine with this change; it can easily be a configurable option. Even better if we can detect it: #450