immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Encoding fails with opus audio and hardware acceleration enabled #9507

Closed. watn3y closed this issue 5 months ago

watn3y commented 5 months ago

The bug

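When encoding video with QuickSync or VAAPI (I'm not sure about the other hardware backends), we set the CRF with ffmpeg's -global_quality option. Without a stream specifier, that option applies to every output stream, including the audio stream. Since libopus does not support quality-based encoding, the transcode fails with this error message: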

[libopus @ 0x5b636213580] No bit rate set. Defaulting to 96000 bps.
[libopus @ 0x5b636213580] Quality-based encoding not supported, please specify a bitrate and VBR setting.
[aost#0:1/libopus @ 0x5b6361df480] Error initializing output stream: Error while opening encoder for output stream #0:1 - maybe incorrect parameters such as bit_rate, rate, width or height

Instead of -global_quality we should be using -global_quality:v. This way, we only set the CRF for the video stream.
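As a rough illustration of the proposed change, the sketch below rewrites an unscoped `-global_quality` entry in an option list like the one Immich logs into its video-only form. The function name `scope_quality_to_video` is hypothetical and is not taken from the Immich codebase; the option strings are copied from the log in this report.

```python
def scope_quality_to_video(output_options: list[str]) -> list[str]:
    """Return a copy of the options with -global_quality limited to video streams.

    Hypothetical helper for illustration; not Immich's actual implementation.
    """
    scoped = []
    for option in output_options:
        flag, _, value = option.partition(" ")
        if flag == "-global_quality":
            # ":v" is ffmpeg's stream specifier for video streams, so the
            # audio encoder no longer receives a quality setting.
            option = f"-global_quality:v {value}"
        scoped.append(option)
    return scoped


logged = ["-c:v av1_qsv", "-c:a libopus", "-preset 3", "-global_quality 35"]
print(scope_quality_to_video(logged))
```

Options that already carry a stream specifier are left untouched, so the rewrite is safe to apply more than once.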

The unscoped option does not fail with aac, but it still affects the audio: with high CRF values (like the recommended 35 for AV1), we end up with quite low quality audio (in my case the bitrate was around 60 kbit/s, which isn't great).

I haven't had a chance to look at the code yet, but it would probably be a good idea to specify video and audio quality separately for all encoders.
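One way to picture separate settings is the sketch below, which emits a video-only quality option and an explicit audio bitrate. It assumes ffmpeg's `-b:a` option for the audio bitrate and libopus's `-vbr` encoder option; the function name, its parameters, and the 128 kbit/s figure are hypothetical choices for illustration, not values from Immich.

```python
def quality_options(video_quality: int, audio_codec: str, audio_bitrate_kbps: int) -> list[str]:
    """Build separate quality options for the video and audio streams.

    Hypothetical helper; the names and defaults are assumptions.
    """
    options = [
        # Quality target for video streams only.
        f"-global_quality:v {video_quality}",
        # Explicit bitrate for audio streams, independent of the video quality.
        f"-b:a {audio_bitrate_kbps}k",
    ]
    if audio_codec == "libopus":
        # libopus asks for a bitrate and a VBR setting rather than a quality value.
        options.append("-vbr on")
    return options


print(quality_options(35, "libopus", 128))
print(quality_options(35, "aac", 128))
```

Keeping the audio bitrate as its own parameter means a high video CRF no longer drags the audio quality down with it.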

The OS that Immich Server is running on

Docker 26.1.2 on Debian Bookworm

Version of Immich Server

v1.105.1

Version of Immich Mobile App

not relevant

Platform with the issue

Your docker-compose.yml content

not relevant

Your .env content

not relevant

Reproduction steps

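1. Enable hardware acceleration (QuickSync or VAAPI) and set the audio codec to opus
2. Transcode a video; encoding fails because the quality option is also passed to the audio encoder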

Relevant log output

[Nest] 7  - 05/15/2024, 8:28:12 AM     LOG [ImmichMicroservices] [MediaService] Started encoding video 9af05d7c-6cd4-4fbf-8dfe-d5f86ddbe1c8 {"inputOptions":["-init_hw_device qsv=hw","-filter_hw_device hw"],"outputOptions":["-c:v av1_qsv","-c:a libopus","-movflags faststart","-fps_mode passthrough","-map 0:2","-map 0:1","-bf 7","-refs 5","-g 256","-v verbose","-vf format=nv12,hwupload=extra_hw_frames=64","-preset 3","-global_quality 35"],"twoPass":false}
[Nest] 7  - 05/15/2024, 8:28:12 AM   ERROR [ImmichMicroservices] [MediaRepository]   libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVHWDeviceContext @ 0x5b636180300] Trying to use DRM render node for device 0, with matching kernel driver (i915).
[AVHWDeviceContext @ 0x5b636180300] libva: VA-API version 1.21.0
[AVHWDeviceContext @ 0x5b636180300] libva: User requested driver 'iHD'
[AVHWDeviceContext @ 0x5b636180300] libva: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x5b636180300] libva: Found init function __vaDriverInit_1_21
[AVHWDeviceContext @ 0x5b636180300] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x5b636180300] Initialised VAAPI connection: version 1.21
[AVHWDeviceContext @ 0x5b636180300] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 24.1.5 (8068c2e).
[AVHWDeviceContext @ 0x5b636180300] Driver not found in known nonstandard list, using standard behaviour.
[AVHWDeviceContext @ 0x5b636180200] Use Intel(R) oneVPL to create MFX session, API version is 2.10, the required implementation version is 1.3
libva info: VA-API version 1.21.0
libva info: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_21
libva info: va_openDriver() returns 0
libva info: VA-API version 1.21.0
libva info: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_21
libva info: va_openDriver() returns 0
[AVHWDeviceContext @ 0x5b636180200] Initialize MFX session: implementation version is 2.10
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'upload/library/watn3y/2023/02/26/PXL_20230226_122726710.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 131072
    compatible_brands: isomiso2mp41
    creation_time   : 2023-02-26T12:27:31.000000Z
    location        : +52.3735+9.7302/
    location-eng    : +52.3735+9.7302/
    com.android.capture.fps: 120.000000
  Duration: 00:00:13.40, start: 0.000000, bitrate: 8876 kb/s
  Stream #0:0[0x1](eng): Data: none (mett / 0x7474656D), 43 kb/s (default)
    Metadata:
      creation_time   : 2023-02-26T12:27:31.000000Z
      handler_name    : MetaHandle
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 191 kb/s (default)
    Metadata:
      creation_time   : 2023-02-26T12:27:31.000000Z
      handler_name    : SoundHandle
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Video: hevc (Main), 1 reference frame (hvc1 / 0x31637668), yuvj420p(pc, bt709, left), 1920x1080 (1920x1088), 8637 kb/s, SAR 1:1 DAR 16:9, 30.08 fps, 30 tbr, 90k tbn (default)
    Metadata:
      creation_time   : 2023-02-26T12:27:31.000000Z
      handler_name    : VideoHandle
      vendor_id       : [0][0][0][0]
    Side data:
      displaymatrix: rotation of -90.00 degrees
Stream mapping:
  Stream #0:2 -> #0:0 (hevc (native) -> av1 (av1_qsv))
  Stream #0:1 -> #0:1 (aac (native) -> opus (libopus))
Press [q] to stop, [?] for help
[graph 0 input from stream 0:2 @ 0x5b6361f69c0] w:1920 h:1080 pixfmt:yuvj420p tb:1/90000 fr:30/1 sar:1/1
[auto_scale_0 @ 0x5b6361f6d80] w:iw h:ih flags:'' interl:0
[transpose @ 0x5b6361f6a80] auto-inserting filter 'auto_scale_0' between the filter 'graph 0 input from stream 0:2' and the filter 'transpose'
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[transpose @ 0x5b6361f6a80] w:1920 h:1080 dir:1 -> w:1080 h:1920 rotation:clockwise vflip:0
[AVHWDeviceContext @ 0x5b6361869c0] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 24.1.5 (8068c2e).
[AVHWDeviceContext @ 0x5b6361869c0] Driver not found in known nonstandard list, using standard behaviour.
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[AVHWFramesContext @ 0x5b636127e00] Use Intel(R) oneVPL to create MFX session, API version is 2.10, the required implementation version is 2.10
[AVHWFramesContext @ 0x5b636127e00] Initialize MFX session: implementation version is 2.10
[av1_qsv @ 0x5b636211180] Using input frames context (format qsv) with av1_qsv encoder.
[av1_qsv @ 0x5b636211180] Encoder: input is video memory surface
[av1_qsv @ 0x5b636211180] Use Intel(R) oneVPL to create MFX session with the specified MFX loader
[av1_qsv @ 0x5b636211180] Using the intelligent constant quality (ICQ) ratecontrol method
[av1_qsv @ 0x5b636211180] profile: av1 main; level: 40
[av1_qsv @ 0x5b636211180] GopPicSize: 256; GopRefDist: 8; GopOptFlag:; IdrInterval: 0
[av1_qsv @ 0x5b636211180] TargetUsage: 3; RateControlMethod: ICQ
[av1_qsv @ 0x5b636211180] ICQQuality: 35
[av1_qsv @ 0x5b636211180] NumRefFrame: 5
[av1_qsv @ 0x5b636211180] IntRefType: 0; IntRefCycleSize: 0; IntRefQPDelta: 0; IntRefCycleDist: 0
[av1_qsv @ 0x5b636211180] MaxFrameSize: 0;
[av1_qsv @ 0x5b636211180] BitrateLimit: unknown; MBBRC: OFF; ExtBRC: unknown
[av1_qsv @ 0x5b636211180] VDENC: ON
[av1_qsv @ 0x5b636211180] BRefType: pyramid
[av1_qsv @ 0x5b636211180] PRefType: default
[av1_qsv @ 0x5b636211180] MinQPI: 1; MaxQPI: 255; MinQPP: 1; MaxQPP: 255; MinQPB: 1; MaxQPB: 255
[av1_qsv @ 0x5b636211180] FrameRateExtD: 1; FrameRateExtN: 30 
[av1_qsv @ 0x5b636211180] NumTileRows: 1; NumTileColumns: 1; NumTileGroups: 1
[av1_qsv @ 0x5b636211180] WriteIVFHeaders: OFF 
[av1_qsv @ 0x5b636211180] LowDelayBRC: OFF
[av1_qsv @ 0x5b636211180] MaxFrameSize: 0;
[graph_1_in_0_1 @ 0x5b6361f2580] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:stereo
[format_out_0_1 @ 0x5b6360c4e80] auto-inserting filter 'auto_aresample_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[auto_aresample_0 @ 0x5b6360c5c80] ch:2 chl:stereo fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:flt r:48000Hz
[libopus @ 0x5b636213580] No bit rate set. Defaulting to 96000 bps.
[libopus @ 0x5b636213580] Quality-based encoding not supported, please specify a bitrate and VBR setting.
[aost#0:1/libopus @ 0x5b6361df480] Error initializing output stream: Error while opening encoder for output stream #0:1 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 0x5b636207d40] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0x5b636206bc0] Statistics: 1292152 bytes read, 2 seeks
Conversion failed!

[Nest] 7  - 05/15/2024, 8:28:12 AM   ERROR [ImmichMicroservices] [MediaService] Error: ffmpeg exited with code 1: Conversion failed!
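To try the failing command outside Immich, the logged `inputOptions` and `outputOptions` can be turned into an ffmpeg argument list. This is a minimal sketch that assumes each logged string splits on whitespace into a flag and its value; `build_ffmpeg_argv` and the `input.mp4` / `output.mp4` paths are placeholders, not names from Immich.

```python
import shlex


def build_ffmpeg_argv(input_path, output_path, input_options, output_options):
    """Assemble an ffmpeg argument list from logged option strings (hypothetical helper)."""
    argv = ["ffmpeg"]
    for option in input_options:
        argv.extend(shlex.split(option))
    argv.extend(["-i", input_path])
    for option in output_options:
        argv.extend(shlex.split(option))
    argv.append(output_path)
    return argv


# Options copied from the "Started encoding video" log line above.
input_options = ["-init_hw_device qsv=hw", "-filter_hw_device hw"]
output_options = [
    "-c:v av1_qsv", "-c:a libopus", "-movflags faststart",
    "-fps_mode passthrough", "-map 0:2", "-map 0:1", "-bf 7", "-refs 5",
    "-g 256", "-v verbose", "-vf format=nv12,hwupload=extra_hw_frames=64",
    "-preset 3", "-global_quality 35",
]

argv = build_ffmpeg_argv("input.mp4", "output.mp4", input_options, output_options)
print(shlex.join(argv))
```

The printed command can be pasted into a shell on a machine with a QSV-capable ffmpeg build; swapping the last option for `-global_quality:v 35` gives the variant proposed in this report.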

Additional information

No response

mertalev commented 5 months ago

Thanks for the bug report and insights! You're right that we should be setting the quality only for video.