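When encoding video with QuickSync or VAAPI (I'm not sure about the other backends), we set the CRF with the ffmpeg option -global_quality.
Because the option has no stream specifier, it is also applied to the audio encoder. libopus does not support quality-based encoding, so the transcode fails with this error message: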
[libopus @ 0x5b636213580] No bit rate set. Defaulting to 96000 bps.
[libopus @ 0x5b636213580] Quality-based encoding not supported, please specify a bitrate and VBR setting.
[aost#0:1/libopus @ 0x5b6361df480] Error initializing output stream: Error while opening encoder for output stream #0:1 - maybe incorrect parameters such as bit_rate, rate, width or height
Instead of -global_quality we should be using -global_quality:v, so that the CRF is set only for the video stream.
The unqualified option happens to work with aac, but with high CRF values (like the recommended 35 for AV1) the resulting audio is quite low quality; in my case the bitrate was around 60 kbit/s, which isn't great.
I haven't had a chance to look at the code yet, but it would probably be a good idea to specify video and audio quality separately for all encoders.
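As a rough illustration of the suggestion above, here is a minimal sketch of how the output options could be built so that the quality value only reaches the video encoder. I haven't looked at the Immich code, so the names here (TranscodeSettings, buildOutputOptions, audioBitrateKbps) are hypothetical, and the separate audio bitrate setting is an assumption about how the fix might look. Only the ffmpeg options themselves (-global_quality:v and -b:a) are real.

```typescript
// Hypothetical helper, not Immich's actual code: builds ffmpeg output options
// so that the quality value applies only to the video stream.
interface TranscodeSettings {
  videoCodec: string; // e.g. "av1_qsv"
  audioCodec: string; // e.g. "libopus"
  videoQuality: number; // value passed to -global_quality:v
  audioBitrateKbps: number; // assumed separate audio setting, e.g. 128
}

function buildOutputOptions(settings: TranscodeSettings): string[] {
  return [
    `-c:v ${settings.videoCodec}`,
    `-c:a ${settings.audioCodec}`,
    // The :v stream specifier keeps the quality value away from the audio encoder.
    `-global_quality:v ${settings.videoQuality}`,
    // Explicit audio bitrate, independent of the video quality value.
    `-b:a ${settings.audioBitrateKbps}k`,
  ];
}

const exampleOptions = buildOutputOptions({
  videoCodec: "av1_qsv",
  audioCodec: "libopus",
  videoQuality: 35,
  audioBitrateKbps: 128,
});
console.log(exampleOptions.join(" "));
```

With an explicit audio bitrate, libopus no longer receives a quality value it cannot use, and the audio bitrate for aac no longer depends on the video CRF.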
The OS that Immich Server is running on
Docker 26.1.2 on Debian Bookworm
Version of Immich Server
v1.105.1
Version of Immich Mobile App
not relevant
Platform with the issue
[X] Server
[ ] Web
[ ] Mobile
Your docker-compose.yml content
not relevant
Your .env content
not relevant
Reproduction steps
1. Enable opus audio
2. Encoding breaks due to incorrect parameters
Relevant log output
[Nest] 7 - 05/15/2024, 8:28:12 AM LOG [ImmichMicroservices] [MediaService] Started encoding video 9af05d7c-6cd4-4fbf-8dfe-d5f86ddbe1c8 {"inputOptions":["-init_hw_device qsv=hw","-filter_hw_device hw"],"outputOptions":["-c:v av1_qsv","-c:a libopus","-movflags faststart","-fps_mode passthrough","-map 0:2","-map 0:1","-bf 7","-refs 5","-g 256","-v verbose","-vf format=nv12,hwupload=extra_hw_frames=64","-preset 3","-global_quality 35"],"twoPass":false}
[Nest] 7 - 05/15/2024, 8:28:12 AM ERROR [ImmichMicroservices] [MediaRepository] libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100
[AVHWDeviceContext @ 0x5b636180300] Trying to use DRM render node for device 0, with matching kernel driver (i915).
[AVHWDeviceContext @ 0x5b636180300] libva: VA-API version 1.21.0
[AVHWDeviceContext @ 0x5b636180300] libva: User requested driver 'iHD'
[AVHWDeviceContext @ 0x5b636180300] libva: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x5b636180300] libva: Found init function __vaDriverInit_1_21
[AVHWDeviceContext @ 0x5b636180300] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x5b636180300] Initialised VAAPI connection: version 1.21
[AVHWDeviceContext @ 0x5b636180300] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 24.1.5 (8068c2e).
[AVHWDeviceContext @ 0x5b636180300] Driver not found in known nonstandard list, using standard behaviour.
[AVHWDeviceContext @ 0x5b636180200] Use Intel(R) oneVPL to create MFX session, API version is 2.10, the required implementation version is 1.3
libva info: VA-API version 1.21.0
libva info: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_21
libva info: va_openDriver() returns 0
libva info: VA-API version 1.21.0
libva info: Trying to open /usr/lib/jellyfin-ffmpeg/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_21
libva info: va_openDriver() returns 0
[AVHWDeviceContext @ 0x5b636180200] Initialize MFX session: implementation version is 2.10
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'upload/library/watn3y/2023/02/26/PXL_20230226_122726710.mp4':
Metadata:
major_brand : isom
minor_version : 131072
compatible_brands: isomiso2mp41
creation_time : 2023-02-26T12:27:31.000000Z
location : +52.3735+9.7302/
location-eng : +52.3735+9.7302/
com.android.capture.fps: 120.000000
Duration: 00:00:13.40, start: 0.000000, bitrate: 8876 kb/s
Stream #0:0[0x1](eng): Data: none (mett / 0x7474656D), 43 kb/s (default)
Metadata:
creation_time : 2023-02-26T12:27:31.000000Z
handler_name : MetaHandle
Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 191 kb/s (default)
Metadata:
creation_time : 2023-02-26T12:27:31.000000Z
handler_name : SoundHandle
vendor_id : [0][0][0][0]
Stream #0:2[0x3](eng): Video: hevc (Main), 1 reference frame (hvc1 / 0x31637668), yuvj420p(pc, bt709, left), 1920x1080 (1920x1088), 8637 kb/s, SAR 1:1 DAR 16:9, 30.08 fps, 30 tbr, 90k tbn (default)
Metadata:
creation_time : 2023-02-26T12:27:31.000000Z
handler_name : VideoHandle
vendor_id : [0][0][0][0]
Side data:
displaymatrix: rotation of -90.00 degrees
Stream mapping:
Stream #0:2 -> #0:0 (hevc (native) -> av1 (av1_qsv))
Stream #0:1 -> #0:1 (aac (native) -> opus (libopus))
Press [q] to stop, [?] for help
[graph 0 input from stream 0:2 @ 0x5b6361f69c0] w:1920 h:1080 pixfmt:yuvj420p tb:1/90000 fr:30/1 sar:1/1
[auto_scale_0 @ 0x5b6361f6d80] w:iw h:ih flags:'' interl:0
[transpose @ 0x5b6361f6a80] auto-inserting filter 'auto_scale_0' between the filter 'graph 0 input from stream 0:2' and the filter 'transpose'
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[transpose @ 0x5b6361f6a80] w:1920 h:1080 dir:1 -> w:1080 h:1920 rotation:clockwise vflip:0
[AVHWDeviceContext @ 0x5b6361869c0] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 24.1.5 (8068c2e).
[AVHWDeviceContext @ 0x5b6361869c0] Driver not found in known nonstandard list, using standard behaviour.
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[swscaler @ 0x5b63692c000] deprecated pixel format used, make sure you did set range correctly
[auto_scale_0 @ 0x5b6361f6d80] w:1920 h:1080 fmt:yuvj420p sar:1/1 -> w:1920 h:1080 fmt:nv12 sar:1/1 flags:0x00000004
[AVHWFramesContext @ 0x5b636127e00] Use Intel(R) oneVPL to create MFX session, API version is 2.10, the required implementation version is 2.10
[AVHWFramesContext @ 0x5b636127e00] Initialize MFX session: implementation version is 2.10
[av1_qsv @ 0x5b636211180] Using input frames context (format qsv) with av1_qsv encoder.
[av1_qsv @ 0x5b636211180] Encoder: input is video memory surface
[av1_qsv @ 0x5b636211180] Use Intel(R) oneVPL to create MFX session with the specified MFX loader
[av1_qsv @ 0x5b636211180] Using the intelligent constant quality (ICQ) ratecontrol method
[av1_qsv @ 0x5b636211180] profile: av1 main; level: 40
[av1_qsv @ 0x5b636211180] GopPicSize: 256; GopRefDist: 8; GopOptFlag:; IdrInterval: 0
[av1_qsv @ 0x5b636211180] TargetUsage: 3; RateControlMethod: ICQ
[av1_qsv @ 0x5b636211180] ICQQuality: 35
[av1_qsv @ 0x5b636211180] NumRefFrame: 5
[av1_qsv @ 0x5b636211180] IntRefType: 0; IntRefCycleSize: 0; IntRefQPDelta: 0; IntRefCycleDist: 0
[av1_qsv @ 0x5b636211180] MaxFrameSize: 0;
[av1_qsv @ 0x5b636211180] BitrateLimit: unknown; MBBRC: OFF; ExtBRC: unknown
[av1_qsv @ 0x5b636211180] VDENC: ON
[av1_qsv @ 0x5b636211180] BRefType: pyramid
[av1_qsv @ 0x5b636211180] PRefType: default
[av1_qsv @ 0x5b636211180] MinQPI: 1; MaxQPI: 255; MinQPP: 1; MaxQPP: 255; MinQPB: 1; MaxQPB: 255
[av1_qsv @ 0x5b636211180] FrameRateExtD: 1; FrameRateExtN: 30
[av1_qsv @ 0x5b636211180] NumTileRows: 1; NumTileColumns: 1; NumTileGroups: 1
[av1_qsv @ 0x5b636211180] WriteIVFHeaders: OFF
[av1_qsv @ 0x5b636211180] LowDelayBRC: OFF
[av1_qsv @ 0x5b636211180] MaxFrameSize: 0;
[graph_1_in_0_1 @ 0x5b6361f2580] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:stereo
[format_out_0_1 @ 0x5b6360c4e80] auto-inserting filter 'auto_aresample_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[auto_aresample_0 @ 0x5b6360c5c80] ch:2 chl:stereo fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:flt r:48000Hz
[libopus @ 0x5b636213580] No bit rate set. Defaulting to 96000 bps.
[libopus @ 0x5b636213580] Quality-based encoding not supported, please specify a bitrate and VBR setting.
[aost#0:1/libopus @ 0x5b6361df480] Error initializing output stream: Error while opening encoder for output stream #0:1 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 0x5b636207d40] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0x5b636206bc0] Statistics: 1292152 bytes read, 2 seeks
Conversion failed!
[Nest] 7 - 05/15/2024, 8:28:12 AM ERROR [ImmichMicroservices] [MediaService] Error: ffmpeg exited with code 1: Conversion failed!
Additional information
No response