jellyfin / jellyfin-ffmpeg

FFmpeg for Jellyfin
https://jellyfin.org

lavc/rkmppenc: add RKMPP MJPEG/JPEG encoder #373

Closed nyanmisaka closed 2 months ago

nyanmisaka commented 3 months ago

Up to 1080p@240fps or 320x180@1100fps when transcoding on RK3588 (this hits the decoder's performance limit), or 320x180@6000fps when encoding raw NV12 frames from testsrc2/nullsrc.

Changes

gnattu commented 3 months ago

This is interesting. Rockchip's embedded JPEG encoder is significantly faster than Intel's and Apple's in nullsrc testing (3x-4x faster).

According to the datasheet, it has a 4x90MPixel/s JPEG encoder. At this rate, it could easily outperform powerful CPUs at JPEG encoding. For example, the UHD770's MJPEG encoder only reaches about 40% of the speed of a 12900 processor: it achieves 2000fps at 320x180, while the CPU itself sits above 5000fps but still below 6000fps.
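As a rough sanity check, the frame rates quoted in this thread can be converted to pixel throughput and compared with the 4x90MPixel/s datasheet figure. This is only back-of-the-envelope arithmetic; the helper name `pixel_rate_mpix` is made up for illustration, and the 5000fps CPU value is a lower bound taken from the "> 5000fps" remark.

```python
def pixel_rate_mpix(width: int, height: int, fps: float) -> float:
    """Return throughput in megapixels per second for a given frame size and rate."""
    return width * height * fps / 1e6


# Datasheet figure quoted above: four JPEG encoder cores at 90 MPixel/s each.
DATASHEET_MPIX = 4 * 90

# Frame rates quoted in this thread, all at 320x180.
rates = {
    "RK3588 nullsrc (6000fps)": pixel_rate_mpix(320, 180, 6000),
    "UHD770 mjpeg_qsv (2000fps)": pixel_rate_mpix(320, 180, 2000),
    "12900 CPU, lower bound (5000fps)": pixel_rate_mpix(320, 180, 5000),
}

for name, mpix in rates.items():
    share = mpix / DATASHEET_MPIX
    print(f"{name}: {mpix:.1f} MPixel/s ({share:.0%} of the datasheet figure)")
```

The 6000fps nullsrc result works out to about 345.6 MPixel/s, which is close to the 360 MPixel/s the datasheet implies.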

In reality, though, Jellyfin cannot reach this speed, because it is not an MJPEG camera system and does not typically take MJPEG video as input. Additionally, real-life performance in Jellyfin is easily affected by the rescale filter and the image sink, which I expect to be the main bottleneck on RK3588. However, RK3588 can be very powerful for camera stream systems with MJPEG input and MJPEG output, outperforming much higher-end platforms.

nyanmisaka commented 3 months ago

The async depth of the mjpeg_qsv encoder seems to be hardcoded in the runtime. Raising it may increase parallelism, but it doesn't work.

gnattu commented 3 months ago

> The async depth of the mjpeg_qsv encoder seems to be hardcoded in the runtime. Raising it may increase parallelism, but it doesn't work.

Take it easy, as we will not be getting close to the encoder limitation. The FPS filter dropping output frames effectively increases the decoder pressure by 30 times or higher, and most of our workload is going to be bottlenecked by the decoder in reality anyway. 1000fps is more than enough for this.
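To make the decoder-pressure argument concrete, here is a minimal sketch of the arithmetic. It assumes a hypothetical 30fps source from which the fps filter keeps one frame per second, giving the 30x ratio mentioned above; the function names are illustrative and not part of FFmpeg or Jellyfin.

```python
def required_decode_fps(encode_fps: float, drop_ratio: float) -> float:
    """Decoder rate needed to keep an encoder busy at encode_fps when
    only one in every drop_ratio decoded frames reaches the encoder."""
    return encode_fps * drop_ratio


def achievable_encode_fps(decode_fps: float, drop_ratio: float) -> float:
    """Encoder input rate when the decoder is the bottleneck."""
    return decode_fps / drop_ratio


# Assumption: 30fps source, fps filter keeps 1 frame per second.
DROP_RATIO = 30

# Saturating a 1000fps encoder would need the decoder to run at 30000fps.
print(required_decode_fps(1000, DROP_RATIO))

# Conversely, a decoder limited to 1100fps (the 320x180 transcoding figure
# reported at the top of this thread) feeds the encoder fewer than 37
# frames per second.
print(achievable_encode_fps(1100, DROP_RATIO))
```

Under these assumptions the encoder sits mostly idle, which is why a 1000fps encoder leaves plenty of headroom.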

nyanmisaka commented 3 months ago

> Take it easy, as we will not be getting close to the encoder limitation. The FPS filter dropping output frames effectively increases the decoder pressure by 30 times or higher, and most of our workload is going to be bottlenecked by the decoder in reality anyway. 1000fps is more than enough for this.

It's just a test.

Intel uses different copy engines for hwupload (RCS) and for the copy done internally by qsvenc (BCS):

RCS: Render Command Streamer
BCS: Blitter Command Streamer

This creates a noticeable performance difference.