Feature request: Optimize vaapi performance

eikowagenknecht commented 4 years ago

It's me again with a vaapi related feature request.

Currently reencoding a file from a vaapi supported format to another vaapi supported format (e.g. h264 to h265) results in the following ffmpeg command:

/usr/local/bin/ffmpeg -hwaccel vaapi -i "inputfile.mkv" -vcodec hevc_vaapi [...] -vaapi_device /dev/dri/renderD128 -vf format=nv12,hwupload -qp 25 [ ...]

This results in downloading everything to normal memory then uploading it again to the video card.

What should be used instead is:

/usr/local/bin/ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i "inputfile.mkv" -vcodec hevc_vaapi [...] -vaapi_device /dev/dri/renderD128 -qp 25 [ ...]

Performance gain for me (same file):

Current method: 0.7x encoding speed from h264 to h265
Optimized method: 0.9x encoding speed from h264 to h265

So roughly +29% speed (0.7x to 0.9x) and probably lower memory usage 👍

Decoding alone is 25x with new parameters and only 5x without.

See also https://trac.ffmpeg.org/wiki/Hardware/VAAPI for reference.

I tried using preopts -hwaccel_output_format, vaapi as a workaround but since this does not remove -vf format=nv12,hwupload it didn't work (ffmpeg error).

This only works when the input is hardware decodable, otherwise the current command has to be used.

So maybe it's better to use the more general but complicated version as mentioned in https://trac.ffmpeg.org/wiki/Hardware/VAAPI, search for "when the input may or may not be hardware decodable" on the page.
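
For reference, the two command shapes discussed above can be sketched as a small argument builder. This is only an illustration: the function name `build_vaapi_args`, its parameters, and the device label `hw` are hypothetical and not part of SMA. The flags themselves are the ones quoted in this issue and on the linked FFmpeg wiki page.

```python
def build_vaapi_args(infile, outfile, device="/dev/dri/renderD128",
                     encoder="hevc_vaapi", qp=25, generic=True):
    """Return an ffmpeg argument list for a VAAPI transcode (illustrative only).

    generic=True  -> variant for input that may or may not be hardware
                     decodable: decoded frames stay on the GPU when possible,
                     software frames are uploaded by the filter chain.
    generic=False -> hardware-only variant; fails if the input cannot be
                     decoded by the hardware.
    """
    if generic:
        # "hw" is an arbitrary label for the initialized device (assumption).
        return ["ffmpeg",
                "-init_hw_device", "vaapi=hw:%s" % device,
                "-hwaccel", "vaapi",
                "-hwaccel_output_format", "vaapi",
                "-hwaccel_device", "hw",
                "-i", infile,
                "-filter_hw_device", "hw",
                "-vf", "format=nv12|vaapi,hwupload",
                "-vcodec", encoder,
                "-qp", str(qp),
                outfile]
    # Hardware-only: no format/hwupload filter at all.
    return ["ffmpeg",
            "-hwaccel", "vaapi",
            "-hwaccel_output_format", "vaapi",
            "-i", infile,
            "-vcodec", encoder,
            "-vaapi_device", device,
            "-qp", str(qp),
            outfile]


print(" ".join(build_vaapi_args("inputfile.mkv", "output.mp4")))
```

The generic variant costs nothing when the input is hardware decodable, so a single code path could cover both cases.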

mdhiggins commented 4 years ago

https://github.com/mdhiggins/sickbeard_mp4_automator/tree/vaapi-qp

Made some changes to this branch, see if that does what you're after. Wasn't 100% sure where in the filters scaling would need to be placed though, so that might be an issue.

eikowagenknecht commented 4 years ago

As always: Thanks for your amazingly fast response! Testing this now.

ffmpeg before:

/usr/local/bin/ffmpeg -hwaccel vaapi -i /input.mkv -vcodec hevc_vaapi -map 0:16 -pix_fmt yuv420p -metadata:s:v title=FHD -level 0.0 -tag:v hvc1 -vaapi_device /dev/dri/renderD128 -vf format=nv12,hwupload -qp 25 -maxrate:v 5m -bufsize 15m -c:a:0 aac -map 0:0 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:0 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y /output.mp4

ffmpeg after:

/usr/local/bin/ffmpeg -hwaccel vaapi -i input.mkv -vcodec hevc_vaapi -map 0:16 -pix_fmt yuv420p -metadata:s:v title=FHD -tag:v hvc1 -vaapi_device /dev/dri/renderD128 -vf format=nv12,hwupload -qp 25 -maxrate:v 5m -bufsize 15m -c:a:0 aac -map 0:0 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:0 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y output.mp4

So only the "-level 0.0" seems to have disappeared. There is no difference in performance (0.705x).

Do I need to set any special config options for the new feature to be used?

mdhiggins commented 4 years ago

Weird, not sure why level disappeared, though that probably shouldn't have been there anyway since it was 0. It does appear the changes I made aren't making it to the command. You're using the different branch of the script, right? The changes aren't in master. I'm at work at the moment so can't do any testing and would need to build a vaapi container on docker.

eikowagenknecht commented 4 years ago

Yes I'm using the current vaapi-qp branch.

root@sonarr:/usr/local/sma# cat converter/avcodecs.py | grep hwupload
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:%s' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:trunc(ow/a/2)*2' % (self.scale_filter, safe['vaapi_wscale'])])
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=trunc((oh*a)/2)*2:%s' % (self.scale_filter, safe['vaapi_hscale'])])
                optlist.extend(['-vf', "format=nv12|vaapi,hwupload"])
                optlist.extend(['-vf', 'hwupload,%s=%s:%s:format=nv12' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
                optlist.extend(['-vf', 'hwupload,%s=%s:trunc(ow/a/2)*2:format=nv12' % (self.scale_filter, safe['vaapi_wscale'])])
                optlist.extend(['-vf', 'hwupload,%s=trunc((oh*a)/2)*2:%s:format=nv12' % (self.scale_filter, safe['vaapi_hscale'])])
                optlist.extend(['-vf', "format=nv12,hwupload"])
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:%s' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:trunc(ow/a/2)*2' % (self.scale_filter, safe['vaapi_wscale'])])
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=trunc((oh*a)/2)*2:%s' % (self.scale_filter, safe['vaapi_hscale'])])
                optlist.extend(['-vf', "format=nv12|vaapi,hwupload"])
                optlist.extend(['-vf', 'hwupload,%s=%s:%s:format=nv12' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
                optlist.extend(['-vf', 'hwupload,%s=%s:trunc(ow/a/2)*2:format=nv12' % (self.scale_filter, safe['vaapi_wscale'])])
                optlist.extend(['-vf', 'hwupload,%s=trunc((oh*a)/2)*2:%s:format=nv12' % (self.scale_filter, safe['vaapi_hscale'])])
                optlist.extend(['-vf', "format=nv12,hwupload"])

From the result it seems like it's not finding "device" in safe, but I don't know the code base well enough to judge what that means:

    def _codec_specific_produce_ffmpeg_list(self, safe, stream=0):
        optlist = super(H264VAAPICodec, self)._codec_specific_produce_ffmpeg_list(safe, stream)
        if 'device' in safe:
            optlist.extend(['-filter_hw_device', safe['device']])
            if 'vaapi_wscale' in safe and 'vaapi_hscale' in safe:
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:%s' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
            elif 'vaapi_wscale' in safe:
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=%s:trunc(ow/a/2)*2' % (self.scale_filter, safe['vaapi_wscale'])])
            elif 'vaapi_hscale' in safe:
                optlist.extend(['-vf', 'format=nv12|vaapi,hwupload,%s=trunc((oh*a)/2)*2:%s' % (self.scale_filter, safe['vaapi_hscale'])])
            else:
                optlist.extend(['-vf', "format=nv12|vaapi,hwupload"])
        else:
            optlist.extend(['-vaapi_device', '/dev/dri/renderD128'])
            if 'vaapi_wscale' in safe and 'vaapi_hscale' in safe:
                optlist.extend(['-vf', 'hwupload,%s=%s:%s:format=nv12' % (self.scale_filter, safe['vaapi_wscale'], safe['vaapi_hscale'])])
            elif 'vaapi_wscale' in safe:
                optlist.extend(['-vf', 'hwupload,%s=%s:trunc(ow/a/2)*2:format=nv12' % (self.scale_filter, safe['vaapi_wscale'])])
            elif 'vaapi_hscale' in safe:
                optlist.extend(['-vf', 'hwupload,%s=trunc((oh*a)/2)*2:%s:format=nv12' % (self.scale_filter, safe['vaapi_hscale'])])
            else:
                optlist.extend(['-vf', "format=nv12,hwupload"])
        if 'qp' in safe:
            optlist.extend(['-qp', str(safe['qp'])])
            if 'maxrate' in safe:
                optlist.extend(['-maxrate:v', str(safe['maxrate'])])
            if 'bufsize' in safe:
                optlist.extend(['-bufsize', str(safe['bufsize'])])

        return optlist
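
To illustrate the suspicion, here is a stripped-down, standalone sketch of just the device branch above (no scaling, no superclass call). The name `vaapi_filter_opts` is hypothetical; the option strings are copied from the method shown. With no `device` key in `safe`, it yields exactly the `-vaapi_device ... -vf format=nv12,hwupload` pair seen in the generated command.

```python
def vaapi_filter_opts(safe):
    """Mimic only the 'device' branch of _codec_specific_produce_ffmpeg_list."""
    optlist = []
    if 'device' in safe:
        # New path: frames may already be VAAPI surfaces, so allow both formats.
        optlist.extend(['-filter_hw_device', safe['device']])
        optlist.extend(['-vf', 'format=nv12|vaapi,hwupload'])
    else:
        # Old path: everything goes through system memory as nv12.
        optlist.extend(['-vaapi_device', '/dev/dri/renderD128'])
        optlist.extend(['-vf', 'format=nv12,hwupload'])
    return optlist


print(vaapi_filter_opts({'qp': 25}))
# ['-vaapi_device', '/dev/dri/renderD128', '-vf', 'format=nv12,hwupload']
print(vaapi_filter_opts({'device': 'sma', 'qp': 25}))
# ['-filter_hw_device', 'sma', '-vf', 'format=nv12|vaapi,hwupload']
```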

mdhiggins commented 4 years ago

Found the problem and fixed it with the latest commit: it was pulling the encoder_options from the wrong superclass.

eikowagenknecht commented 4 years ago

Just pulled the new commit, but still the same ffmpeg result:

/usr/local/bin/ffmpeg -hwaccel vaapi -i input.mkv -vcodec hevc_vaapi -map 0:16 -pix_fmt yuv420p -metadata:s:v title=FHD -tag:v hvc1 -vaapi_device /dev/dri/renderD128 -vf format=nv12,hwupload -qp 22 -maxrate:v 5m -bufsize 15m -c:a:0 aac -map 0:0 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:0 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y output.mp4

Can I do any further testing to help?

mdhiggins commented 4 years ago

Could you post the full conversion log? From that I can maybe see why device isn't getting injected.

eikowagenknecht commented 4 years ago

Sure, here's the snippet from sma.log from my last conversion (episode replaced with asterisks):

2020-08-13 16:34:54 - MANUAL - INFO - Manual processor started.
2020-08-13 16:34:54 - MANUAL - INFO - /usr/local/sma/venv/bin/python3
2020-08-13 16:34:54 - MANUAL - INFO - Loading config file ./autoProcessShows.ini.
2020-08-13 16:34:56 - MANUAL - INFO - Matched TV episode as ************ S05E07
2020-08-13 16:34:56 - MANUAL - INFO - Processing ************
2020-08-13 16:34:56 - MANUAL - INFO - Input Data
2020-08-13 16:34:56 - MANUAL - INFO - {
    "format": "matroska,webm",
    "format-fullname": "unknown",
    "video": {
        "index": 0,
        "codec": "hevc",
        "pix_fmt": "yuv420p10le",
        "profile": "2",
        "fps": 23.976023976023978,
        "level": 12.0,
        "field_order": "unknown"
    },
    "audio": [
        {
            "index": 1,
            "codec": "ac3",
            "bitrate": 640000,
            "channels": 6,
            "samplerate": 48000,
            "language": "eng",
            "disposition": "-default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired"
        }
    ],
    "subtitle": [
        {
            "index": 2,
            "codec": "hdmv_pgs_subtitle",
            "disposition": "-default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired",
            "language": "eng"
        }
    ],
    "attachment": []
}
2020-08-13 16:34:56 - MANUAL - INFO - Reading video stream.
2020-08-13 16:34:56 - MANUAL - INFO - Video codec detected: hevc.
2020-08-13 16:34:56 - MANUAL - INFO - Pix Fmt: yuv420p10le.
2020-08-13 16:34:56 - MANUAL - INFO - Profile: 2.
2020-08-13 16:34:56 - MANUAL - INFO - Acceptable profile match found for VBR 1959.34745 using CRF 22, maxrate 5m, bufsize 15m.
2020-08-13 16:34:56 - MANUAL - INFO - Video codec parameters None.
2020-08-13 16:34:56 - MANUAL - INFO - Creating h265vaapi video stream from source stream 0.
2020-08-13 16:34:56 - MANUAL - INFO - Reading audio streams.
2020-08-13 16:34:56 - MANUAL - INFO - Audio detected for stream 1 - ac3 eng 6 channel.
2020-08-13 16:34:56 - MANUAL - INFO - Creating aac audio stream source audio stream 1 [universal-audio].
2020-08-13 16:34:56 - MANUAL - INFO - Creating copy audio stream from source stream 1.
2020-08-13 16:34:56 - MANUAL - INFO - Default audio stream set to eng copy 6 channel stream [default-more-channels: True].
2020-08-13 16:34:56 - MANUAL - INFO - Reading subtitle streams.
2020-08-13 16:34:56 - MANUAL - INFO - Image-based subtitle detected for stream 2 - hdmv_pgs_subtitle eng.
2020-08-13 16:34:56 - MANUAL - INFO - Attempting to download subtitles.
2020-08-13 16:34:56 - MANUAL - INFO - Scanned for external subtitles and found 0 results in your approved languages.
2020-08-13 16:34:56 - MANUAL - INFO - vaapi hwaccel is supported by this ffmpeg build and will be used [hwaccels].
2020-08-13 16:34:56 - MANUAL - INFO - Output Data
2020-08-13 16:34:56 - MANUAL - INFO - {
    "source": [
        "/tv/************.mkv"
    ],
    "format": "mp4",
    "video": {
        "codec": "h265vaapi",
        "map": 0,
        "bitrate": 1959.34745,
        "crf": 22,
        "maxrate": "5m",
        "bufsize": "15m",
        "level": 0.0,
        "profile": null,
        "pix_fmt": "yuv420p",
        "field_order": "unknown",
        "width": null,
        "filter": null,
        "params": null,
        "title": "FHD",
        "debug": "video.pix_fmt"
    },
    "audio": [
        {
            "map": 1,
            "codec": "aac",
            "channels": 2,
            "bitrate": 256,
            "samplerate": null,
            "filter": null,
            "language": "eng",
            "disposition": "-default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired",
            "title": "Stereo",
            "debug": "universal-audio"
        },
        {
            "map": 1,
            "codec": "copy",
            "channels": 6,
            "bitrate": 768,
            "filter": null,
            "samplerate": null,
            "language": "eng",
            "disposition": "+default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired",
            "bsf": null,
            "title": "5.1 Channel",
            "debug": "audio"
        }
    ],
    "subtitle": [],
    "attachment": []
}
2020-08-13 16:34:56 - MANUAL - INFO - Preopts
2020-08-13 16:34:56 - MANUAL - INFO - [
    "-hwaccel",
    "vaapi"
]
2020-08-13 16:34:56 - MANUAL - INFO - Postopts
2020-08-13 16:34:56 - MANUAL - INFO - [
    "-threads",
    "0",
    "-metadata:g",
    "encoding_tool=SMA"
]
2020-08-13 16:34:56 - MANUAL - INFO - Downloaded Subtitles
2020-08-13 16:34:56 - MANUAL - INFO - []
2020-08-13 16:34:56 - MANUAL - INFO - Starting conversion.
2020-08-13 16:34:57 - MANUAL - INFO - FFmpeg command:
2020-08-13 16:34:57 - MANUAL - INFO - ======================
2020-08-13 16:34:57 - MANUAL - INFO - /usr/local/bin/ffmpeg -hwaccel vaapi -i /tv/************.mkv -vcodec hevc_vaapi -map 0:0 -pix_fmt yuv420p -metadata:s:v title=FHD -tag:v hvc1 -vaapi_device /dev/dri/renderD128 -vf format=nv12,hwupload -qp 22 -maxrate:v 5m -bufsize 15m -c:a:0 aac -map 0:1 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:1 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y /tv/************.mp4
2020-08-13 16:34:57 - MANUAL - INFO - ======================

mdhiggins commented 4 years ago

That was helpful. New commit which should hopefully fix the device not being added to the optlist.

eikowagenknecht commented 4 years ago

This has done the trick! Speedup from 0.86x to 1.31x on my Synology NAS for a typical 1080p episode h265 10bit -> h265 8bit conversion. That saves a lot of time for me!

PS: If you accept donations I'd like to give something for your great support :-)

mdhiggins commented 4 years ago

Donations aren't necessary. Any chance you could test to see if scaling/resizing works with VAAPI?

Biggest challenges with this solution are to make sure it works with other hwaccel options and to make sure scaling still works, and I guess that it falls back when the input is not hardware decodable, like it's supposed to.

eikowagenknecht commented 4 years ago

Sure, I can run all the test conversions you want ;-)

But I've never used scaling / resizing before. What's the best way to test if it works?

mdhiggins commented 4 years ago

Just set your max-width to something smaller than your source material so it has to resize and see if the resulting output file was accurately resized to what you'd expect

eikowagenknecht commented 4 years ago

Tested on a 1080p file, resizing to 100px. Worked just fine at 8.5x speed.
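
As a sanity check on the expected output size: the width-only filter in the branch uses `trunc(ow/a/2)*2` for the height, which keeps the aspect ratio and rounds down to an even number. A small sketch of that arithmetic follows; the helper name `expected_height` is made up for illustration, and integer division stands in for ffmpeg's `trunc` (equivalent for positive sizes).

```python
def expected_height(src_w, src_h, out_w):
    """Evaluate trunc(ow/a/2)*2 with a = src_w/src_h, using exact integer math."""
    scaled = out_w * src_h // src_w   # floor(ow / a)
    return scaled // 2 * 2            # round down to an even number


print(expected_height(1920, 1080, 100))  # 56
```

So a 1920x1080 source resized to a width of 100 should come out as 100x56.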

mdhiggins commented 4 years ago

If you're in the mood to do some more testing, I overhauled some things to try and make this cleaner and more broadly compatible

0269583ac093808459abe52e7c37ff4de4099ae8

Added a new hwdevices option which gives users the ability to set specific devices for encoders and decoders. Included the default vaapi device at baseline.

This lets the script know what devices the user wants, but also allows it to check if the decoder and encoder are using the same device and define that more precisely. If for some reason two devices were to be used, it's now smart enough to detect that, map accordingly, and add hwdownload/hwupload to the filter only when appropriate to do so.
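
The device check described above could look roughly like this. It is a sketch of the idea only, not the code from the commit: `build_hw_filter` and its parameters are hypothetical, and the cross-device filter string is an assumption modeled on the `hwdownload,...,hwupload` chain that appears elsewhere in this thread.

```python
def build_hw_filter(decoder_device, encoder_device,
                    sw_format="nv12", hw_format="vaapi"):
    """Pick a filter chain based on whether decoder and encoder share a device.

    decoder_device is None when the input is decoded in software.
    """
    upload = "format=%s|%s,hwupload" % (sw_format, hw_format)
    if decoder_device is None or decoder_device == encoder_device:
        # Same device (or software decode): hardware frames pass through
        # untouched, software frames are uploaded once.
        return upload
    # Different devices (assumption): bring frames back to system memory
    # first, then upload them to the encoder's device.
    return "hwdownload," + upload


print(build_hw_filter("sma", "sma"))    # format=nv12|vaapi,hwupload
print(build_hw_filter("other", "sma"))  # hwdownload,format=nv12|vaapi,hwupload
```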

Given that the new filter formatting should be universal I did remove some of the old code since it was relatively redundant

Let me know if I broke anything or how it's working for you. Hopefully this will be expandable to other hardware accelerated codecs going forward too.

eikowagenknecht commented 4 years ago

Phew, I found one thing that is maybe feature, maybe bug.

Before this change, everything was converted to 8bit (yuv420p), now nothing is converted.

Since I'm using SMA mainly to batch convert 10bit files to 8bit, this is rather unfortunate for me.

Example ffmpeg command before (h265 10bit -> h265 8bit):

/usr/local/bin/ffmpeg -ss 00:00:00 -to 00:00:05 -hwaccel vaapi -i "/tv/in.mp4" -vcodec hevc_vaapi -map 0:0 -field_order progressive -filter_hw_device sma -vf format="nv12,hwupload" -f mp4 -threads 0 -y "/tv/out.mp4"

Resulting video: HVC1 1920x1080 23.976fps 24266kbps [V: hevc main L4.0, yuv420p, 1920x1080, 24266 kb/s]

Example ffmpeg command after (h265 10bit -> h265 10bit):

/usr/local/bin/ffmpeg -ss 00:00:00 -to 00:00:05 -init_hw_device vaapi="sma:/dev/dri/renderD128" -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device sma -i "/tv/in.mp4" -vcodec hevc_vaapi -map 0:0 -field_order progressive -filter_hw_device sma -vf format="nv12|vaapi,hwupload" -f mp4 -threads 0 -y "/tv/out.mp4"

Resulting video: HVC1 1920x1080 23.976fps 23819kbps [V: hevc main 10 L4.0, yuv420p10le, 1920x1080, 23819 kb/s]

Adding a -pix_fmt yuv420p option is ignored in both cases it seems:

Incompatible pixel format 'yuv420p' for codec 'hevc_vaapi', auto-selecting format 'vaapi_vld'

Adding -profile:v main does not work:

[hevc_vaapi @ 0x55d465f419c0] No usable encoding profile found.
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

I don't have a solution to this yet. I spent quite some time looking for a parameter, filter, ... that does the trick, but could not find any. For now, it seems that the old (slower) approach must be used for 10bit -> 8bit conversion.

Please don't merge current state into master yet, because for me it would make SMA unusable unfortunately :-/

Any help is appreciated.

eikowagenknecht commented 4 years ago

This page https://ffmpeg.org/ffmpeg-filters.html#Filtergraph-description mentions a filter "tonemap_vaapi" that could maybe do the trick, but on my latest snapshot build, trying to use it gets me:

[AVFilterGraph @ 0x555ab9a3b680] No such filter: 'tonemap_vaapi'

Instead the following seem to be available, but are not documented on that site:

 ... deinterlace_vaapi V->V       (null)
 ... denoise_vaapi     V->V       (null)
 ... procamp_vaapi     V->V       (null)
 ... scale_vaapi       V->V       (null)
 ... sharpness_vaapi   V->V       (null)
 ... transpose_vaapi   V->V       (null)

Also, the following filter statement "fixes" it, but of course also removes the performance improvement:

-vf "hwdownload,format=nv12|vaapi,hwupload"

This could at least be a (not too great) workaround to only pass this filter when a conversion 10bit > 8bit is wanted and otherwise the better performance can be used.

mdhiggins commented 4 years ago

From what I'm reading, this may actually be a build issue with your ffmpeg binary: it may not be compiled with vaapi 10 bit support for the surface encoder. You skirt this issue by copying the file to memory then reuploading it, but like you said, you're essentially going back to how things were before. See if there's anything you can adjust for your ffmpeg build.

Another thing to try would be setting your profile to main10 and see if that helps

Another thing to try would be setting your pix_fmt via the filter: format=pix_fmts=yuv420p, or format=p010 might even work too.

Try this stuff and let me know

And just to clarify, the 10->8 bit issue has been present in all builds of the fork, correct?

mdhiggins commented 4 years ago

http://forum.notebookreview.com/threads/hdr10-10-bit-hevc-encoding-with-ffmpegs-vaapi-encoders.828289/

Potentially some sample commands to take a look at

mdhiggins commented 4 years ago

And just to clarify, the -vf "hwdownload,format=nv12|vaapi,hwupload" command is just going back to the old behavior in the master branch. The hwdownload wasn't explicitly specified there, but setting the format to nv12 triggers the download, and then it has to reupload.

eikowagenknecht commented 4 years ago

To be clear, my goal is to convert an input x265 10bit to an output x265 8bit, using vaapi and the most hardware acceleration possible.

The behaviour is consistent in all builds of the fork since the generated ffmpeg command is pretty much the same and it seems like this is more of a problem with ffmpeg than SMA.

I'm using the docker ffmpeg build from https://hub.docker.com/r/jrottenberg/ffmpeg/ (snapshot-vaapi). Not sure how to adjust this.

format=pix_fmts=yuv420p and format=p010 both lead to the error:

Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'

profile:V main10 results in the output still being 10bit. With -bits_per_raw_sample 10 or -bits_per_raw_sample 8 (from your link) the output is also still 10bit.

eikowagenknecht commented 4 years ago

One completely unrelated small problem in the newest build of the fork is a typo: You have "sma:/dev/dri/renderd128" instead of "sma:/dev/dri/renderD128", so the device can not be found. Otherwise, the resulting ffmpeg commands are the same.

mdhiggins commented 4 years ago

Ah I misunderstood, I thought it was forcing everything to 8 bit and you wanted 10 bit, let me redirect my efforts a little, but I will fix the typo

mdhiggins commented 4 years ago

This may actually be a problem with the intel media driver for ubuntu depending on what version you're looking at

https://trac.ffmpeg.org/ticket/7764 seems to discuss this

which led me to this github issue

https://github.com/intel/media-driver/issues/760

which was fixed back on April 1st

If your version of ubuntu includes an older version of that driver it may not have this fix

mdhiggins commented 4 years ago

88d1e4cf10ef72e8dd999c346c7ae9e54f9ba3e7

That fixes the case issue

Also found this issue, which seems to be made by the same person, further indicating it may be an older driver issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/1418

eikowagenknecht commented 4 years ago

Yeah sorry, I probably wasn't clear enough before about trying to convert 10bit to 8bit. I need to do this because my Plex app on the TV only supports 8bit unfortunately...

This link is an amazing find, it seems like scale_vaapi=format=nv12 instead of format=nv12|vaapi,hwupload does the trick for me. Performance is great (0.95x) where hwdownload,format=nv12,hwupload only gets to 0.83x and the resulting file is finally yuv420p :-) Wouldn't have thought that a "scale" named filter is the one you use for other format conversions as well..

Remaining question is how this behaves when the input format is not vaapi decodable. This is not relevant to me, but maybe to other users.

mdhiggins commented 4 years ago

Problem is you're still forcing the nv12 format and losing performance again

I think the pure hardware approach should work if the media drivers are updated

Also curious if format=nv12|vaapi,hwupload,scale_vaapi would work if you want to test that

It appears you don't even need to build from source and can just update through ubuntu package manager https://launchpad.net/ubuntu/+source/intel-media-driver

eikowagenknecht commented 4 years ago

I think I'm not losing performance. My benchmark encodings show that scale_vaapi=format=nv12 works at the same speed as format=nv12|vaapi,hwupload, but results in 8bit instead of 10bit. So it does exactly what I want.

The approach format=nv12|vaapi,hwupload,scale_vaapi=format=nv12 could be a good generic approach. This usage is actually documented on https://wiki.libav.org/Hardware/vaapi but I hadn't seen it before.

format=nv12|vaapi,hwupload,scale_vaapi (without the format option, like you suggested) instead results in 10bit.

So it seems that format=nv12|vaapi,hwupload needs to be used when there should be no conversion 10bit <-> 8bit and format=nv12|vaapi,hwupload,scale_vaapi=format=nv12 can be used when there should be a conversion to 8bit.
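
Put as code, the rule from these benchmarks is simply the following. This is a sketch: `vaapi_filter` and its `to_8bit` flag are hypothetical names, while the filter strings are the ones tested in this thread.

```python
BASE_FILTER = "format=nv12|vaapi,hwupload"


def vaapi_filter(to_8bit):
    """Return the -vf value: keep the source bit depth, or force 8bit (nv12)."""
    if to_8bit:
        # scale_vaapi does the format conversion on the GPU.
        return BASE_FILTER + ",scale_vaapi=format=nv12"
    return BASE_FILTER


print(vaapi_filter(False))  # format=nv12|vaapi,hwupload
print(vaapi_filter(True))   # format=nv12|vaapi,hwupload,scale_vaapi=format=nv12
```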

mdhiggins commented 4 years ago

I messed up the first filter line and didn't make my edit in time

Does format=nv12|vaapi,hwupload,scale_vaapi with the format option make any difference?

eikowagenknecht commented 4 years ago

I think it's all in my previous comment. format=nv12|vaapi,hwupload,scale_vaapi=format=nv12 works the same as scale_vaapi=format=nv12, result and performance wise. format=nv12|vaapi,hwupload,scale_vaapi instead does not convert 10bit to 8bit.

mdhiggins commented 4 years ago

So I'm thinking if I forward the 'pix_fmt' option into that filter I can get a dynamic and universal solution that should allow people to set what they want. I'll get back to you.

eikowagenknecht commented 4 years ago

I agree, pix_fmt=yuv420p could instead be translated to filter scale_vaapi=format=nv12 for vaapi.

I tried using it with the following config:

[Video]
profile = 
crf-profiles = 0:23:5M:15M,4000:22:10M:24M,8000:22:20M:60M
codec = h265vaapi, hevc, h265, x265, x264, h264
crf = -1
max-width = 0
max-bitrate = 0
pix-fmt = yuv420p
max-level = 0.0
filter = scale_vaapi=format=nv12
force-filter = False
codec-parameters =

But that puts the filter before the others, resulting in -vf scale_vaapi=format=nv12,format=nv12|vaapi,hwupload which obviously doesn't work. So some coding changes are needed indeed :-)

mdhiggins commented 4 years ago

108add49e9a693281e3e7dfc6499efed556af610 40e1c3fdaf1ac268eac57d2ce2196801e7aec07d

Alright so that intercepts the pix-fmt setting and applies it to the vaapi filter, so if you do your settings like this it should work

[Video]
profile = 
crf-profiles = 0:23:5M:15M,4000:22:10M:24M,8000:22:20M:60M
codec = h265vaapi, hevc, h265, x265, x264, h264
crf = -1
max-width = 0
max-bitrate = 0
pix-fmt = nv12, yuv420p
max-level = 0.0
filter =
force-filter = False
codec-parameters =

And if you changed your pix-fmt to p010 instead of nv12 that should let you choose to keep 10bit if you wanted

I did remove your filter setting you tried

Finally, one more thing to test is to see if this breaks scaling, since it's adding additional options to the scale_vaapi filter

Basically we need it to pass 4 scenarios

10bit to 8bit using nv12 pix-fmt
10bit to 10bit using p010 pix-fmt
10bit to 8bit using nv12 pix-fmt w/ resizing
10bit to 10bit using p010 pix-fmt w/ resizing

Let me know

For me I'm getting a generated filter string of format=p010|vaapi,hwupload,scale_vaapi=640:trunc(ow/a/2)*2=format=p010 using pix-fmt = p010 and max-width = 640 so it looks like the command generation is working

Only question is whether the order of the scale_vaapi parameters there is correct, or if the format option needs to come before the resizing

eikowagenknecht commented 4 years ago

Great, will test this thoroughly soon :-)

I'll add the cases

8bit to 8bit using nv12 pix-fmt
8bit to 10bit using p010 pix-fmt
8bit to 8bit using nv12 pix-fmt w/ resizing
8bit to 10bit using p010 pix-fmt w/ resizing

to be sure we don't run into any problems with 8bit input either.
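
The combined list of eight scenarios can be enumerated mechanically, which makes it harder to forget one. A small sketch follows; `test_matrix` is a hypothetical helper, and the labels follow the wording used in this thread.

```python
from itertools import product


def test_matrix():
    """List every source depth / resize / target pix-fmt combination."""
    cases = []
    for src_bits, resize, (dst_bits, pix_fmt) in product(
            (10, 8), (False, True), ((8, "nv12"), (10, "p010"))):
        label = "%dbit to %dbit using %s pix-fmt" % (src_bits, dst_bits, pix_fmt)
        if resize:
            label += " w/ resizing"
        cases.append(label)
    return cases


for case in test_matrix():
    print(case)
```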

mdhiggins commented 4 years ago

Sounds good, and just a quick note: I haven't updated the code for h264vaapi to reflect these changes, so just test with h265vaapi, which I think you've been doing already anyway

eikowagenknecht commented 4 years ago

Yes, I'm only using h265 vaapi for now.

First problem:

[Video]
profile = 
crf-profiles = 0:23:5M:15M,4000:22:10M:24M,8000:22:20M:60M
codec = h265vaapi, hevc, h265, x265, x264, h264
crf = -1
max-width = 0
max-bitrate = 0
pix-fmt = nv12, yuv420p
max-level = 0.0
filter = 
force-filter = False
codec-parameters =

results in:

/usr/local/bin/ffmpeg -fix_sub_duration -init_hw_device vaapi=sma:/dev/dri/renderD128 -hwaccel_device sma -hwaccel vaapi -hwaccel_output_format vaapi -i /tv/in.mp4.original -i /tv/in.srt -vcodec hevc_vaapi -map 0:0 -field_order progressive -metadata:s:v title=FHD -tag:v hvc1 -filter_hw_device sma -vf format=,scale_vaapinv12|vaapi,hwupload=format=nv12 -c:a:0 aac -map 0:2 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:2 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -c:s:0 mov_text -map 1:0 -metadata:s:s:0 title= -metadata:s:s:0 language=eng -disposition:s:0 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y /tv/out.mp4

[AVFilterGraph @ 0x5651ba7e93c0] Error initializing filter 'format' with args ''
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0

-vf format=,scale_vaapinv12|vaapi,hwupload=format=nv12 seems to be the problem here.

mdhiggins commented 4 years ago

Whoops, had variable names switched in that one condition. 010ec861db233ab281e7a74881e5e71ca1e1c870 That fixes it.

-vf format=p010|vaapi,hwupload,scale_vaapi=format=p010 for me now using p010

eikowagenknecht commented 4 years ago

Trying again, it is converting now and the filter looks good to me. But could it be that the qp, maxrate etc. parameters are now missing? My video section is unchanged.

/usr/local/bin/ffmpeg -fix_sub_duration -init_hw_device vaapi=sma:/dev/dri/renderD128 -hwaccel_device sma -hwaccel vaapi -hwaccel_output_format vaapi -i /in.mp4.original -i /tv/in.en.srt -vcodec hevc_vaapi -map 0:0 -field_order progressive -metadata:s:v title=FHD -tag:v hvc1 -filter_hw_device sma -vf format=nv12|vaapi,hwupload,scale_vaapi=format=nv12 -c:a:0 aac -map 0:2 -ac:a:0 2 -b:a:0 256k -metadata:s:a:0 BPS=256000 -metadata:s:a:0 BPS-eng=256000 -metadata:s:a:0 title=Stereo -metadata:s:a:0 language=eng -disposition:a:0 -default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -strict experimental -c:a:1 copy -map 0:2 -metadata:s:a:1 title=5.1 Channel -metadata:s:a:1 language=eng -disposition:a:1 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -c:s:0 mov_text -map 1:0 -metadata:s:s:0 title= -metadata:s:s:0 language=eng -disposition:s:0 +default-dub-original-comment-lyrics-karaoke-forced-hearing_impaired-visual_impaired-captions -f mp4 -threads 0 -metadata:g encoding_tool=SMA -y /tv/out.mp4

eikowagenknecht commented 4 years ago

Also, I noticed that pix-fmt = nv12 seems to be enough and results in the same command as pix-fmt = nv12, yuv420p. Is yuv420p used for anything there?

mdhiggins commented 4 years ago

b7eace949ba37a3e295e38b713814cc4bea6e19a Looks like part of the quality profile code was accidentally deleted at some point yesterday. I've restored it for h265vaapi here.

The way the pix-fmt setting works is similar to codecs. Anything on the list is approved and won't force a source video to encode if it matches, but the first item is what's used for encoding if encoding needs to take place

EDIT: the reason behind this is that you could have acceptable pix-fmt options that if all your other settings were ok (width, codec, bitrate, etc) you could potentially just copy from the source with no encoding, like for example if something was already 8bit. This preserves maximum quality and has the least performance impact
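
The rule described above can be sketched roughly as follows. This is only an illustration of the behaviour, not the actual SMA code: the function name `choose_pix_fmt` and its return shape are hypothetical, and treating an empty list as "no restriction" is an assumption based on the copy results reported in this thread.

```python
def choose_pix_fmt(source_pix_fmt, allowed):
    """Decide whether the pix-fmt setting alone forces a re-encode.

    Hypothetical helper, not part of SMA. Returns a tuple
    (needs_encode, target_pix_fmt).
    """
    if not allowed:
        # Assumption: an empty pix-fmt list places no restriction.
        return False, None
    if source_pix_fmt in allowed:
        # Source format is on the approved list, so it can be kept as-is.
        return False, source_pix_fmt
    # Not approved: encode, using the first list entry as the target.
    return True, allowed[0]


# Source already matches an approved format: no forced encode.
print(choose_pix_fmt("yuv420p", ["nv12", "yuv420p"]))
# Source is not approved: encode to the first entry.
print(choose_pix_fmt("yuv420p10le", ["nv12", "yuv420p"]))
```

Other settings (width, codec, bitrate, etc.) can still trigger an encode on their own; the sketch only covers the pix-fmt part of the decision.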

eikowagenknecht commented 4 years ago

Alright, it's converting now with qp and maxrate set again. I'll let it do its thing for some episodes and then continue with the other tests.

Thanks for the explanation regarding the pix-fmt option, good idea to have it that way!

I'll comment again when the tests are done or I find another problem.

mdhiggins commented 4 years ago

Sounds good, keep me posted

eikowagenknecht commented 4 years ago

Ok, first results incoming:

1. 10bit to 8bit using nv12 pix-fmt <<< fine, 1.52x speed
2. 10bit to 10bit using p010 pix-fmt <<< fine, 1.52x speed
3. 10bit to 10bit using no pix-fmt options <<< fine, no speed given, finished instantly (copy)
4. 10bit to 8bit using nv12 pix-fmt w/ resizing <<< Error:

-vf format=p010|vaapi,hwupload,scale_vaapi=100:56=format=p010

[Parsed_scale_vaapi_2 @ 0x5630ffcb4500] Option '56' not found
[AVFilterGraph @ 0x5630ffcb2580] Error initializing filter 'scale_vaapi' with args '100:56=format=nv12'
Error reinitializing filters!
Failed to inject frame into filter network: Option not found

5. 10bit to 10bit using p010 pix-fmt w/ resizing <<< Error, same as before

eikowagenknecht commented 4 years ago

8bit to 8bit using nv12 pix-fmt <<< fine, 1.37x speed
8bit to 10bit using p010 pix-fmt <<< fine, 1.38x speed
8bit to 8bit using no pix-fmt options <<< fine, no speed given, finished instantly (copy)

8bit to 8bit using nv12 pix-fmt w/ resizing <<< Error:

[Parsed_scale_vaapi_2 @ 0x5605b9bb9200] Option '56' not found
[AVFilterGraph @ 0x5605b9bb7280] Error initializing filter 'scale_vaapi' with args '100:56=format=nv12'
Error reinitializing filters!
Failed to inject frame into filter network: Option not found
Error while processing the decoded data for stream #0:0
[aac @ 0x5605b8fbab00] Qavg: 58773.465
[aac @ 0x5605b8fbab00] 2 frames left in the queue on closing
Conversion failed!

8bit to 10bit using p010 pix-fmt w/ resizing <<< Error, same as before

mdhiggins commented 4 years ago

b4da17402675b55d243b3f6908af0f7d9c1d872e

Reformatted the width and height parameters, see if that fixes the resize scenarios that are failing

eikowagenknecht commented 4 years ago

Nope,

-vf format=p010|vaapi,hwupload,scale_vaapi=w=100:h=56=format=p010

[scale_vaapi @ 0x55b3a94e0300] [Eval @ 0x7ffdbb1b01a0] Invalid chars '=format=p010' at the end of expression '56=format=p010'
[scale_vaapi @ 0x55b3a94e0300] Error when evaluating the expression '56=format=p010'.
Maybe the expression for out_w:'100' or for out_h:'56=format=p010' is self-referencing.
[Parsed_scale_vaapi_2 @ 0x55b3a94e0200] Failed to configure output pad on Parsed_scale_vaapi_2
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
[aac @ 0x55b3a88e1b00] Qavg: 58773.465
[aac @ 0x55b3a88e1b00] 2 frames left in the queue on closing
Conversion failed!

eikowagenknecht commented 4 years ago

According to https://trac.ffmpeg.org/wiki/Hardware/VAAPI I think it needs to be format=p010|vaapi,hwupload,scale_vaapi=w=100:h=56:format=p010

When I manually edit the ffmpeg command to that it works fine.
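
For illustration, the working filter string can be assembled by joining all scale_vaapi options with ":" and using "=" only between each key and its value. The helper below is a hypothetical sketch, not the code in the SMA branch; the name `build_vaapi_filter` and its parameters are made up for this example.

```python
def build_vaapi_filter(pix_fmt, width=None, height=None):
    """Build the -vf value for the VAAPI path discussed in this thread.

    Hypothetical helper, not part of SMA. scale_vaapi options are
    key=value pairs separated by ':'.
    """
    scale_opts = []
    if width is not None:
        scale_opts.append(f"w={width}")
    if height is not None:
        scale_opts.append(f"h={height}")
    # format is just another option, so it is joined with ':' as well.
    scale_opts.append(f"format={pix_fmt}")
    return f"format={pix_fmt}|vaapi,hwupload,scale_vaapi=" + ":".join(scale_opts)


# With resizing:
print(build_vaapi_filter("p010", 100, 56))
# Without resizing:
print(build_vaapi_filter("p010"))
```

Collecting the options in a list and joining them once avoids the "56=format=p010" style of error, where a second "=" ends up inside the height expression.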

mdhiggins commented 4 years ago

Ah nice, that's helpful. ff8307140a7bcb1be2300e0b8788aa02513e9cf8

See if that gets the correct formatting

eikowagenknecht commented 4 years ago

Great, now all remaining tests also passed 👍 Using this as my main build now!

mdhiggins commented 4 years ago

Excellent. Thanks for running through all those tests. I'll update h264vaapi with the same methods, and then it should be ready to merge back to master.

Just as a tip for the pix-fmt option: FFprobe will never report nv12 or p010, so make sure you include other formats. Otherwise it will force all of these files to transcode even if they're already in the desired format. For example, 10-bit gets reported as yuv420p10le. I'm not 100% sure what 8-bit gets reported as, but if you probe your converted files you can tell.

So if 10-bit is what you want, pix-fmt = p010, yuv420p10le would be an example of how that should look.
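
One way to check what a converted file reports is to ask ffprobe for the pix_fmt of the first video stream. The snippet below is a minimal sketch: the function names are hypothetical, and it assumes ffprobe is on the PATH. The file path in the usage comment is only a placeholder.

```python
import subprocess


def ffprobe_pix_fmt_cmd(path, ffprobe="ffprobe"):
    """Return the ffprobe argument list that prints only the pix_fmt
    of the first video stream. Hypothetical helper, not part of SMA."""
    return [
        ffprobe,
        "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=pix_fmt",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def probe_pix_fmt(path):
    """Run ffprobe and return the reported pixel format as a string."""
    result = subprocess.run(
        ffprobe_pix_fmt_cmd(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Usage (placeholder path): print(probe_pix_fmt("/tv/out.mp4"))
```

Whatever value this prints for an already converted file is the one to add after nv12 or p010 in the pix-fmt list.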

mdhiggins / sickbeard_mp4_automator

Feature request: Optimize vaapi performance #1300