ralfbiedert / openh264-rs

Idiomatic Rust wrappers around OpenH264.

Asm support #5

Closed: jelmansouri-legion closed this 3 years ago

jelmansouri-legion commented 3 years ago

Adds assembly support to openh264-sys2 by relying on nasm-rs on x86/x86_64. Currently, if nasm is missing, we do not fail the build; we issue a warning instead. (I originally had an error there, but thought a warning might be enough if the feature were ever enabled by default.) The feature is not enabled by default.
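A minimal build-script sketch of the behavior described above: assembly is only attempted on x86/x86_64 when the feature is on, and a missing nasm produces a `cargo:warning` instead of a build failure. The feature name `asm` (read via `CARGO_FEATURE_ASM`), the `nasm -v` probe and the `AsmPlan` enum are assumptions for illustration; the actual PR delegates the assembling to nasm-rs, which is not shown here.

```rust
use std::env;
use std::process::Command;

/// What the build script decides to do about OpenH264's assembly sources.
#[derive(Debug, PartialEq)]
enum AsmPlan {
    /// Assemble the hand-written kernels with nasm.
    UseNasm,
    /// Build the portable code only, after emitting a warning.
    FallbackWithWarning,
    /// Assembly not requested, or not applicable to this target.
    Skip,
}

/// Hypothetical probe: true if a `nasm` binary on PATH runs successfully.
fn nasm_available() -> bool {
    Command::new("nasm")
        .arg("-v")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Pure decision logic, kept separate from env/process access so it is easy to test.
fn plan(feature_enabled: bool, target_arch: &str, nasm_found: bool) -> AsmPlan {
    // nasm only applies to x86 and x86_64 targets.
    if !feature_enabled || !matches!(target_arch, "x86" | "x86_64") {
        return AsmPlan::Skip;
    }
    if nasm_found {
        AsmPlan::UseNasm
    } else {
        AsmPlan::FallbackWithWarning
    }
}

fn main() {
    // Cargo exposes enabled features as CARGO_FEATURE_<NAME>; "ASM" is a placeholder name.
    let feature_enabled = env::var_os("CARGO_FEATURE_ASM").is_some();
    let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    // Only probe for nasm when the feature is enabled.
    let nasm_found = feature_enabled && nasm_available();

    match plan(feature_enabled, &target_arch, nasm_found) {
        AsmPlan::UseNasm => {
            // A real build would hand the .asm sources to nasm-rs here (not shown).
        }
        AsmPlan::FallbackWithWarning => {
            // Warn, but let the build continue with the portable code.
            println!("cargo:warning=nasm not found; building OpenH264 without assembly");
        }
        AsmPlan::Skip => {}
    }
}
```

Keeping `plan` free of side effects makes the warn-versus-assemble choice testable without nasm installed.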

Here are the benchmark results on my machine (11th Gen Intel(R) Core(TM) i9-11950H @ 2.60GHz 2.61 GHz), without the feature:

test decode_yuv_multi_512x512        ... bench:   7,725,449 ns/iter (+/- 798,862)
test decode_yuv_single_1920x1080     ... bench:   8,218,159 ns/iter (+/- 427,824)
test decode_yuv_single_512x512_cabac ... bench:   1,509,231 ns/iter (+/- 54,221)
test decode_yuv_single_512x512_cavlc ... bench:   1,708,157 ns/iter (+/- 29,290)
test whole_decoder                   ... bench:   1,878,174 ns/iter (+/- 32,181)

test encode_1920x1080_from_yuv       ... bench:  26,635,666 ns/iter (+/- 557,558)
test encode_512x512_from_yuv         ... bench:   3,486,324 ns/iter (+/- 156,321)

With the feature enabled:

test decode_yuv_multi_512x512        ... bench:   3,715,871 ns/iter (+/- 121,538)
test decode_yuv_single_1920x1080     ... bench:   3,764,356 ns/iter (+/- 427,062)
test decode_yuv_single_512x512_cabac ... bench:     794,165 ns/iter (+/- 56,266)
test decode_yuv_single_512x512_cavlc ... bench:   1,267,514 ns/iter (+/- 48,114)
test whole_decoder                   ... bench:   1,490,782 ns/iter (+/- 131,085)

test encode_1920x1080_from_yuv       ... bench:  10,338,088 ns/iter (+/- 1,085,816)
test encode_512x512_from_yuv         ... bench:   1,333,494 ns/iter (+/- 29,557)
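To compare the two runs directly, the sketch below computes a speedup factor (baseline ns/iter divided by asm ns/iter) for each benchmark, using the median values reported above. It is plain arithmetic on those numbers and ignores the +/- spread, so treat the results as rough.

```rust
/// One benchmark row: name plus median ns/iter without and with the asm feature.
struct Row {
    name: &'static str,
    baseline_ns: f64,
    asm_ns: f64,
}

/// How many times faster the asm build is (values above 1.0 mean faster).
fn speedup(baseline_ns: f64, asm_ns: f64) -> f64 {
    baseline_ns / asm_ns
}

fn main() {
    // Median ns/iter values copied from the two benchmark runs in this PR.
    let rows = [
        Row { name: "decode_yuv_multi_512x512", baseline_ns: 7_725_449.0, asm_ns: 3_715_871.0 },
        Row { name: "decode_yuv_single_1920x1080", baseline_ns: 8_218_159.0, asm_ns: 3_764_356.0 },
        Row { name: "decode_yuv_single_512x512_cabac", baseline_ns: 1_509_231.0, asm_ns: 794_165.0 },
        Row { name: "decode_yuv_single_512x512_cavlc", baseline_ns: 1_708_157.0, asm_ns: 1_267_514.0 },
        Row { name: "whole_decoder", baseline_ns: 1_878_174.0, asm_ns: 1_490_782.0 },
        Row { name: "encode_1920x1080_from_yuv", baseline_ns: 26_635_666.0, asm_ns: 10_338_088.0 },
        Row { name: "encode_512x512_from_yuv", baseline_ns: 3_486_324.0, asm_ns: 1_333_494.0 },
    ];

    for r in &rows {
        println!("{:<34} {:.2}x", r.name, speedup(r.baseline_ns, r.asm_ns));
    }
}
```

Worked through, this gives roughly 1.26x for whole_decoder at the low end and roughly 2.6x for the two encoder benchmarks at the high end.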

Note that I only tested the following targets: x86_64-pc-windows-msvc and x86_64-unknown-linux-gnu.

ralfbiedert commented 3 years ago

This is fantastic, thanks!

About the compile behavior: I haven't thought it through, but ideally I'd want the crate

That way most downstream users would transparently get the best code without any compile hiccups, while people who want to really enforce nasm get an escape hatch to fail compilation early.
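One way to express the goal stated above (nasm used transparently when available, plus an opt-in that fails compilation early) is sketched below as build-script logic. The `require-nasm` feature name (read via `CARGO_FEATURE_REQUIRE_NASM`), the `decide` helper and the `nasm -v` probe are hypothetical, chosen only for illustration; they are not part of openh264-sys2.

```rust
use std::env;
use std::process::Command;

/// Hypothetical policy: use nasm when it is present, fall back to the portable
/// code when it is missing, but return an error if the user opted into
/// `require-nasm`. Ok(true) means "assemble", Ok(false) means "portable only".
fn decide(target_arch: &str, nasm_found: bool, require_nasm: bool) -> Result<bool, String> {
    if !matches!(target_arch, "x86" | "x86_64") {
        // nasm only applies to x86/x86_64; other targets use the portable code.
        return Ok(false);
    }
    match (nasm_found, require_nasm) {
        (true, _) => Ok(true),
        (false, false) => Ok(false),
        (false, true) => Err("nasm is required (require-nasm) but was not found on PATH".to_string()),
    }
}

fn main() {
    let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    // Cargo sets CARGO_FEATURE_<NAME> for enabled features ('-' becomes '_').
    let require_nasm = env::var_os("CARGO_FEATURE_REQUIRE_NASM").is_some();
    let nasm_found = Command::new("nasm")
        .arg("-v")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false);

    match decide(&target_arch, nasm_found, require_nasm) {
        Ok(true) => {
            // A real build would assemble the .asm sources here (not shown).
        }
        Ok(false) => {}
        // Panicking in a build script aborts the build with this message.
        Err(msg) => panic!("{msg}"),
    }
}
```

With this shape, downstream users who never touch the strict feature always get a successful build, while the strict feature turns a missing nasm into an immediate, explicit error.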

ralfbiedert commented 3 years ago

Can confirm this works on Windows and Linux, with similar performance gains.

Published as 0.2.5.