Closed aeikum closed 2 years ago
I'd rather expose an opaque struct Settings
with Default implemented and frame/tile_threads exposed as setter/getter.
From the GStreamer plugin point of view I'd also prefer this exposed as proper settings. The plugin would by default use multi-threaded decoding but it should be possible to override that, and potentially other settings. Using all available cores is not always the best strategy.
Well, this is annoying. dav1d changed their public API after 0.9.2 (not yet released). So I guess we need to add some smarts to handle either Dav1dSettings struct layout. Sigh.
That's a different soname and a different version number too, so that can be detected at build time. Not really a problem, just an inconvenience :)
Yes. I'm just getting really tired of the complete lack of API stability in the open source world, but that's a rant for another day...
semver exists, I expect that to hit dav1d 0.10.
Running more threads doesn't utilize the hardware more efficiently. As long as it's not considerably less efficient, however, it does improve the key performance time. Amhdal's law also teaches us to expect an upper limit of such scalability. Thus, having a default thread usage bounded by a constant doesn't sound too bad to me. Conversely, runtime detection or unbounded thread usage–even unbounded as in number-of-cores–would sound inconvenient to use in the context of a larger program. If this may turn out to improve performance a configuration is helpful. (Also, I would much rather see a separate benchmark feature, explicitly invoked, to detect such hardware optimal numbers than guesstimation based on core count alone. That's somewhat hard to do properly.)
There are some platforms that do not allow the creation of threads through std
, prominently the wasm
targets that do not have multithreading (yet). In jpeg-decoder
we #[cfg()]
those to disable such features, but since targetting a native library here it might not be the biggest concern?
dav1d is routinely tested on many cores/threads systems upstream, so that is not a big concern. I would welcome a feature to automated the testing in the crate though.
Thanks for the feedback, all.
After reviewing upstream changes, I've changed the strategy here a bit.
One last nit: I'm a little uncomfortable leaving the functions undocumented. However, dav1d-rs doesn't contain any other meaningful documentation, so I decided to leave them undocumented for consistency. Such a simple library can be studied by the source, of course.
Here is an additional diff to dav1d-rs to support the current upstream Dav1dSettings API. I didn't include this in the PR, since there is no tagged dav1d release which contains this change.
diff --git a/src/lib.rs b/src/lib.rs
index 331905a..b351695 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -53,11 +53,11 @@ impl Settings {
}
pub fn set_max_threads(self: &mut Self, n_threads: u32) {
- self.dav1d_settings.n_tile_threads = n_threads as i32;
+ self.dav1d_settings.n_threads = n_threads as i32;
}
pub fn set_max_frame_delay(self: &mut Self, max_frame_delay: u32) {
- self.dav1d_settings.n_frame_threads = max_frame_delay as i32;
+ self.dav1d_settings.max_frame_delay = max_frame_delay as i32;
}
}
Here is a diff to gst-plugins-rs to show usage of the new Settings API. This diff is necessary because gst-plugins-rs's dav1ddec plugin crashes with parallel frame decoding, so we need to disable it.
diff --git a/video/dav1d/src/dav1ddec/imp.rs b/video/dav1d/src/dav1ddec/imp.rs
index eb15f424..eeef570c 100644
--- a/video/dav1d/src/dav1ddec/imp.rs
+++ b/video/dav1d/src/dav1ddec/imp.rs
@@ -345,6 +345,17 @@ impl ObjectSubclass for Dav1dDec {
const NAME: &'static str = "RsDav1dDec";
type Type = super::Dav1dDec;
type ParentType = gst_video::VideoDecoder;
+
+ fn with_class(_klass: &Self::Class) -> Self {
+ let mut settings = dav1d::Settings::new();
+
+ settings.set_max_frame_delay(1);
+
+ Self {
+ decoder: Mutex::new(dav1d::Decoder::new_with_settings(&settings)),
+ ..Default::default(),
+ }
+ }
}
impl ObjectImpl for Dav1dDec {}
This diff is necessary because gst-plugins-rs's dav1ddec plugin crashes with parallel frame decoding, so we need to disable it.
Why does it crash?
This diff is necessary because gst-plugins-rs's dav1ddec plugin crashes with parallel frame decoding, so we need to disable it.
Why does it crash?
I don't know. It's not clear to me from dav1d docs what the parallel frame decoding thing does. My best guess is frames are delivered out of order, and gst's plugin doesn't handle that well.
It ultimately results in a segfault in core::ops::function::impls::{impl#3}::call_mut<(), dyn core::ops::function::FnMut<(), Output=()>>
according to GDB. Not very helpful.
Log if you're curious: rsdav1ddeccrash.txt
Edit: Actually got the backtrace, a bit more info there:
#0 0x00007ffff430918b in core::ops::function::impls::{impl#3}::call_mut<(), dyn core::ops::function::FnMut<(), Output=()>> (self=<optimized out>, args=())
at /rustc/1.58.1/library/core/src/ops/function.rs:269
#1 0x00007ffff4308d7a in dav1d::release_wrapped_data (_data=<optimized out>, cookie=0x1010) at /mnt/ext/oem/valve/video/dav1d-rs/src/lib.rs:39
#2 0x00007ffff4237c04 in () at /usr/lib/libdav1d.so.6
#3 0x00007ffff420f2b6 in () at /usr/lib/libdav1d.so.6
#4 0x00007ffff4224059 in () at /usr/lib/libdav1d.so.6
#5 0x00007ffff4269041 in () at /usr/lib/libdav1d.so.6
#6 0x00007ffff7638259 in start_thread () at /usr/lib/libpthread.so.0
#7 0x00007ffff78e85e3 in clone () at /usr/lib/libc.so.6
Yeah I think I mentioned that when reviewing the plugin's code and it was deferred to later because "it never happens". Once this here is merged, can you create an MR with your patch above and then we can discuss the problem over there.
Your backtrace looks like the wrapped data is freed twice or something like that.
Actually that shouldn't crash anywhere there, so would most likely be a bug in dav1d-rs
somewhere
I think I know what the problem is, and it's indeed in dav1d-rs
if I'm right.
After reviewing upstream changes, I've changed the strategy here a bit.
The changed API would become version 0.10. My suggestion would be to not worry about that for now. Once 0.10 is out we can make a new dav1d-rs release that only targets 0.10 with the new API.
What do you think @lu-zero ?
Breaking this part out of review for wider discussion:
You might also want to add setters for the other simple fields (i.e. not allocator and logger)
I'm actually tempted to go in the other direction. If a user needs the full Dav1dSettings configuration, they can use the -sys
interface instead of the simple interface.
Given that the max_frame_delay bug has been fixed, I propose we only expose max_tile_threads
(and later, n_threads
) via get/setters. If other fields are needed in the future, an API for them can be designed and added then.
I'm actually tempted to go in the other direction. If a user needs the full Dav1dSettings configuration, they can use the
-sys
interface instead of the simple interface.
The user has no access to the -sys
struct though, there's no interop between the two currently.
What problem do you see with exposing all (meaningful) settings?
Also ideally users shouldn't have to touch any unsafe code if it can be avoided.
The user has no access to the -sys struct though, there's no interop between the two currently.
Right. I'm saying users should use the -sys interface if they want features that the simple interface doesn't provide (or they can enhance the simple interface with a well-designed API, of course). Otherwise why do we even have a simple API?
What problem do you see with exposing all (meaningful) settings?
Mostly because the upstream API is unstable and IMO dav1d-rs can do better than that. For example, we can reasonably implement a max_threads
function on both the upstream 0.9 and 0.10 APIs, with no changes to dav1d-rs's users' code. We'd have to design and implement similar stable APIs for each of the other settings, which is significant effort for possibly zero real usage.
Of course we can expose dav1d-rs's users to upstream's instability, but IMO that's rude to dav1d-rs's users. Libraries and their users are a 1-to-many relationship. It makes sense to do most of the compatibility work on the library side, rather than every client project having to do the same work.
Otherwise why do we even have a simple API?
s/simple/safe/
, and so that users don't have to deal with memory management, which is especially strange and difficult in case of dav1d.
At this point dav1d-rs
is not really a simple/simplified API but a safe Rust version of the plain dav1d
C API.
Mostly because the upstream API is unstable and IMO dav1d-rs can do better than that.
It's following semver or not? In my experience it's a lost cause to try providing a compatibility layer on top of a settings API that evolves. For simple things that will work but at some point you'll end up with settings that can't be properly mapped between versions.
After reviewing upstream changes, I've changed the strategy here a bit.
The changed API would become version 0.10. My suggestion would be to not worry about that for now. Once 0.10 is out we can make a new dav1d-rs release that only targets 0.10 with the new API.
What do you think @lu-zero ?
I think we can get this in the next release for the bindings targeting 0.9
and then do a larger iteration once 0.10
/ 1.0
is out.
Of course we can expose dav1d-rs's users to upstream's instability, but IMO that's rude to dav1d-rs's users. Libraries and their users are a 1-to-many relationship. It makes sense to do most of the compatibility work on the library side, rather than every client project having to do the same work.
I assume dav1d follows semver so 0.8 -> 0.9 IS a breaking point and you should safely expect things to be incompatible.
Once 1.0
is out until 2.0
it would be a bug if something is API incompatible.
semver doesn't solve the problem, it just communicates to a library's users that they have new work to do. The point is this work for users can be avoided by good library API design and maintainership.
Anyway I'm not going to win this battle here. I will iterate.
One last nit: I'm a little uncomfortable leaving the functions undocumented. However, dav1d-rs doesn't contain any other meaningful documentation, so I decided to leave them undocumented for consistency. Such a simple library can be studied by the source, of course.
FWIW, I've added documentation to all public API now.
Thanks again. Pushed a new version rebased on to current master, which I think addresses all feedback.
@lu-zero Not sure why the CI is failing. Those struct fields exist in 0.9.2 (and all 0.9 versions), and that's what the CI seems to be using. Any ideas?
No, it pulls git master. That commit should fix the CI.
Hello. We're using gst-plugins-rs's dav1ddec plugin to decode AV1 video. I noticed that dav1d by default does not decode tiles in parallel, but can be requested to do so by setting Dav1dSettings.n_tile_threads to a higher value. This is currently impossible with the simple dav1d-rs API.
This patch sequence exports the Dav1dSettings struct publicly, allowing clients to override default settings and set this and other values as they wish. It also changes the default Dav1dDecoder::new() constructor to initialize this value to the number of CPUs, clamped to [1,4], as VLC does (see below).
It's an RFC because I'm unsure about some things:
new_default_settings
function I wrote?VLC's n_tile_threads initialization, from https://code.videolan.org/videolan/vlc/-/blob/3.0.16/modules/codec/dav1d.c#L286 :