In general we want the fastest possible code, avoiding runtime checks as much as possible. I understand the desire to be safe when targeting Rust, but how can the "end of buffer" be checked in an efficient manner? Since the `compute` function is supposed to use `count` frames, could this "end of buffer" check using `count` be done once in the architecture file?
With `inputs0[..count]` we create a slice, which is a struct with a pointer and a length. I expected `take` to be more efficient than creating a slice.
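For reference, here is a minimal sketch of the two variants under discussion (the function names and the `0.5` gain are made up for illustration; this is not the actual generated Faust code):

```rust
/// Slice-based variant: panics if `count` exceeds either buffer length.
pub fn process_slice(inputs0: &[f32], outputs0: &mut [f32], count: usize) {
    let inputs0 = inputs0[..count].iter();
    let outputs0 = outputs0[..count].iter_mut();
    for (input, output) in inputs0.zip(outputs0) {
        *output = 0.5 * *input;
    }
}

/// `take`-based variant: silently stops at the end of the shorter buffer.
pub fn process_take(inputs0: &[f32], outputs0: &mut [f32], count: usize) {
    let inputs0 = inputs0.iter().take(count);
    let outputs0 = outputs0.iter_mut().take(count);
    for (input, output) in inputs0.zip(outputs0) {
        *output = 0.5 * *input;
    }
}
```

Comparing the assembly of the two in a tool like Compiler Explorer shows the slicing version carries the panic path mentioned below, because the slicing itself can fail.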
Reasoning about the impact of an iterator is a bit more complicated, because I don't know how it is implemented and optimized. I created a simple example and looked at the generated assembly: in the slice case, a section for the panic is created and there are some other extra operations. Initially I thought using `take` would allow us to simplify the generated code; obviously one shouldn't generate a less performant method. I will have a look at how I can create a benchmark.

But the question here is foremost the following: do we want the panic, or would silently using a too-short buffer be OK?
So the panic happens if `buffer[i].len() < count`? Or what exactly is the case here?

In my opinion, if `buffer[i].len() < count` is the case, I'm very fine with a panic, because this is something that just shouldn't happen.

For using iterators over loops in the generated Rust code, I think we would need to benchmark this. I think rustc is quite smart about auto-vectorization for iterators, but from googling around it seems that floats are tricky here. I only know bits and pieces about this topic, though.
`buffer[i].len() < count`: yes, the generated code would read/write outside the buffer, so a panic seems the right behaviour here. The `take` iterator would simply stop at the last available sample. I think you agree that a decision can only be made after benchmarks have been run.
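To make the behavioural difference concrete, here is a small standalone sketch (buffer contents and sizes are made up):

```rust
fn main() {
    let buffer = [1.0f32, 2.0, 3.0]; // only 3 samples available
    let count = 8; // caller claims 8 frames

    // `take` silently stops after the 3 available samples.
    let processed: Vec<f32> = buffer.iter().take(count).map(|x| x * 0.5).collect();
    assert_eq!(processed.len(), 3);

    // Slicing panics, with a message along the lines of
    // "range end index 8 out of range for slice of length 3".
    let _ = &buffer[..count];
}
```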
In my opinion, it is a precondition of `compute` that the buffer has a sufficient size, so a panic feels right.

Looking at it from the other perspective: it is a bug if someone calls `compute` with a `count` that is larger than the buffer they provide to it. A bug like that should not be silently accepted.
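One way to make that precondition explicit to users would be to document it on `compute`; a hedged sketch (the `Dsp` type, the signature, and the buffer layout are assumptions for illustration, not what the generator currently emits):

```rust
/// Hypothetical DSP struct, only for illustrating the documented precondition.
pub struct Dsp;

impl Dsp {
    /// # Panics
    ///
    /// Panics if any input or output buffer is shorter than `count`.
    pub fn compute(&mut self, count: usize, inputs: &[&[f32]], outputs: &mut [&mut [f32]]) {
        for (input, output) in inputs.iter().zip(outputs.iter_mut()) {
            // The slicing enforces the precondition: a too-short buffer
            // panics instead of being silently truncated.
            let input = input[..count].iter();
            let output = output[..count].iter_mut();
            for (x, y) in input.zip(output) {
                *y = *x;
            }
        }
    }
}
```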
> For using iterators over loops in the generated Rust code, I think we would need to benchmark this.
We already did quite some benchmarking on this. The initial implementation of the Rust code generation accessed buffers like `buffer[index]`. We noticed that the generated assembly did not contain vectorized code in these cases, and the Rust was slower in the Faust Rust vs C++ benchmarks. The trick to get Rust to emit vectorized assembly was to switch from indexed access to zipped iterators, because they allow Rust to optimize away the bounds checks. The switch to zipped iterators gave the performance a huge boost. You can probably find related discussions here in the PR / ticket / git history.
I got the idea to use `take` because it could be added after the zipped iterators. The current implementation seems to be the most performant.
The Rust code that is currently generated panics if the provided buffers are too short. Is this really what we want? Or would it be better to iterate over the buffer as it is provided, and stop after N steps or at the end of the buffer? I created an example here. I suggest changing:

```rust
let inputs0 = inputs0[..count as usize].iter();
```

to

```rust
let inputs0 = inputs0.iter().take(count);
```
@obsoleszenz @bluenote10