Closed yjhmelody closed 9 months ago
Thanks for the PR!
So a few points:
elf.rs
doesn't matter as it's not used by the VM itself, so there's little point in optimizing it, so that should be reverted. (:unsafe
you introduced should be completely unnecessary as the compiler should have been able to figure that out by itself, so that should be reverted too.read_byte
, if it speeds things up. (: So have you actually measured a difference?The only part which might make sense is the read_byte, if it speeds things up. (: So have you actually measured a difference?
I did not measure it. Just take the std source:
unsafe impl<T> SliceIndex<[T]> for ops::Range<usize> {
type Output = [T];
#[inline]
fn get(self, slice: &[T]) -> Option<&[T]> {
if self.start > self.end || self.end > slice.len() {
None
} else {
// SAFETY: `self` is checked to be valid and in bounds above.
unsafe { Some(&*self.get_unchecked(slice)) }
}
}
//...
}
unsafe impl<T> SliceIndex<[T]> for usize {
type Output = T;
#[inline]
fn get(self, slice: &[T]) -> Option<&T> {
// SAFETY: `self` is checked to be in bounds.
if self < slice.len() { unsafe { Some(&*self.get_unchecked(slice)) } } else { None }
}
//...
}
For most cases, Range checking need to check two bounds.
Maybe the start
checking can be directly optimized by compiler, but I generally don't assume that.
I wonder if compiler will unroll the loop
for nth_arg in 0..arg_count
In general if the compiler knows that a slice is e.g. at least 1 element long then doing slice[0]
will not have a bounds check anymore.
Anyway, now we do have compilation time benchmarks (see tools/benchtool
) so any potential optimization will have to show that it's actually faster to get merged.
Okay, so I'll close this for now. Feel free to reopen and/or make another PR if you can speed things up and can show the speedup in the benchmarks. (:
accessing array with range index is expensive than with position index.