Closed vadixidav closed 3 years ago
I've got a bit of a speedup by moving the reallocate branch out to its own function and marking it as cold
, as RawVec does. I'm not sure what to do next.
(I tried using old_len == old_cap instead of new_len > old_cap, but this seemed to do nothing)
The only other difference I can see in RawVec is that they use alloc::Allocator (the default global, presumably) ::grow() instead of alloc::realloc(). Also something where they very deliberately state assumptions about the layouts. realloc is to be deprecated, so it's possible it's not being maintained very much?
Interesting. I will review your PR and see if we can get that merged. We should take a look at grow() as well. If you want to do that, go for it. Right now I am somewhat preoccupied with cv-sfm and ennona, but I can look at this eventually when I have time.
Well, I tried using alloc::alloc::Global.grow() instead of realloc. No noticeable improvements https://github.com/makoConstruct/header-vec/blob/testGA/src/lib.rs#L174
We could try the layout stuff maybe -_- I think we should probably go over and ask for advice from whoever maintains RawVec, at this point.
Go for it. For me the largest improvement would actually be if I could store multiple types in separate buffers in the allocation rather than as tuples, but I am busy working on other things at a higher level in Rust CV currently which have a bigger performance impact. Don't feel that you absolutely need to tackle this one problem unless benchmarks are showing that it is important to your downstream performance. Regardless, if you keep working on it, I certainly appreciate it.
I will reach out. It probably isn't the biggest performance concern for my application, but it bugs me.
Hah, I know the feeling. Appreciate it.
deallocating after each bench iteration helps a lot: #6
We have reached 2x performance, but the remaining distance from 1x still bugs me. Something specific to TestA
as a header type is causing this.
Okay, resolved. Parity reached. The rest of the slowdown was caused by #[repr(align(128))]
on TestA. (Out of curiosity, why were you aligning that?)
Okay, resolved. Parity reached. The rest of the slowdown was caused by
#[repr(align(128))]
on TestA. (Out of curiosity, why were you aligning that?)
In bitarray
the alignment is set very high (not that high) because it uses an array that must be aligned for SIMD, but I couldn't find a good way to set the alignment properly since it uses const generics (which means the alignment would be based on the const parameter). I just wanted to make sure that alignments higher than the actual size wouldn't break the code.
It seems it can cause performance issues though, which is understandable. Thanks for hunting this stuff down.
In #2, it was identified that
HeaderVec
performs several times slower thanVec
. While this isn't too unusual, it would be great to go faster. Now that we have benchmarks (see #3), we can use these to push the performance further. Getting within at least 2x the performance ofVec
should be reasonable, especially considering thatHeaderVec
is mostly concerned with reducing random accesses in graphs, so memory latency/random lookup is typically a greater concern. However, there is no reason that the implementation can't go faster than it is today, so this should be done.This issue is to track performance increases until we get to within a reasonable margin (2x) of
Vec
. This target can be moved if necessary.