Closed diablodale closed 1 year ago
The speed increase is not really significant. If that were the only thing I would say this is microoptimization ;-) However, the file size is significant. Saving 400kb in filesize justifies the trivial change to std::array.
Agree on the conclusion - though WRT file size, what is that measuring, final compiled binary?
Yes final -O2
compiled binary size on disk. You can see it in the godbolt links. It is in tiny text in the compiler output pane.
I do also see unordered_map
being used for the createPlatformDeviceFdKey()
map store. The file savings might not be as large since some of it is (maybe?) shared.
The createPlatformDeviceFdKey
map itself is a area that I'll investigate within just a few days as it involves my c++ work in #63. This is one of the C <-> C++ boundaries that I want to handle well. During that investigation, I'll explore how large that map becomes and how often it is mutated....to understand what container is best used for it. While the USB scenario isn't so many (only so many ports usually on a computer), the PoE scenario could be up to MAX_LINKS
which is currently 64.
Yes final -O2 compiled binary size on disk. You can see it in the godbolt links. It is in tiny text in the compiler output pane. I do also see unordered_map being used for the createPlatformDeviceFdKey() map store. The file savings might not be as large since some of it is (maybe?) shared.
Got it - yeah, in that case would be intermixed with other uses
The createPlatformDeviceFdKey map itself is a area that I'll investigate within just a few days as it involves my c++ work in https://github.com/luxonis/XLink/pull/63. This is one of the C <-> C++ boundaries that I want to handle well. During that investigation, I'll explore how large that map becomes and how often it is mutated....to understand what container is best used for it. While the USB scenario isn't so many (only so many ports usually on a computer), the PoE scenario could be up to MAX_LINKS which is currently 64.
This is a "one time" modded structure per up/down event. And the lookups thereafter to retrieve the actual FD (per R/W event)
This is fixed in my fork of XLink. It is faster (tests above) and smaller filesize (also above). The latter is most beneficial when xlink is a DLL instead of a static library. While this is function is not a high-volume use, these fixes are code simplification; removing unneeded complexity and filesize.
Closing as the fixes can be seen around https://github.com/diablodale/XLink/blob/636dc4142774e11f9a1d68125575f2782d65744a/src/pc/protocols/usb_host.cpp#L165
While refactoring XLink using libusb wrappers, I saw that the vidpid -> xlink state lookup is implemented as
std::unordered_map vidPidToDeviceState
. For XLink's use case, this is a poor choice as it uses more memory, more binary size, and more cpu than a simple solution withstd::array
.I wanted to learn/play with some new ai and benchmarking tools. They all agree to switch to
std::array
. I am changing it in my refactor.Setup
Repro
First found with code review. godbolts and benchmarks below.
For this use-case of fixed never-changing elements and very small size,
std::array
is the best choice.At https://godbolt.org/z/KM3E9Gdfd is code for both today's
std::ordered_map
and a new impl withstd::array
. You can chose which you want with-DUSE_UNMAP
or-DUSE_ARRAY
. After it compiles, notice several thingsResult
array = 95,505
andunordered_map = 497,156
. The map is 4x larger. (additional 400kb) in filesize 😥. All that extra code for nothing since thismap
is never mutated and t-i-n-y.When benchmarked https://quick-bench.com/q/CNuAH-ebWBdson0Epl-F--C3ZXI the
array
implementation is 3.8x faster thanmap
.Slightly altered code to generate random vidpids show array still faster (1.3x faster) but not as dramatic. https://quick-bench.com/q/orjPkib5xLpID0nitgWJc-VowTY. I think this is due to the addional load of generating random numbers and that since a fixed search vidpid isn't known at compile time, then gcc can't optimize as aggressively.
The speed increase is not really significant. If that were the only thing I would say this is microoptimization ;-) However, the file size is significant. Saving 400kb in filesize justifies the trivial change to
std::array
.