luxonis / XLink

A cross-platform library for communicating with devices over various physical links.
Apache License 2.0
12 stars 18 forks source link

vidpid->state lookup container improvement; changing to `std::array` #65

Closed diablodale closed 1 year ago

diablodale commented 1 year ago

While refactoring XLink using libusb wrappers, I saw that the vidpid -> xlink state lookup is implemented as std::unordered_map vidPidToDeviceState. For XLink's use case, this is a poor choice as it uses more memory, more binary size, and more cpu than a simple solution with std::array.

I wanted to learn/play with some new ai and benchmarking tools. They all agree to switch to std::array. I am changing it in my refactor.

Setup

Repro

First found with code review. godbolts and benchmarks below.

For this use-case of fixed never-changing elements and very small size, std::array is the best choice.

At https://godbolt.org/z/KM3E9Gdfd is code for both today's std::ordered_map and a new impl with std::array. You can chose which you want with -DUSE_UNMAP or -DUSE_ARRAY. After it compiles, notice several things

Result

When benchmarked https://quick-bench.com/q/CNuAH-ebWBdson0Epl-F--C3ZXI the array implementation is 3.8x faster than map. image

Slightly altered code to generate random vidpids show array still faster (1.3x faster) but not as dramatic. https://quick-bench.com/q/orjPkib5xLpID0nitgWJc-VowTY. I think this is due to the addional load of generating random numbers and that since a fixed search vidpid isn't known at compile time, then gcc can't optimize as aggressively. image

The speed increase is not really significant. If that were the only thing I would say this is microoptimization ;-) However, the file size is significant. Saving 400kb in filesize justifies the trivial change to std::array.

themarpe commented 1 year ago

The speed increase is not really significant. If that were the only thing I would say this is microoptimization ;-) However, the file size is significant. Saving 400kb in filesize justifies the trivial change to std::array.

Agree on the conclusion - though WRT file size, what is that measuring, final compiled binary?

diablodale commented 1 year ago

Yes final -O2 compiled binary size on disk. You can see it in the godbolt links. It is in tiny text in the compiler output pane. I do also see unordered_map being used for the createPlatformDeviceFdKey() map store. The file savings might not be as large since some of it is (maybe?) shared.

The createPlatformDeviceFdKey map itself is a area that I'll investigate within just a few days as it involves my c++ work in #63. This is one of the C <-> C++ boundaries that I want to handle well. During that investigation, I'll explore how large that map becomes and how often it is mutated....to understand what container is best used for it. While the USB scenario isn't so many (only so many ports usually on a computer), the PoE scenario could be up to MAX_LINKS which is currently 64.

themarpe commented 1 year ago

Yes final -O2 compiled binary size on disk. You can see it in the godbolt links. It is in tiny text in the compiler output pane. I do also see unordered_map being used for the createPlatformDeviceFdKey() map store. The file savings might not be as large since some of it is (maybe?) shared.

Got it - yeah, in that case would be intermixed with other uses

The createPlatformDeviceFdKey map itself is a area that I'll investigate within just a few days as it involves my c++ work in https://github.com/luxonis/XLink/pull/63. This is one of the C <-> C++ boundaries that I want to handle well. During that investigation, I'll explore how large that map becomes and how often it is mutated....to understand what container is best used for it. While the USB scenario isn't so many (only so many ports usually on a computer), the PoE scenario could be up to MAX_LINKS which is currently 64.

This is a "one time" modded structure per up/down event. And the lookups thereafter to retrieve the actual FD (per R/W event)

diablodale commented 1 year ago

This is fixed in my fork of XLink. It is faster (tests above) and smaller filesize (also above). The latter is most beneficial when xlink is a DLL instead of a static library. While this is function is not a high-volume use, these fixes are code simplification; removing unneeded complexity and filesize.

Closing as the fixes can be seen around https://github.com/diablodale/XLink/blob/636dc4142774e11f9a1d68125575f2782d65744a/src/pc/protocols/usb_host.cpp#L165