feature request: pluggable allocator for data buffers and/or deep integration with cv::UMat (not Mat)

diablodale commented 3 years ago

Request the buffers that hold depthai output data that is available to user code to be aligned and also be allocated by the user's choice of allocator. Why?

alignment is good. And necessary for some libraries, CPU ops, and good integration with devices using standards like OpenCL.
memory copies are bad, and double bad when double-copied. I one copy is USB->RawBuffer. Moving that data to an OpenCL/GL device using mainline OpenCV can increase that to 2 or 3 copies with the existing DepthAI approach.

During a super-quick code review, seems RawBuffer.hpp is the struct that holds data within things like ImgFrame. RawBuffer uses an unadorned std::vector<std::uint8_t>. No alignment. No capability for a pluggable allocator.

Example use cases

Eigen, the widely used C++ template library for linear algebra, performs best on aligned data. If the user of depthai can plug in an allocator or somehow specify the needed alignment, then all the data would be saved aligned into RawBuffer and then can be given to Eigen for its highly optimized SIMD codepaths.
In my specific use, I would need tight integration with cv::UMat. And for the UMat to allocate the needed memory and provide that memory address for the initial copy USB->RawBuffer. This results in one copy total of that data from USB->iGPU due to my UMat improvements in OpenCV's OpenCL.

diablodale commented 2 years ago

If/until the depthai SDK supports the OP, I have implemented alignment using

// inside RawBuffer.hpp
template <typename T, int ALIGN_BYTES = 64>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, ALIGN_BYTES>>;

and then cascade that type into the needed places for images (the only things I need aligned at this time) Function affected are Buffer::get/setData(), dai::parseDatatype(), and StreamMessageParser::parseMessageToADatatype(

Since this aligned vector is a different type than the default std::vector<T, std::allocator<T>> it cascades some incompatibilities in examples, test, and the included OpenCV frame functions. They are all easy to update.

This does not address heap being thrashed (since no pools of these same sized image std::vector), and ownership/thread safety continues to be an issue #257. If this feature request is ever going to be added, I recommend doing it at the same time as addressing issue #257 since they are the same codepath/classes.

diablodale commented 2 years ago

one approach to add alignment via an agnostic RawBuffer is at https://github.com/diablodale/depthai-core/tree/fix257-move-owner-threads

diablodale commented 1 year ago

@themarpe hi. I've returned from travels and ready to contribute more. I'd like to submit a PR to add alignment to the underlying RawBuffer using the boost::alignment::aligned_allocator. I see in the v2.19 more allowance for Boost was added, so perhaps now is a good time to add this.

I and my customers have been using the alignment for a year with no issues.

themarpe commented 1 year ago

Hi @diablodale

rvc3_develop has already brought in changes required to refactor messages /w zerocopy and custom data allocation (that latter will have to be done in XLink).

Feel free to check that wrt to such data storage differences.

Note: RawX -> XMetadata, ADatatype and descendatnts -> Message. Thats the nomenclature behind the change - so ImgFrame is combination of data + ImgFrameMetadata (at the moment named RawImgFrame).

We could backport to current develop if need may be or wait for rvc3 to be merged

I see in the v2.19 more allowance for Boost was added, so perhaps now is a good time to add this.

I would steer away from bringing in boost into the main codebase - thought with the above approach you should be able to add it in your own codebase. + I still think that modifying XLink's allocator would be best then, as it also gives you zero copy

diablodale commented 1 year ago

Got it. I was unaware of active current progress on the aligned+zerocopy. I prefer not to dup effort given rvc3_develop is happening. Therefore, I will not submit a PR to use boost's aligned vector.

I'll continue to use my approach privately. If anyone wants, it is only 4 lines of well tested code. https://github.com/diablodale/depthai-shared/commit/2d396d134219ad11c8270a2da2c25b05f01b5606

i'll keep this issue open to track the alignment feature request with hope rvc3 eventually delivers it for the general public

luxonis / depthai-core

feature request: pluggable allocator for data buffers and/or deep integration with cv::UMat (not Mat) #253

Example use cases