Open quinor opened 1 year ago
We have a performance issue with the current implementation of the views, which produces bad assembly code. This issue is under investigation.
That's unfortunate :/ the view operations are the majority of what I'm currently doing...
Hi, I don't know if this is relevant to this issue, but a simple transpose on a { 37, 8400 } array took 10 ms:
```cpp
int boxRows = 37;
int boxCols = 8400;
std::vector<int> boxOutputArrShape = { boxRows, boxCols };
xt::xarray<float> boxOutputArr = xt::adapt(boxOutputFloatBuffer,
                                           boxRows * boxCols,
                                           xt::no_ownership(),
                                           boxOutputArrShape);
// This took a good 10 ms running on an iPhone 12 Pro,
// while the equivalent in OpenCV took less than 1 ms
xt::xarray<float> predictionsArr = xt::transpose(boxOutputArr);
```
Is there anything wrong with the above snippet? Or, if this is an `xt::view` issue, is there any version that I can downgrade to in order to make it faster?
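As far as I understand, `xt::transpose` itself is lazy; the cost is paid when the result is assigned into an `xt::xarray`, which forces a strided copy through the view machinery. If bypassing xtensor is acceptable on this hot path, a plain cache-blocked transpose of the raw row-major buffer can be used instead. This is a minimal sketch, not xtensor's implementation; the function name and block size are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Cache-blocked transpose of a rows x cols row-major buffer.
// Processing B x B tiles keeps both the reads and the writes
// within a small working set, which is the usual fix for the
// strided-access pattern a naive transpose produces.
std::vector<float> transpose_blocked(const std::vector<float>& src,
                                     std::size_t rows, std::size_t cols) {
    constexpr std::size_t B = 32;  // tile size; tune per target CPU
    std::vector<float> dst(rows * cols);
    for (std::size_t i0 = 0; i0 < rows; i0 += B)
        for (std::size_t j0 = 0; j0 < cols; j0 += B)
            for (std::size_t i = i0; i < std::min(i0 + B, rows); ++i)
                for (std::size_t j = j0; j < std::min(j0 + B, cols); ++j)
                    dst[j * rows + i] = src[i * cols + j];
    return dst;
}
```

The transposed buffer can then be re-adapted into an `xt::xarray` with the swapped shape if the rest of the pipeline needs xtensor expressions.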
I'm trying to replicate this NumPy function in xtensor:
My attempt:
The Python version takes around 35 ms to evaluate, while the C++ version runs in around 300 ms (both for a 2048**2 input size). This is already after I tried to optimize the code quite a lot.
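A gap of that size often comes down to build configuration rather than the code itself: xtensor relies heavily on the optimizer, and its SIMD path is opt-in. This is a hedged sketch of the flags commonly needed (assuming g++ and that xsimd is installed; `main.cpp` is a placeholder):

```shell
# -O3:                 xtensor's expression templates need full optimization
# -march=native:       enable the host CPU's vector instructions
# -DNDEBUG:            disable assertions/bounds checks
# -DXTENSOR_USE_XSIMD: turn on xtensor's explicit SIMD acceleration
g++ -O3 -march=native -DNDEBUG -DXTENSOR_USE_XSIMD main.cpp -o main
```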
Why is the C++ version slower, and how do I bring it up to speed? Relevant C++ compilation command: