Open serjl opened 4 years ago
It is always the MinIndex method that takes that long. NPP is mostly asynchronous, which means that an NPP function returns to host code before the actual work is done. Memory deallocation (if no buffer is provided) and copying data from device to host are implicit synchronization points. That is where your application actually waits, so that is where your time measurements show the delay. The question remains why it takes so long: Is your image huge? Is the NPP DLL already loaded, i.e. do you run the test several times? Is only the first execution slow, or is it always slow?
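To illustrate why the delay shows up on a different line than the one doing the work, here is a minimal sketch in plain Python. It is only an analogy, not NPP or managedCuda code: `fake_kernel` and `measure` are hypothetical names, and a worker thread stands in for the GPU. The submit call plays the role of an asynchronous launch, and `future.result()` plays the role of an implicit synchronization point such as a device-to-host copy.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    """Hypothetical stand-in for GPU work that keeps running after the launch returns."""
    time.sleep(duration_s)
    return 0  # placeholder result, e.g. a minimum index


def measure(duration_s=0.2):
    """Return (launch_seconds, wait_seconds) for one asynchronous 'launch'."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        t0 = time.perf_counter()
        # Returns almost immediately, like an asynchronous library call.
        future = pool.submit(fake_kernel, duration_s)
        t1 = time.perf_counter()
        # Blocks until the work is finished, like a device-to-host copy.
        future.result()
        t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    launch_s, wait_s = measure()
    print(f"launch: {launch_s:.4f} s, wait at sync point: {wait_s:.4f} s")
```

Almost all of the elapsed time is charged to the blocking call, even though the work was started by the earlier one. With real CUDA code, the usual way to attribute time correctly is to synchronize explicitly before stopping the timer (for example with `cudaDeviceSynchronize` in the runtime API).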
Thanks for the quick reply! The image size is about 1000x30, and the code runs in a loop, so the GPU code is executed many times (this is not about a slow first iteration). It is slow in every iteration.
Hi @kunzmi, Thanks again for the great wrapper. I wrote a fairly standard function for pattern matching:
It works fine, but the line marked !!!!!!!(PROBLEMATIC LINE )!!!!!!!!!!! is extremely slow (about 1 second or even more; my GPU is a GeForce GTX 1070). If I don't pass a buffer to the MinIndex function, then MinIndex itself is very slow and the !!!!!!!(PROBLEMATIC LINE )!!!!!!!!!!! is fine. Either way, the total time is equally long. Do you have any idea what causes this behavior? Is it a GPU problem, or am I not using the memory management correctly?
Thank you in advance, Sergei