morousg / cvGPUSpeedup

A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
Apache License 2.0

Add reading and writing with bigger data types #26

Closed morousg closed 8 months ago

morousg commented 1 year ago

Add an optional optimization in which each thread reads and writes chunks that are 2x or 4x the size of the actual data element.

This should be applicable to data types whose size is a multiple of 2. We may do it for data types of size 3 at some point.
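To make the idea concrete, here is a minimal host-side C++ sketch (not the cvGPUSpeedup implementation). It assumes 1-byte elements and a 4x factor, so each simulated "thread" moves four `uint8_t` values with a single 32-bit read and a single 32-bit write. The names `addOneFused4` and `addOneAll` are hypothetical, and the "add one" operation is only a placeholder for whatever the kernel computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: one "thread" handles 4 uint8_t elements through a
// single 32-bit read and a single 32-bit write (4x the real element size).
inline void addOneFused4(const std::uint8_t* src, std::uint8_t* dst, std::size_t thread) {
    std::uint32_t packed;
    std::memcpy(&packed, src + thread * 4, sizeof(packed));   // one wide read

    std::uint8_t lanes[4];
    std::memcpy(lanes, &packed, sizeof(packed));               // unpack the 4 elements
    for (std::uint8_t& v : lanes) {
        v = static_cast<std::uint8_t>(v + 1);                  // placeholder per-element op
    }

    std::memcpy(&packed, lanes, sizeof(packed));               // repack
    std::memcpy(dst + thread * 4, &packed, sizeof(packed));    // one wide write
}

// Hypothetical driver: fused path for full groups of 4, scalar path for the tail.
inline void addOneAll(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>& out) {
    assert(in.size() == out.size());
    const std::size_t fusedThreads = in.size() / 4;
    for (std::size_t t = 0; t < fusedThreads; ++t) {
        addOneFused4(in.data(), out.data(), t);
    }
    for (std::size_t i = fusedThreads * 4; i < in.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(in[i] + 1);         // leftover elements
    }
}
```

In a real CUDA kernel the wide access would more likely use a built-in vector type such as `uchar4`, and the buffer would need to be suitably aligned. The sketch only shows the access pattern and the need to handle a tail when the element count is not a multiple of the fusion factor.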

morousg commented 9 months ago

Pending tasks:

  1. The flag to activate the thread fusion has to be in the GridPattern.
  2. To support thread fusion on only the first read and the last write operations, we need a way to guarantee the same number of elements per thread in both the read and the write operations.
  3. We need to either detect that there is a MidWriteOperation and disable the thread fusion, or add support to apply the thread fusion also to the MidWriteOperation.
  4. What happens if one of the parameters is a matrix, and we need to read each of its elements to multiply it one-to-one with the corresponding element of the other matrix?
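For item 2, one possible approach (a sketch, not what the library does) is to derive a single elements-per-thread factor from the wider of the read and write element types, so both sides process the same number of elements. The helper `elemsPerThread` and the constant `kMaxAccessBytes` are hypothetical. The 16-byte cap is an assumption based on 128-bit accesses being the widest single per-thread load or store on CUDA devices.

```cpp
#include <algorithm>
#include <cstddef>

// Assumption: 16 bytes is the widest single per-thread read or write.
constexpr std::size_t kMaxAccessBytes = 16;

// Hypothetical helper: picks a factor of 4, 2 or 1 such that both the fused
// read and the fused write fit in kMaxAccessBytes. Using the wider of the two
// element sizes keeps the element count per thread identical on both sides.
constexpr std::size_t elemsPerThread(std::size_t readElemBytes, std::size_t writeElemBytes) {
    const std::size_t widest = std::max(readElemBytes, writeElemBytes);
    std::size_t n = 4;
    while (n > 1 && widest * n > kMaxAccessBytes) {
        n /= 2;  // fall back from 4x to 2x, then to no fusion
    }
    return n;
}

// uchar -> uchar: 4 elements, 4-byte read and 4-byte write.
static_assert(elemsPerThread(1, 1) == 4, "1-byte in, 1-byte out");
// uchar -> float: 4 elements, 4-byte read and 16-byte write.
static_assert(elemsPerThread(1, 4) == 4, "1-byte in, 4-byte out");
// float -> double: 2 elements, 8-byte read and 16-byte write.
static_assert(elemsPerThread(4, 8) == 2, "4-byte in, 8-byte out");
// 16-byte output elements leave no room for fusion.
static_assert(elemsPerThread(8, 16) == 1, "8-byte in, 16-byte out");
```

A factor of 1 would also be a natural way to express item 3: if a MidWriteOperation is detected and not supported, the same helper could simply be bypassed and the factor forced to 1.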