andmax closed this issue 4 years ago
My code only runs on square images whose side length is a multiple of 32. The algorithm uses transpose operations for efficient memory access, and 32 is the size of a CUDA warp. In theory the code could be modified to support 2D images of size $32n \times 32m$, but that would require complicated changes. pba3D can be viewed as many layers of 2D images, so you can try to modify it to run on volumes of size $32n \times 32n \times m$. The functions that relate to $m$ can be found here: https://github.com/orzzzjq/Parallel-Banding-Algorithm-plus/blob/2f58720302c3478d4367b21268dfbb263b46c4d2/pba-plus-3D/pba/pba3DHost.cu#L94-L101
For non-square images, you can embed the image in a square image of size $32n \times 32n$. I believe the performance won't be worse than running directly on the original size, as long as the shape of the object is not too narrow. Related question: https://github.com/orzzzjq/Parallel-Banding-Algorithm-plus/issues/2#issuecomment-635105467
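The embedding suggested above could be sketched on the host side roughly as follows. This is only an illustration, not code from the repository: `roundUpTo32`, `padToSquare32`, and the `background` parameter are hypothetical names, and the sketch assumes a row-major image whose padding cells can be filled with whatever value the input encoding treats as "no site".

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Smallest multiple of 32 that is >= n (assumes n > 0).
inline int roundUpTo32(int n) { return (n + 31) / 32 * 32; }

// Copy a width x height row-major image into the top-left corner of a
// side x side square buffer, where side is a multiple of 32.
// Cells outside the original image are set to `background`
// (hypothetical: pick the value your input format uses for "no site").
template <typename T>
std::vector<T> padToSquare32(const std::vector<T>& src, int width, int height,
                             T background, int& side) {
    side = roundUpTo32(std::max(width, height));
    std::vector<T> dst(static_cast<std::size_t>(side) * side, background);
    for (int y = 0; y < height; ++y) {
        // Copy one source row into the start of the matching padded row.
        std::copy(src.begin() + static_cast<std::size_t>(y) * width,
                  src.begin() + static_cast<std::size_t>(y + 1) * width,
                  dst.begin() + static_cast<std::size_t>(y) * side);
    }
    return dst;
}
```

After running on the padded buffer, the original `width` x `height` region would be cropped back out. Since the padding adds no sites, the nearest site of each original pixel should be unchanged.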
Hi, thanks for your answer and comments. Yes, I was able to run it for any square image size by adding several dimension checks in the kernels. The square restriction remains because of the transpositions. That is OK, thank you.
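As a rough CPU-side analogy of what such dimension checks look like (not the actual kernel changes, which are not shown in this thread), the sketch below transposes an $n \times n$ matrix in $32 \times 32$ tiles for arbitrary $n$. The names `TILE` and `tiledTranspose` are hypothetical. The tile grid is rounded up to cover the matrix, and a guard skips cells past the edge, which is the role a bounds check plays inside a kernel.

```cpp
#include <cstddef>
#include <vector>

constexpr int TILE = 32;  // mirrors the CUDA warp size mentioned in the thread

// Transpose an n x n row-major matrix tile by tile, for any n > 0.
template <typename T>
std::vector<T> tiledTranspose(const std::vector<T>& in, int n) {
    std::vector<T> out(in.size());
    const int tiles = (n + TILE - 1) / TILE;  // round up to cover the edges
    for (int ty = 0; ty < tiles; ++ty)
        for (int tx = 0; tx < tiles; ++tx)
            for (int dy = 0; dy < TILE; ++dy)
                for (int dx = 0; dx < TILE; ++dx) {
                    const int x = tx * TILE + dx;
                    const int y = ty * TILE + dy;
                    // Dimension check: partial edge tiles skip cells
                    // that fall outside the matrix.
                    if (x >= n || y >= n) continue;
                    out[static_cast<std::size_t>(x) * n + y] =
                        in[static_cast<std::size_t>(y) * n + x];
                }
    return out;
}
```

The transpose itself still assumes equal width and height here, which matches the remaining square restriction described above.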
Is there a way to run your PBA+ algorithm with dimension sizes that are not multiples of 32? In addition, for pba3D, it seems that the sizes of dimensions X and Y need to be equal. Do you have any hints on how to change your code to support arbitrary dimension sizes?
Thanks, Andre.