These changes feature the return of the dotted outlining of arbitrary selections. On the more basic side, the "BufferArray" object was rewritten to focus exclusively on VAOs (Vertex Array Objects), and a more pure "BufferObject" was created that can handle OpenGL buffer objects in a more generic sense. The bottom line is that when generating a collection of the selected-texel point coordinates, the compute shaders write vertex information into an SSBO (Shader Storage Buffer Object). The underlying buffer object here is no different than any other buffer object, so the VAO needs to be able to switch from looking at the traditional VBO (Vertex Buffer Object) that it is has by default, to then look at the SSBO data and treat it like vertex information.
As for the main feature of this PR, selection outlines have made a return. The first major push of this change was implementing a geometry shader that converts a theoretical coordinate of a selected texel into line vertices for where the outlines should be placed. This was a great improvement, but what was left (99% of the slowness) was improving the way those texel coordinates were emitted into the rendering pipeline in the first place. We first made a CPU parallel algorithm that operates (over a selection texture with binary information on whether a texel is selected) in 3 steps:
Each worker looks at their respective chunk of the selection texture and totals how many selected texels they see, each placing their answer in an array, indexed corresponding to their worker ID.
Performing a prefix sum operation over this array (with slight modification) to determine the indices for where each worker should start writing the final answer in the final step
Each worker looks again through their respective chunks, recording the (X, Y) texel coordinates of selected texels into the final answer array, indexed according to result from the previous step
This implementation was very good in comparison to the old sequential approach, where one thread would simply loop over an entire set containing the selections and populate a fixed array to send to the rendering pipeline. In our testing, we saw a 105%-107% speedup in extreme cases. The only problem with this approach is that it doesn't take advantage of the GPU - all of these calculations rely on the parallelism in the CPU, which is very limited. So, mainly wanting to grow our OpenGL knowledge, we implemented the same algorithm in compute shaders. The only issue was that the prefix sum step was incredibly slow to run on a single worker sequentially, so I had to implement a rather messy parallel prefix sum algorithm in another compute shader. (I don't know how to do it better with the limitations of GLSL - it plagued all of the shaders with 2 copies of essentially the same thing). The result is a lightning fast algorithm that in small cases is similar to the original CPU parallel solution, but in extreme cases can even get up to 10x faster. The run time depends heavily on how the chunkSize is set - right now it is hard coded at 256, but could be improved by intelligently deciding what it should be for a given image.
These changes feature the return of the dotted outlining of arbitrary selections. On the more basic side, the "BufferArray" object was rewritten to focus exclusively on VAOs (Vertex Array Objects), and a more pure "BufferObject" was created that can handle OpenGL buffer objects in a more generic sense. The bottom line is that when generating a collection of the selected-texel point coordinates, the compute shaders write vertex information into an SSBO (Shader Storage Buffer Object). The underlying buffer object here is no different than any other buffer object, so the VAO needs to be able to switch from looking at the traditional VBO (Vertex Buffer Object) that it is has by default, to then look at the SSBO data and treat it like vertex information.
As for the main feature of this PR, selection outlines have made a return. The first major push of this change was implementing a geometry shader that converts a theoretical coordinate of a selected texel into line vertices for where the outlines should be placed. This was a great improvement, but what was left (99% of the slowness) was improving the way those texel coordinates were emitted into the rendering pipeline in the first place. We first made a CPU parallel algorithm that operates (over a selection texture with binary information on whether a texel is selected) in 3 steps:
This implementation was very good in comparison to the old sequential approach, where one thread would simply loop over an entire set containing the selections and populate a fixed array to send to the rendering pipeline. In our testing, we saw a 105%-107% speedup in extreme cases. The only problem with this approach is that it doesn't take advantage of the GPU - all of these calculations rely on the parallelism in the CPU, which is very limited. So, mainly wanting to grow our OpenGL knowledge, we implemented the same algorithm in compute shaders. The only issue was that the prefix sum step was incredibly slow to run on a single worker sequentially, so I had to implement a rather messy parallel prefix sum algorithm in another compute shader. (I don't know how to do it better with the limitations of GLSL - it plagued all of the shaders with 2 copies of essentially the same thing). The result is a lightning fast algorithm that in small cases is similar to the original CPU parallel solution, but in extreme cases can even get up to 10x faster. The run time depends heavily on how the chunkSize is set - right now it is hard coded at 256, but could be improved by intelligently deciding what it should be for a given image.
Closes #10 Closes #11 Closes #44 Closes #46