The painting method is currently very rough and could see a number of optimizations to speed it up by a good factor. Here is a rough plan:
Add support for GPU time queries
Set up a benchmarking scene. For a fixed camera and a given bar count, we want to know roughly how much GPU time is spent per 1M bars (on average).
Process bars in pairs, so that each pair of pixels produces exactly 2 bars unconditionally.
Rotate the map so that the metadata for 2 points can be read using a single texture fetch.
Merge height and metadata into one RGBA8Uint texture - #5.
I believe this transform will have an enormous effect on performance. Not only will we need 1 texture fetch instead of 4 for a pair of texels, but we will also no longer need dependent texture fetches.
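To make the fetch-count argument concrete, here is a minimal CPU-side sketch of the merged layout. The channel assignment is an assumption for illustration only: it supposes height and metadata each fit in 8 bits and that a pair consists of two horizontally adjacent points, so one RGBA8Uint texel carries (height0, meta0, height1, meta1). The function names are hypothetical and the real layout may differ.

```python
def pack_pair(height0, meta0, height1, meta1):
    """Pack two (height, metadata) points into one RGBA8 texel.

    Hypothetical layout: R = height0, G = meta0, B = height1, A = meta1.
    """
    texel = (height0, meta0, height1, meta1)
    for channel in texel:
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return texel


def build_packed_texture(heights, metas):
    """Merge separate height and metadata maps into one packed map.

    Both inputs are lists of rows with the same even width. The result
    has half the width, one texel per pair of adjacent points.
    """
    packed = []
    for h_row, m_row in zip(heights, metas):
        if len(h_row) != len(m_row) or len(h_row) % 2 != 0:
            raise ValueError("rows must have equal, even width")
        packed.append([
            pack_pair(h_row[x], m_row[x], h_row[x + 1], m_row[x + 1])
            for x in range(0, len(h_row), 2)
        ])
    return packed


def unpack_pair(texel):
    """Return ((height0, meta0), (height1, meta1)) from a single fetch."""
    r, g, b, a = texel
    return (r, g), (b, a)
```

With separate maps, a pair costs two height reads plus two metadata reads. With this layout, a single read of `packed[y][x // 2]` yields everything both bars need.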
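The benchmark figure from the plan (average GPU time per 1M bars) could be derived from timestamp query results roughly as follows. This is a sketch under assumptions: timestamp queries are taken to return raw tick counts at the start and end of the painting pass, and the graphics API is taken to report a tick period in nanoseconds. The function and parameter names are hypothetical.

```python
def gpu_ms_per_million_bars(samples, ns_per_tick, bar_count):
    """Average GPU milliseconds spent per 1M bars.

    samples:     list of (begin_ticks, end_ticks), one pair per frame
    ns_per_tick: timestamp period reported by the API (assumed)
    bar_count:   number of bars painted in each measured frame
    """
    if not samples:
        raise ValueError("need at least one sample")
    if bar_count <= 0:
        raise ValueError("bar_count must be positive")
    total_ticks = sum(end - begin for begin, end in samples)
    avg_ms = total_ticks * ns_per_tick / len(samples) / 1e6
    # Normalize so runs with different bar counts are comparable.
    return avg_ms * 1_000_000 / bar_count
```

Normalizing by bar count keeps numbers comparable across scenes, while the fixed camera keeps per-bar cost from varying with how much of the map is visible.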