Closed philipbelesky closed 4 years ago
Looking forward to seeing this implemented! 50ms sounds like a relatively big overhead for what we are doing. Essentially it is the same logic as in the GaussianBlurProcessor -> iterate over the entire array of pixels and define a value for each of them based on their neighbors. Blurring takes 1-2ms, we should be able to achieve a similar execution time with slope as well.
Maybe @Sergio0694 has some ideas on how to approach this.
Hey @mariuszhermansdorfer - I'd be happy to help but I'm not 100% sure about what you're trying to do here. A couple small tweaks right off the bat, from what I'm seeing:
foreach
loops with for
loops (saves creating an iterator)double
variables with float
, and use MathF
instead of Math
, if you don't need the extra 64 bit precisionOther than that, what kind of effect are you trying to apply here? If it's a 2D convolution with some kernel that is linearly separable, then yeah you can replicate my gaussian blur effect by using two 1D convolutions instead of one. This of course gets the computation cost down from O(n^2)
to O(n)
.
If instead you're doing some other additional work on the surrounding pixels, it might be trickier to optimize.
Thanks @Sergio0694! Sorry for not providing a clearer explanation of what we are trying to accomplish. The goal is to visualize slope of the terrains created in the sandbox. To achieve that, we need to calculate a value for each pixel and store it in an array. This is then passed to our meshing function as vertexColors information.
Slope is defined as rise/run
. rise
is equal to the difference in elevation and run
is the distance in the XY plane between points. Since we operate on a defined grid of pixels run
is constant (its three constants to be precise, one for the X, one for the Y, and for the XY direction). Since each pixel can have up to 8 neighbors, we calculate an average of up to 8 individual comparisons and assign this value to the analyzed pixel.
This logic has to be repeated for every single one of our 217088 pixels.
Does this sound like a good candidate for linear separation?
I'm sorry, I still don't really understand the exact operation you're describing.
1) What is the type of the source image? Is it just an array of numeric values (eg. int
), or is something like an array of RGB vectors?
2) "rise
is equal to the difference in elevation" - elevation between what?
Anyway, if the total area being processed for a single pixel is just the 3x3 square centered in that pixel, even if you could come up with a linearly separable kernel to represent that operation it likely wouldn't bring a performance increase, as the kernel is so small already. What I suggest is to: 1) Parallelize the loop on a per-row basis, just like I did in my gaussian blur 2) Unroll the loop and make all the possible values constants.
As for point 2, as an example: imagine that for each pixel you were just summing its value and the one of the 2 following pixels, if within bounds (completely made up example, bear with me).
Classic implementation (with no optimizations):
const int width = 640, height = 380;
int[] image = new int[width * height];
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
int sum = 0;
for (int i = 0; i < 3; i++)
{
if (x + i == width) break;
sum += image[y * width + x + i];
}
image[y * width + x] = sum;
}
}
Unrolled implementation (with no other optimizations):
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
int sum = image[y * width + x];
if (x + 1 < width) sum += image[y * width + x + 1];
if (x + 2 < width) sum += image[y * width + x + 2];
image[y * width + x] = sum;
}
}
Note though that the second is more of a micro-optimization, so I'd suggest to just making the code run in parallel, that might very well be enough for you already. Following up from the previous example, the parallelized version could look something like this:
Parallel.For(0, height, y =>
{
for (int x = 0; x < width; x++)
{
int sum = image[y * width + x];
if (x + 1 < width) sum += image[y * width + x + 1];
if (x + 2 < width) sum += image[y * width + x + 2];
image[y * width + x] = sum;
}
});
Hope this helps!
P.S.: bonus example if you really want to squeeze out as much performance as possible:
Parallel.For(0, height, y =>
{
ref int row = ref image[y * width];
for (int x = 0; x < width; x++)
{
int sum = 0;
if (x + 1 < width) sum += Unsafe.Add(ref row, x + 1);
if (x + 2 < width) sum += Unsafe.Add(ref row, x + 2);
Unsafe.Add(ref row, x) += sum;
}
});
But again, first just make the code run in parallel 👍
Thanks, @Sergio0694! To answer your questions:
double[]
, which is a result of the Gaussian blurring function you wrote for us earlier.rise
is a difference between two values in the above double[].
It really is as simple as that.Following your suggestion I wrote a very verbose slopeCalculator. It first takes care of the edge cases for individual pixels in the corners of the image (NW, NE, SW, SE), rows (first and last) and columns (first and last) and then iterates over the main array. Results are pretty satisfying, but also quite surprising. @philipbelesky, I went with a D8 approach just to see how much of an overhead it would cause.
Sequential for loop - 2ms
Parallel for loop - 14 ms
I suspect it is because the logic needs to read pixel values from other rows which may be locked by other threads writing to it.
Any ideas on how to optimize it from there? Maybe some of your unsafe
mojo would help :)
@mariuszhermansdorfer So, by quickly looking at your code, a few suggestions come to mind:
1) Absolutely swap your double
for float
values. The loss in precision shouldn't really be noticeable at all, but it should speed up calculations by a fair bit. I mean to replace all double
values, so both in the gaussian blur processor and in this new code your wrote.
2) After you're done with the previous point, remember to replace all your Math
calls with MathF
calls (same exact methods, just working on float
values and not double
3) Especially if this code you wrote is being called repeatedly (possibly multiple times a second), do this change in this line:
// Replace this
ushort[] slopeValues = new ushort[width * height];
// With this
ushort[] slopeValues = ArrayPool<ushort>.Shared.Rent(width * height);
ArrayPool<T>
(see this blog post for more details, and the docs page here) is a high-performance pool of arrays, which makes it so that you can reuse arrays instead of forcing the runtime to allocate a new one every time. Just be wary of a couple caveats:
ArrayPool<T>.Shared.Return(T[] array)
.
Length
property of your array (since you're always iterating with your previous width
and height
values, that should be taken care of already
4) At this line you're accessing a variable from an outer scope, within a parallel loop. Since you only need a temp variable within that parallel loop code, avoid doing that and just declare another local variable for that method block, and use it from there.After you're done with all of these I might also do a pass replacing those array accesses with Unsafe.Add
accesses, but that's a micro-optimizations which might not be that necessary here.
Hope this helps!
@Sergio0694, you're awesome! Point number 4 made an enormous difference! Localizing the variable took it down to less than 1 ms. Most satisfying :)
I will implement the rest of your suggestions tomorrow to make it even snappier. I have two questions, though:
MathF
is part of the System
namespace. It isn't recognized, though. Am I missing a reference? Also, since it is not the first time I run into this issue, where can I find this information in the future. It seems to be missing in the docs.@mariuszhermansdorfer Ahahah thanks, happy to help! 😊 It's great that you got such a performance improvement just by changing that variable ahahah
As for your questions:
1) My bad, I just double checked the docs and apparently MathF
is not available for .NET Framework, but just for .NET Core. Unless you move your whole project to .NET Core 3.0 (which would also be faster in general, though you'd have to test to see if all the packages you're using support it), unfortunately you won't be able to use that API.
2) No, you can't return it from that method, otherwise the caller would receive an invalid array. The way you'd structure that is the following:
// Somewhere in your code when you need this
var slopes = SlopeCalculator.CalculateSlope(...);
// Do stuff with slopes...
ArrayPool<double>.Shared.Return(slopes);
That is, return that array when you're sure you won't be using it again. For instance, if you're getting that array within some other method, which is using it to do some other work and then just discarding it (ie. not returning it or saving it somewhere else), you could just return that array to the pool at the end of that method.
Thanks also for your help @Sergio0694! I think we are stuck on .NET 4.5 as this is what our host environment provides, although it is worth testing if we can get around that.
@mariuszhermansdorfer the new CalculateSlope
function shifts the coloring process outside of the per-pixel loop and operates with the entire point cloud as the input rather than on a per-point basis. To integrate this approach with the existing analysis classes, it looks like GetPixelIndexForAnalysis
should be replaced with a method along the lines of GetPixelColorsForAnalysis
that can can accomodate this 'all at once' approach.
We have a few different branches floating around that should probably be resolved first before pursuing that option. Is it OK to merge this branch in? I can then merge/integrate the new setup component, and then rework the analysis classes to integrate the method from performance/slope_parallel
?
@philipbelesky, yes, please go ahead with the merging. I wasn't pushing any other commits to the master just to make sure I don't make your life harder with integrating the analysis part.
Thanks for clarifying things, @Sergio0694. As far as I can understand, we are stuck with .NET 4.5 unless we explicitly make our users install newer versions of the framework.
I must admit, I don't understand this explanation fully, but it seems we can't use the .NET Core for this particular project.
No problem! As for .NET Core 3.0, if these changes have already brought the time it takes to perform this section of the pipeline down to below 1ms I'd say you can just forget about that then.
You might just want to throw in that ArrayPool<T>
optimization just if you want to really go all the way, but if you're at 1ms already that might not even be noticeable anyway.
Is this one ready to be merged into master @philipbelesky? Feel free to do it when you think we're good to go.
Great, done. I’ll have some time later this week to integrate the new structure of the slope algorithm and will try to replicate it for aspect at the same time.
Hey, this is just a tracking PR for some of the ongoing changes that cut across #30 #33 #13; will post highlights as I go and signal when a full review is ready. As of now:
MeshGeometryAnalysis
class of analyses to output geometry after the point cloud/mesh items have been made; e.g. for the water plane or contouring.