Closed SwamyDev closed 1 year ago
Awesome! Nice job.

> which you can find in the linked pull request.

Sorry, I don't see a link anywhere.

I don't use `segments` usually, so forgive me if I'm misunderstanding, but would an alternative solution be to pre-allocate the final target array no matter what and fill it in with results? Then, if a user wanted to provide the array that was being filled in, they could provide an `out` keyword argument to resemble the standard practice in numpy?

Ah, there's the PR. Thanks.
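For reference, the `out` convention mentioned above is how numpy ufuncs already behave: the caller supplies a pre-allocated destination array, the function fills it in place, and the returned array is the very buffer the caller provided:

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5)
dest = np.empty(5)

# numpy fills the caller-supplied buffer instead of allocating a new one.
result = np.add(a, b, out=dest)

# No copy happened: the return value *is* the provided array.
assert result is dest
```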
> Awesome! Nice job.

Thanks :)

> which you can find in the linked pull request.
>
> Sorry, I don't see a link anywhere.

You were apparently way too fast for me ;)

> I don't use `segments` usually so forgive me if I'm misunderstanding, but would an alternative solution be to pre-allocate the final target array no matter what and fill it in with results? Then if a user wanted to provide the array that was being filled in they could provide an `out` keyword argument to resemble the standard practice in numpy?

Well, there are cases where the index and distance arrays are much smaller than predicted by the target area, i.e. when working with masked data. At first I did exactly that, but one of your tests immediately caught me. One could pre-allocate the array with the predicted output size and then trim everything at the end to avoid this issue, but that might increase memory usage significantly for some of your users and break their code.

I hadn't thought of the `out` keyword option, however, as a way to avoid a new API function. It does indeed resemble standard numpy practice, which is probably a good idea to follow. I'm just not a fan of this setup, because it is not very expressive and it changes the return behaviour of the function.
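The pre-allocate-then-trim trade-off can be illustrated with a small sketch (hypothetical code, not pyresample's actual implementation): allocating for the full predicted target area keeps the API unchanged, but with masked data the trim step briefly holds both the oversized buffer and its compacted copy:

```python
import numpy as np

def query_with_trim(predicted_size, valid_mask):
    """Hypothetical sketch: allocate for the predicted target area,
    fill only the valid (unmasked) entries, then trim to the actual size."""
    # Worst-case allocation up front, even if most of the target is masked.
    indices = np.empty(predicted_size, dtype=np.int64)
    n_valid = int(valid_mask.sum())
    indices[:n_valid] = np.flatnonzero(valid_mask)  # stand-in for kd_tree results
    # Trimming copies the valid prefix into a fresh, smaller array, so for a
    # moment both buffers are alive -- the memory increase mentioned above.
    return indices[:n_valid].copy()

mask = np.zeros(1_000_000, dtype=bool)
mask[::4] = True  # 75% of the target area is masked out
out = query_with_trim(mask.size, mask)
assert out.size == 250_000
```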
Code Sample, a minimal, complete, and verifiable piece of code
Problem description
I am working on a software application that deals with large raster files (approximately 15000x15000 pixels). Recently, I encountered an issue where the `get_neighbour_info` function slowed down significantly when I used multiple segments to fit the kd_tree lookups into RAM. The code snippet provided is a simplified benchmark demonstrating the issue. Upon profiling the code using `py-spy`, I discovered that the concatenation of segments was the cause of the slowdown:

As you can see, the current implementation results in many copies of the segments, which would even lead to a peak RAM usage of 2 * total_size - segment_size.
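The copying pattern can be reproduced schematically (this is a simplified stand-in, not the actual pyresample code): growing the result by repeated concatenation re-copies everything gathered so far on every step:

```python
import numpy as np

segments = [np.zeros(1_000, dtype=np.float64) for _ in range(8)]

# Each concatenate allocates a fresh array and copies all previously
# gathered data into it: O(n_segments * total_size) work, with a
# transient peak near 2 * total_size - segment_size.
result = np.empty(0)
copied = 0
for seg in segments:
    result = np.concatenate((result, seg))
    copied += result.size  # every element of the new array is freshly written

total = sum(s.size for s in segments)
assert result.size == total
assert copied > total  # strictly more copying than the data itself
```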
Expected Output
To address this, a solution would involve pre-allocating the result arrays to avoid memcpys. Thanks to your habitable code base and excellent test coverage, I was able to quickly draft a solution proposal, which you can find in the linked pull request. By pre-allocating the result arrays, most of the time is spent in the workers querying the kd_tree instead of copying data:

The execution time also went down from ~35 min to just ~4 min.
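The pre-allocation approach can be sketched like this (simplified; the actual change lives in the linked pull request): the final array is allocated once, and each segment writes into its own slice, so every element is copied exactly once:

```python
import numpy as np

segments = [np.full(1_000, i, dtype=np.float64) for i in range(8)]
total = sum(s.size for s in segments)

# One allocation for the final result; each segment fills its slice.
# Peak memory stays at total_size instead of 2 * total_size - segment_size.
result = np.empty(total)
offset = 0
for seg in segments:
    result[offset:offset + seg.size] = seg
    offset += seg.size

assert offset == total
```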
Associated pull request
In this first draft I've opted for adding a simple pre-allocation size parameter to `get_neighbour_info`, to avoid increasing the surface of the public API. However, another option would be to introduce a new method like `put_neighbour_info`, where the results are not returned but provided as output parameters. This would give users even more flexibility, such as specifying the result data type. However, this comes at the cost of another public API function, and probably some significant refactoring of `get_neighbour_info` to avoid code duplication. What do you think?
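A rough sketch of what such an output-parameter variant could look like (every name and signature here is hypothetical, not the actual pyresample API): the caller owns the result buffers and therefore also chooses their dtype and allocation strategy:

```python
import numpy as np

def put_neighbour_info_sketch(n_points, index_out, distance_out):
    """Hypothetical output-parameter variant: results are written into
    caller-provided arrays instead of being returned."""
    # Stand-in for the kd_tree query; writes straight into the caller's buffers.
    index_out[:n_points] = np.arange(n_points)
    distance_out[:n_points] = 0.0
    return n_points  # number of valid entries written

# The caller picks the dtypes, e.g. compact float32 distances.
idx = np.empty(100, dtype=np.int64)
dist = np.empty(100, dtype=np.float32)
n = put_neighbour_info_sketch(100, idx, dist)
assert n == 100
```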