sfan5 opened this issue 6 months ago (status: Open)
I am currently working on a solution to this. The data structure I'm implementing is a dynamization of k-d-trees, meaning it would store O(log n) many static k-d-trees whose sizes are powers of two.
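For context, the general technique here is the "logarithmic method" of turning a static structure into a dynamic one. Below is a minimal sketch of that idea; the names are illustrative and not taken from the actual branch, and the StaticTree placeholder merely scans its points where a real k-d-tree would prune.

```cpp
// Sketch of the "logarithmic method": a dynamic structure made of static
// trees whose sizes are powers of two, with at most one tree per size.
// StaticTree is a stand-in that just scans its points; a real k-d-tree
// would answer the box query with pruning instead.
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Point3 { float x, y, z; };

struct AABB {
	Point3 min, max;
	bool contains(const Point3 &p) const {
		return p.x >= min.x && p.x <= max.x &&
		       p.y >= min.y && p.y <= max.y &&
		       p.z >= min.z && p.z <= max.z;
	}
};

struct StaticTree {
	explicit StaticTree(std::vector<Point3> pts) : points(std::move(pts)) {}
	void collectInArea(const AABB &box, std::vector<Point3> &out) const {
		for (const Point3 &p : points)
			if (box.contains(p))
				out.push_back(p);
	}
	std::vector<Point3> points; // kept around so trees can be merged and rebuilt
};

class DynamicForest {
public:
	// Insertion works like incrementing a binary counter: gather all trees up
	// to the first empty slot and rebuild them, together with the new point,
	// as one tree of twice the size (amortized O(log^2 n) with a real k-d-tree).
	void insert(const Point3 &p) {
		std::vector<Point3> pts{p};
		std::size_t slot = 0;
		while (slot < m_trees.size() && m_trees[slot]) {
			StaticTree &t = *m_trees[slot];
			pts.insert(pts.end(), t.points.begin(), t.points.end());
			m_trees[slot].reset();
			++slot;
		}
		if (slot == m_trees.size())
			m_trees.emplace_back();
		m_trees[slot] = std::make_unique<StaticTree>(std::move(pts));
	}

	// A spatial query simply asks each of the O(log n) static trees.
	void collectInArea(const AABB &box, std::vector<Point3> &out) const {
		for (const auto &t : m_trees)
			if (t)
				t->collectInArea(box, out);
	}

private:
	// m_trees[i] is either empty or holds exactly 2^i points.
	std::vector<std::unique_ptr<StaticTree>> m_trees;
};
```

Deletions are not shown; as noted later in the thread, they come out a bit worse (amortized O(log(n)²)) in the actual implementation.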
Josiah and I tried optimizing this with an off-the-shelf R-tree library a while ago. That failed due to bad constant factors for our medium-size workloads. It's also a bit overkill: an R-tree is centered around boxes, which would have made supporting higher variance in box dimensions easier, but we don't currently need that; we can approximate the relatively small objects as points for now.
I'm optimistic that with this new approach, I have a good chance of getting the constant factors low enough since there are plenty of knobs to tweak:
So I took a walk and here's my idea:
Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...
For reference YL bugtracker 3723, 6325, 6326, 5289, administration 188
Sfan: Why is just Z-position enough to cache? Does that speed up any of our lookups by position enough to matter?
Also, why are position updates no cost?
Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...
I think the most likely next step when someone has working code is for you to test the performance in a real situation.
What would be useful to know right now, however, is: how many active objects do you usually have?
Sfan: Why is just Z-position enough to cache?
It may not be, but see the last section.
Also, why are position updates no cost?
After updating an object, the cost is paid during every following spatial query; that's the m component in the big-O notation.
Sfan: Why is just Z-position enough to cache?
It may not be, but see the last section.
Oh, that makes perfect sense. Okay yeah I can see a way to get my idea to fit in your world relatively painlessly.
Just take int16(x) >> 4, int16(y) >> 4 and int16(z) >> 4 and pack each of them into its own field of a 64-bit key. That will put all entities into mapblock-sized bins and easily fits in 64 bits.
Should still yield nice speedups for objects-in-radius, raycasts and whatnot; we just have to write the lookup algorithms on top of it. It could be as simple as running the same algorithms at 16x16x16 granularity first and iterating over the relevant bins instead of the whole active object list like we currently do.
Bin size should be experimented with, obviously, because 1x1x1 bins wouldn't yield much value, I suspect.
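The key computation could look roughly like this (illustrative only, assuming integer node coordinates; the function name is hypothetical). Packing each shifted coordinate into its own 16-bit field keeps the axes from colliding in the key:

```cpp
#include <cstdint>

// One bin spans 16x16x16 nodes, i.e. one mapblock.
static constexpr int BIN_SHIFT = 4;

// Pack the three mapblock coordinates of a position into a 64-bit key.
// The arithmetic shift keeps negative coordinates in the correct bin, and
// giving each block coordinate its own 16-bit field prevents the axes
// from overlapping.
inline uint64_t binKey(int16_t x, int16_t y, int16_t z)
{
	const uint64_t bx = (uint16_t)(x >> BIN_SHIFT);
	const uint64_t by = (uint16_t)(y >> BIN_SHIFT);
	const uint64_t bz = (uint16_t)(z >> BIN_SHIFT);
	return (bx << 32) | (by << 16) | bz;
}
```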
I tried to take a stab at a better data structure for this. The result, in Lua, is here: https://github.com/fluxionary/minetest-futil/blob/main/data_structures/point_search_tree.lua. The biggest issue with it is that it can't be rebalanced when new points are added or removed. I found a paper about that but never got around to implementing it: https://arxiv.org/abs/1410.5420
I also have a Lua implementation of k-d-trees. I think trying to rebalance k-d-trees is a dead end; I tried for a while and it doesn't really seem feasible to me.
But it is possible to build an efficient dynamic data structure out of a forest of static k-d-trees (see "transforming static data structures to dynamic structures"), which is what I'm suggesting here and currently implementing. I can't tell whether this will hold up in real-world usage, but implementing and testing it is the only way to find out. I'm optimistic that we can get decent constant factors because there are plenty of knobs to tweak.
Thought through sfan5's idea some more (and worked on it). There's also a remove operation (when an object is no longer active), which is likewise O(log(n) + m).
Alright, I've got a seemingly working (according to a quick randomized test against a naive implementation) dynamic forest of k-d-trees whipped up; feel free to take a peek at my branch. The big-O for this should be good (though not quite as good as what I advertised initially; for example, deletion is amortized O(log(n)²) rather than O(log n) with this implementation). What remains to be seen is how it holds up in practice (potentially after some optimization), plus integration into the active object manager, which I may get to on the weekend.
Will take a look. sfan5's idea was short enough that I just jumped right into server::ActiveObjectMgr integration; the only not-so-straightforward thing is how to signal object position updates. In mine I just check whether the last position differs from the new position after step().
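A minimal sketch of such a post-step dirty check, using simplified stand-in types rather than the actual engine classes:

```cpp
// Sketch of the post-step dirty check; all types and names here are
// simplified stand-ins, not the real ActiveObject / environment code.
#include <cstdint>

struct v3f {
	float X = 0, Y = 0, Z = 0;
	bool operator!=(const v3f &o) const {
		return X != o.X || Y != o.Y || Z != o.Z;
	}
};

struct ActiveObject {
	v3f getBasePosition() const { return m_pos; }
	void step(float dtime) { (void)dtime; /* entity logic may change m_pos */ }
	v3f m_pos;
};

struct SpatialIndex {
	// Re-bin / re-insert the object under its new position.
	void updatePosition(uint16_t id, const v3f &old_pos, const v3f &new_pos) {
		(void)id; (void)old_pos; (void)new_pos;
	}
};

// Remember the position before stepping the object and notify the spatial
// index only if it actually changed.
inline void stepAndTrack(uint16_t id, ActiveObject &obj, SpatialIndex &index,
		float dtime)
{
	const v3f before = obj.getBasePosition();
	obj.step(dtime);
	const v3f after = obj.getBasePosition();
	if (after != before)
		index.updatePosition(id, before, after);
}
```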
I will probably not write tests for your structure directly, as MT is a much better stress test of complexity than I can imagine haha, but I can at least look for obvious things.
Not quite done with my version, but perhaps ready for integration tests against servers/mods by end of week.
Okay, first draft version seems to be working correctly. I'm certain I have missed spots where an object's position changes and I need to invalidate the cache, such as object:set_pos(), but I have data to back up that I have a working solution, albeit only ~10% optimized.
I am testing with a custom mod in devtest; this is the init.lua:
Here is a gif of what that test mod and integration test look like in-game:
Then, I went ahead and plotted the results:
You'll notice that even though there are 101 entities flying around, the actual time for 10 getObjectsInArea calls varies significantly based on where the entities are relative to the query's AABB.
Take a look at my implementation: I just always check the mapblocks covering the maximum AABB extents, so it's sub-optimal when a spherical radius is requested, because I still check all mapblocks from -radius to +radius on all 3 axes centered on the origin point.
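That lookup pattern is roughly the following (illustrative only, not the actual PR code); a radius query goes through the same path via its bounding cube, which is exactly the over-checking described above:

```cpp
#include <cstdint>

static constexpr int BIN_SHIFT = 4; // 16x16x16 node bins

// Visit every bin that overlaps the axis-aligned box [min, max].
// For a radius query, min = center - radius and max = center + radius,
// so some visited bins only touch the bounding cube, not the sphere itself.
template <typename F>
void forEachBinInArea(int16_t min_x, int16_t min_y, int16_t min_z,
		int16_t max_x, int16_t max_y, int16_t max_z, F &&callback)
{
	for (int bx = min_x >> BIN_SHIFT; bx <= (max_x >> BIN_SHIFT); bx++)
	for (int by = min_y >> BIN_SHIFT; by <= (max_y >> BIN_SHIFT); by++)
	for (int bz = min_z >> BIN_SHIFT; bz <= (max_z >> BIN_SHIFT); bz++)
		callback(bx, by, bz); // caller looks up this bin and filters its objects
}
```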
Still, I'm surprised the structure came out at only about ~80 LoC. (Also notice that I used the mapblock size as my bin size; perhaps something smaller than 16, like 8, might be better. I wouldn't know without field testing, and it'll be game-specific, actually.)
Some feedback / thoughts:
DS:
You can't see it here, but I've made a lot of progress on the hash map solution for us. I'm only focusing on the server side right now, so selection boxes aren't a consideration. That said, every position in Minetest must fit within the map's signed 16-bit coordinate range around (0,0,0), which helps tremendously, allowing me to pack everything into buckets of 16x16x16 nodes. We can revisit later if we want to make this changeable as a setting.
As for big objects, they just have a plain old single origin position; that's all that's needed right now.
My single unordered multimap is effectively a single-layer octree, and with 10,000 mostly evenly spaced-out entities (my current benchmark) it's quite performant, haha.
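A rough sketch of what such a bin multimap could look like; ObjectId and the method names are hypothetical, and binKey() stands for the kind of coordinate packing sketched earlier in the thread:

```cpp
// Sketch of a "single-layer octree" as a hash multimap from a 16^3 bin key
// to object IDs. ObjectId and binKey() are hypothetical stand-ins.
#include <cstdint>
#include <unordered_map>

using ObjectId = uint16_t;

class ObjectBins {
public:
	void insert(ObjectId id, uint64_t key) {
		m_bins.emplace(key, id);
	}

	// Called when an object has moved: drop it from its old bin, add it to
	// the new one. No-op if it stayed within the same bin.
	void update(ObjectId id, uint64_t old_key, uint64_t new_key) {
		if (old_key == new_key)
			return;
		auto range = m_bins.equal_range(old_key);
		for (auto it = range.first; it != range.second; ++it) {
			if (it->second == id) {
				m_bins.erase(it);
				break;
			}
		}
		m_bins.emplace(new_key, id);
	}

	// Hand every object ID stored in one bin to the callback.
	template <typename F>
	void forEachInBin(uint64_t key, F &&callback) const {
		auto range = m_bins.equal_range(key);
		for (auto it = range.first; it != range.second; ++it)
			callback(it->second);
	}

private:
	std::unordered_multimap<uint64_t, ObjectId> m_bins;
};
```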
I agree this is a super dead simple solution and gets us good performance compared to what we have (by far). Will post a PR hopefully by end of day.
Performance of the map can absolutely be optimized later; it's just a very flexible solution.
Alrighty, there's my current merge^^^
Here are the performance results:
Summary for the lazy:
Depending on your data distribution, either a k-d-tree, a grid file or an octree can be a good choice. Here I would assume that a grid based on the map chunks, with an octree within each chunk, could be very elegant and efficient (because the grid corresponds to the loading/unloading of chunks, and this level is somewhat "proven coarse enough for many applications"). But for simplicity, a pure octree is also worth exploring (not that "one octree per chunk" is that much more complicated). I doubt that fancier things such as R-trees, VP-trees, etc. are beneficial here.
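A structural sketch of that two-level idea, with a placeholder ChunkOctree and illustrative names (not engine code):

```cpp
// Outer hash grid keyed by map-chunk coordinates (the unit the engine
// already loads and unloads), where each cell owns a small octree of the
// objects inside that chunk.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

struct ChunkOctree {
	// A real octree would subdivide the chunk further; the placeholder
	// just keeps the object IDs of that chunk.
	std::vector<uint16_t> objects;
};

struct ChunkKey {
	int16_t x, y, z;
	bool operator==(const ChunkKey &o) const {
		return x == o.x && y == o.y && z == o.z;
	}
};

struct ChunkKeyHash {
	std::size_t operator()(const ChunkKey &k) const {
		// Pack the three chunk coordinates into 48 bits, then hash.
		const uint64_t packed = (uint64_t(uint16_t(k.x)) << 32) |
		                        (uint64_t(uint16_t(k.y)) << 16) |
		                         uint64_t(uint16_t(k.z));
		return std::hash<uint64_t>()(packed);
	}
};

// One octree per loaded chunk, created lazily and dropped with the chunk.
using ChunkGrid = std::unordered_map<ChunkKey, ChunkOctree, ChunkKeyHash>;
```

The appeal of keying the outer grid by chunk coordinates is that a cell can be created and dropped together with the chunk it belongs to, matching the loading/unloading behaviour mentioned above.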
@kno10
Agree those are the options. I implemented a single-layer octree using std::multimap aligned at the mapblock (16x16x16) boundaries. That PR has gone through multiple reviewers and test rounds. At this point it is technically ready for merge.
However, one of the newer Core Devs, appgurueu, has been working on a dynamic k-d-tree branch, as recently as last week.
We could absolutely just merge mine as-is and probably get roughly the same performance, but we are waiting for appgurueu's branch to be (a) done, (b) reviewed and (c) tested. That might take a long time.
I still support his efforts, which is why I have not made a stink about merging my implementation. It's simple and does get performance benefits, but this is a case where I'm willing to let great get in the way of good, since I'm not a Core Dev.
Speaking of k-d-trees, one reason I came back to this old thread is biomes. Biome computation is currently linear in the number of biomes, and because of limitations of the current biome API, we see games try to trick the system by registering sub-biomes and variants (in particular, performing y-slicing of biomes). At some point, the number of biomes may affect the performance of the biome lookup. A k-d-tree might be an interesting way to improve this: it would likely be built only once at game start, over (currently) only the heat and humidity points of the biomes (i.e., 2d). Here a k-d-tree is likely the better fit because humidity and heat are closer to normally distributed. Maybe @appgurueu can reuse some of his code for that, too?
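To make that concrete, here is a rough sketch of such a one-time structure: a small static 2-d k-d-tree over (heat, humidity) answering "closest biome" queries in roughly O(log n). The Biome struct and all names are illustrative, not the engine's actual biome code, and it assumes at least one registered biome:

```cpp
// Sketch of a one-time 2-d k-d-tree over (heat, humidity): median-split at
// build time, nearest-neighbour lookup with pruning at query time.
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Biome { float heat, humidity; /* id, y limits, ... */ };

class BiomeKdTree {
public:
	explicit BiomeKdTree(std::vector<Biome> biomes) : m_nodes(std::move(biomes)) {
		build(0, m_nodes.size(), 0);
	}

	// Index of the biome whose (heat, humidity) point is closest to the query.
	std::size_t nearest(float heat, float humidity) const {
		std::size_t best = 0;
		float best_d2 = std::numeric_limits<float>::max();
		nearest(0, m_nodes.size(), 0, heat, humidity, best, best_d2);
		return best;
	}

private:
	static float key(const Biome &b, int axis) {
		return axis == 0 ? b.heat : b.humidity;
	}

	// Recursively put the median (on the current axis) of [lo, hi) in the middle.
	void build(std::size_t lo, std::size_t hi, int axis) {
		if (hi - lo <= 1)
			return;
		const std::size_t mid = lo + (hi - lo) / 2;
		std::nth_element(m_nodes.begin() + lo, m_nodes.begin() + mid,
				m_nodes.begin() + hi,
				[axis](const Biome &a, const Biome &b) {
					return key(a, axis) < key(b, axis);
				});
		build(lo, mid, 1 - axis);
		build(mid + 1, hi, 1 - axis);
	}

	void nearest(std::size_t lo, std::size_t hi, int axis, float heat,
			float humidity, std::size_t &best, float &best_d2) const {
		if (lo >= hi)
			return;
		const std::size_t mid = lo + (hi - lo) / 2;
		const Biome &b = m_nodes[mid];
		const float dh = heat - b.heat, du = humidity - b.humidity;
		const float d2 = dh * dh + du * du;
		if (d2 < best_d2) {
			best_d2 = d2;
			best = mid;
		}
		const float diff = (axis == 0 ? heat : humidity) - key(b, axis);
		// Descend into the half the query falls into first, and only cross
		// the splitting line if it is still closer than the best match so far.
		if (diff < 0) {
			nearest(lo, mid, 1 - axis, heat, humidity, best, best_d2);
			if (diff * diff < best_d2)
				nearest(mid + 1, hi, 1 - axis, heat, humidity, best, best_d2);
		} else {
			nearest(mid + 1, hi, 1 - axis, heat, humidity, best, best_d2);
			if (diff * diff < best_d2)
				nearest(lo, mid, 1 - axis, heat, humidity, best, best_d2);
		}
	}

	std::vector<Biome> m_nodes;
};
```

Since the tree would be built once over a small, fixed set of (heat, humidity) points, rebalancing is not a concern in this use case.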
There's a closely related older issue (#6453) but I felt like a fresh start here was better.
Problem
tl;dr: optimizing anything else server-side is basically a waste until this is solved, skip to the next section.
The nice people from the Your Land server sent me several profiling traces of their production server, and a common theme is that about 40% of time (on the server thread) is spent just iterating the entity list (average, not peak!). Now their mods use get_objects_inside_radius a bunch, but this is not the sole cause, and I expect any sizeable server will be plagued with this exact issue.

The issue is spatial queries on the entity list, which happen more often than one might think: get_objects_inside_radius, get_objects_in_area and ServerEnvironment::getAddedActiveObjects. All of these are O(n), with n being the number of entities. When you consider that every entity does collision detection, this becomes O(n²), right in the main thread.

Solution requirements
What we need is a new data structure.

It must:
- support efficient spatial (AABB) queries, like getObjectsInArea, on the entity position

external circumstances:
- radius queries (getObjectsInsideRadius) can be emulated on top of AABB queries and are not an important point

Finally, we will also need to figure out how to reliably propagate position updates up to the container. There are many ideas in the form of "just add a cache to <thing>" that come to mind, but keeping the cache in sync is generally trickier than designing the structure.
Real-world example data
(thanks to @Bastrabun) There are 4500 entities. About 1400 move often, 3100 are static.