Closed radevgit closed 9 months ago
Is there some more documentation or perhaps a paper on it?
It’s a new structure I came up with as described in docs/POLYGON_INDEXING.md.
That’s as close to a ”paper” as I got for now. I have a bunch of notes, perhaps I will do a more detailed write up in the future.
Thanks, I already looked at provided document. What I see is, description of how useful is it (no doubt), but not actual algorithm (structure) description. Is it possible, at least to tell the function name that implements it.
Most of the juicy details. https://github.com/tidwall/tg/blob/v0.1.0/tg.c#L1379-L1737
Look for series_new
as a starting point.
I'm excited to read through the code! At a glance, the natural structure seems somewhat similar to the algorithm used in flatbush, where "a contiguous multi-level format", "only the branch rectangles are stored in memory", and "Construction of the index is quick and easy because the memory size of the index and the number of rectangles per level can be calculated before processing the segments" all apply to the flatbush algorithm as well.
I suppose the main difference with flatbush is that "natural" just doesn't do any sorting, which makes sense because you can't reorder the line segments of a polygon 😅. Then you can save memory by avoiding the storage of the "indices" that flatbush does.
I suppose the main difference with flatbush is that "natural" just doesn't do any sorting,
Yeah, I think flatbush needs to retain the indices because it's a more generalized data structure. It's like a fast read-only R-tree. The user gives it a bunch or points and rectangles and flatbush sorts them in hilbert order and provides you with an optimized collection. Since the user has no idea of internal order of those indexes, they need to be retained at the leaf level.
When I discovered that keeping a polygon segments in their natural order did not lead to slower query times, I was like "holy s*it this is awesome". 🤯 I was able to save oodles of memory and construction times fell off a cliff.
There is of course lots of additional optimizations that I made, godbolt was my friend for many months, but for the most part avoiding sorting and storing leaf data was the biggest deal.
Hi, I will like to know more about Natural structure implementation. Is there some more documentation or perhaps a paper on it? It seems difficult to locate where exactly is the actual implementation in this 13k .c file.