tidwall / tg

Geometry library for C - Fast point-in-polygon
MIT License
570 stars 15 forks source link

Natural structure documentation #1

Closed radevgit closed 9 months ago

radevgit commented 1 year ago

Hi, I will like to know more about Natural structure implementation. Is there some more documentation or perhaps a paper on it? It seems difficult to locate where exactly is the actual implementation in this 13k .c file.

tidwall commented 1 year ago

Is there some more documentation or perhaps a paper on it?

It’s a new structure I came up with as described in docs/POLYGON_INDEXING.md.

That’s as close to a ”paper” as I got for now. I have a bunch of notes, perhaps I will do a more detailed write up in the future.

radevgit commented 1 year ago

Thanks, I already looked at provided document. What I see is, description of how useful is it (no doubt), but not actual algorithm (structure) description. Is it possible, at least to tell the function name that implements it.

tidwall commented 1 year ago

Most of the juicy details. https://github.com/tidwall/tg/blob/v0.1.0/tg.c#L1379-L1737

Look for series_new as a starting point.

kylebarron commented 1 year ago

I'm excited to read through the code! At a glance, the natural structure seems somewhat similar to the algorithm used in flatbush, where "a contiguous multi-level format", "only the branch rectangles are stored in memory", and "Construction of the index is quick and easy because the memory size of the index and the number of rectangles per level can be calculated before processing the segments" all apply to the flatbush algorithm as well.

I suppose the main difference with flatbush is that "natural" just doesn't do any sorting, which makes sense because you can't reorder the line segments of a polygon 😅. Then you can save memory by avoiding the storage of the "indices" that flatbush does.

tidwall commented 1 year ago

I suppose the main difference with flatbush is that "natural" just doesn't do any sorting,

Yeah, I think flatbush needs to retain the indices because it's a more generalized data structure. It's like a fast read-only R-tree. The user gives it a bunch or points and rectangles and flatbush sorts them in hilbert order and provides you with an optimized collection. Since the user has no idea of internal order of those indexes, they need to be retained at the leaf level.

When I discovered that keeping a polygon segments in their natural order did not lead to slower query times, I was like "holy s*it this is awesome". 🤯 I was able to save oodles of memory and construction times fell off a cliff.

There is of course lots of additional optimizations that I made, godbolt was my friend for many months, but for the most part avoiding sorting and storing leaf data was the biggest deal.