littermap / littermap-aws-backend

Cloud-native backend for the Litter Map application -- (Join us on: https://discord.gg/JvEQMSQaYr)
https://littermap.com
GNU Affero General Public License v3.0
3 stars 2 forks source link

Implement a robust and scalable architecture for storing and retrieving locations #5

Open specious opened 2 years ago

specious commented 2 years ago

At the current stage, the client requests locations within a radius of a point, and the database (scanning a subset roughly limited to the area being searched) returns a collection comprised of all points within that region.

To a degree, the storage and retrieval system is already optimized for retrieval by spatial proximity, since each location record is stored with a geohash (by using the PostGIS point geometry type when storing the location) alongside its latitude and longitude coordinates, which essentially encodes its proximity to other points.

However, since there is no limit as to the total number of locations that can be marked on the globe, this strategy will need to be replaced with something robust enough to perform well at arbitrary scale.

A proper system for object retrieval will be able to:

Since requests will be made at multiple levels of detail (often including the whole world), the cluster aggregation information needs to be prepared as the data is stored, so it is ready to be served. In other words, no query should have to go through the entire data set.

The solution I propose is to use the H3 geospatial indexing system, whereby the world is implicitly partitioned into hexagonal cells at each level of detail that corresponds to queries at different zoom levels. Essentially this extends the geohashing to indexing the aggregate clusters at the time that locations are added.

This means that making a query at any particular zoom level, it will not be possible to immediately distinguish between points within the same hexagon at that resolution. However, it would be useful to, for example, be able to pick a "representative" point in the cluster, so that it could be advertised as one of the locations within that hexagon.

specious commented 2 years ago

@JasonSanford, @danieltoben introduced you to me as having specific experience working with systems that store geographic data. If you so happen to be inclined, I would absolutely appreciate your opinion on this.