opentraffic / architecture


Storage architecture for traffic data (the pool) #4

kpwebb opened this issue 9 years ago

kpwebb commented 9 years ago

We need to store temporal speed data attached to OSM ways/segments. Ideally the data would be stored (or at minimum viewed) as a histogram of speed observations for a given time window.

In the past we've used an in-memory key/value store with a list of values per segment & time window. As we scale up the geographic area we'll need to make sure the storage architecture can accommodate a global data set.
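
For concreteness, here's a rough sketch of the kind of per-segment record described above (the class name, bin size, and window indexing are placeholders, not a schema proposal):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: speed observations for one OSM way/segment,
// binned into a histogram per time window.
public class SegmentSpeedHistogram {
    static final int BIN_SIZE_KPH = 5;   // 26 bins of 5 km/h covers 0-130 km/h
    static final int NUM_BINS = 26;

    final long segmentId;                                        // OSM way id or internal segment id
    final Map<Integer, int[]> countsByWindow = new HashMap<>();  // time-window index -> bin counts

    public SegmentSpeedHistogram(long segmentId) {
        this.segmentId = segmentId;
    }

    /** Record one speed observation that falls in the given time window. */
    public void addObservation(int windowIndex, double speedKph) {
        int bin = Math.min((int) (speedKph / BIN_SIZE_KPH), NUM_BINS - 1);
        countsByWindow.computeIfAbsent(windowIndex, w -> new int[NUM_BINS])[bin]++;
    }
}
```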

Currently considering a tile-based store that's bounded by time (/data/x/y/z/time-range.datafile). Files can be archived to an object store (e.g. S3). But the question remains: which file format is best, PBF or something more application-specific?

Are there formats that are well suited for querying (e.g. SQLite), or should we use simple data storage and move the computation into the downstream application?

Compactness, strong client-lib support, and read/write/seek latency seem to be key considerations.
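
To make the tile/time layout above concrete, here's a rough sketch using standard web mercator tile math; the exact path convention and time-range naming are just assumptions:

```java
// Hypothetical sketch of the /data/x/y/z/time-range.datafile layout.
public class TilePath {

    /** Web mercator tile X for a longitude at a given zoom (standard slippy-map formula). */
    static int tileX(double lon, int zoom) {
        return (int) Math.floor((lon + 180.0) / 360.0 * (1 << zoom));
    }

    /** Web mercator tile Y for a latitude at a given zoom. */
    static int tileY(double lat, int zoom) {
        double latRad = Math.toRadians(lat);
        return (int) Math.floor((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * (1 << zoom));
    }

    /** Builds e.g. "/data/{x}/{y}/{zoom}/2015-06-01_2015-06-07.datafile". */
    static String path(double lon, double lat, int zoom, String timeRange) {
        return String.format("/data/%d/%d/%d/%s.datafile",
                tileX(lon, zoom), tileY(lat, zoom), zoom, timeRange);
    }
}
```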

laurentg commented 9 years ago

Do we have an estimate of the amount of data to store? Why not use a standard database such as PostgreSQL? Data could be archived and purged after several months (more or less, depending on the amount of data per "segment").

kpwebb commented 9 years ago

@laurentg I'm open to that approach, but I think we're going to outgrow normal RDBMS approaches. At global scale we're talking about hundreds of millions of segments and, quickly, billions of observations. Postgres starts to require a lot of thought at that scale and ultimately becomes a choke point.

An alternative approach is to bin the data into geographic regions. I like the web mercator tile approach that's already being used for many vector map data stores (e.g. https://github.com/mapzen/vector-datasource/wiki/Mapzen-Vector-Tile-Service). We can keep current observations close to memory and shuffle archival data into object stores like S3 for persistent storage/sharing.

We've also switched almost entirely to memory-mapped, off-heap key-value stores for data storage at Conveyal and really like the speed and flexibility this offers. You and @abyrd have probably talked about the advantages of MapDB in the OTP/OSM data store. I think something similar could work well here.
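
As a minimal sketch of that direction with MapDB (this uses the newer 3.x API, so details will differ by version, and the map name and key layout are just assumptions):

```java
import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.Serializer;
import java.util.concurrent.ConcurrentMap;

public class MappedSpeedStore {
    public static void main(String[] args) {
        // File-backed, memory-mapped store; data lives off the JVM heap.
        DB db = DBMaker.fileDB("speeds.db")
                .fileMmapEnableIfSupported()
                .make();

        // Key: a packed segment-id/time-window long; value: histogram bin counts.
        ConcurrentMap<Long, int[]> histograms = db
                .hashMap("histograms", Serializer.LONG, Serializer.INT_ARRAY)
                .createOrOpen();

        histograms.put(42L, new int[26]);
        db.close();
    }
}
```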

laurentg commented 9 years ago

Just for the record, here are some statistics we gathered during an OTP optimization experiment, based on OSM data:

                        Netherlands  Paris (IDF)     NY state
-----------------------------------------------------------------------
Number of street edges      4494910      1341265      2149524
Number of geometries        4495406      1341581      2149554
Avg points per geometry        3.43         3.83         5.66
Total edge length (km)       444528       146783       494817
Avg length (m)                98.89       109.41       230.20
Number of street nodes      1632641       492862       830947
-----------------------------------------------------------------------

Obviously we are counting every edge here, including walk-only edges (though bike would also be in the scope of this project). It would be interesting to get more precise statistics for bike/car edges only, for the whole world.

Note: those stats can be generated by the OTP "statistics graph builder"; it also prints the edge count by length histogram (log and linear scale), not included here.

abyrd commented 9 years ago

@laurentg it would be straightforward to modify Vanilla Extract or the new OTP OSM loader to output the number of inter-intersection edges in a whole planet.pbf.

Based on recent experience, as Kevin said, I have become a fan of in-process database engines. Often all the mapping back and forth between objects and relations, the formulation of simple table lookups in SQL, the data type conversions, etc. that we do when working with an RDBMS are completely superfluous, and keeping separate server processes up and cooperating introduces a lot of deployment/maintenance overhead. The idea is of course not to write our own database code, but to call library-supplied storage backend code directly, rather than go through the whole mess of talking to a separate server process using strings of text.

However, this project will likely require heavy concurrent access to the datastore, something we'll need to be careful about. It's still possible that for one reason or another we'll discover that a traditional RDBMS is a good idea, but for now we should certainly explore the alternatives -- this project may gain a lot from a somewhat unconventional approach.

@kpwebb I am all for tiling as an effective and dead simple "spatial indexing" technique (it's what we're doing in Vex). However, I'm not sure there's any need to use the filesystem to materialize those tiles internally; they may just be a view exposed to end users. Whatever storage backend we use should have no problem handling x,y map tile tuples as keys in an index table. We should definitely discuss the advantages and disadvantages of tiles-as-files as far as shuffling old data out to archives or building summary stats over time windows is concerned.
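
Just as a sketch of what "tile tuples as keys" could look like in practice (the bit widths here are arbitrary assumptions): pack zoom, x, y, and a time bucket into a single long, so each tile/time slice is a key rather than a file:

```java
// Hypothetical key packing: zoom (5 bits), tile x (20 bits), tile y (20 bits),
// time-bucket index (19 bits) = 64 bits. Supports zooms up to 20 and ~500k time buckets.
public final class TileKey {

    public static long pack(int zoom, int x, int y, int timeBucket) {
        return ((long) zoom << 59) | ((long) x << 39) | ((long) y << 19) | (timeBucket & 0x7FFFFL);
    }

    public static int zoom(long key)       { return (int) (key >>> 59); }
    public static int x(long key)          { return (int) ((key >>> 39) & 0xFFFFF); }
    public static int y(long key)          { return (int) ((key >>> 19) & 0xFFFFF); }
    public static int timeBucket(long key) { return (int) (key & 0x7FFFF); }
}
```

Because the time bucket sits in the low bits, all time slices for one tile are contiguous keys, so archiving or summarizing a tile over a time window becomes a single range scan in an ordered map.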

Just off the top of my head: we may very quickly exceed the typical limit on open file descriptors (1024), and having to manage a sort of LRU cache of open files is just ugly. And since we may reasonably expect people to be pushing and fetching data to and from all tiles worldwide simultaneously, it might scale poorly. It seems like the storage layer should mostly abstract away the underlying files seen by the OS.

laurentg commented 9 years ago

On many systems an open file takes a non-negligible amount of memory (around 1 KB on Linux), and usually non-swappable memory at that. So leaving many files open can be an issue: many tiles would contain very little data, so a 1 KB overhead for a small tile could be rather expensive.

kpwebb commented 9 years ago

@laurentg and @abyrd, point taken re: file overhead. Let's see what throughput we can get off a single endpoint for collecting speed updates. It could be that one endpoint is fast enough, or we can shard the data somehow across a couple of machines if needed.

Separately from the real-time data collection, I like the tile-based approach for archival storage. This lets us bin archival data into defined geographic regions and stick it in an object store like S3 so other users can retrieve only what they need. The same reasons this works well for vector tile data should apply here.
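
A minimal sketch of that archival step with the AWS SDK for Java (the bucket name, key layout, and file path are made-up examples, not a decision):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class TileArchiver {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

        // Hypothetical layout: mirror the local tile/time path as the S3 object key,
        // so consumers can fetch just the region and time range they care about.
        File tile = new File("/data/33853/23430/16/2015-06-01_2015-06-07.datafile");
        String key = "33853/23430/16/2015-06-01_2015-06-07.datafile";
        s3.putObject("opentraffic-archive", key, tile);
    }
}
```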