hiposfer / kamal

A routing engine service using Open Street Maps and GTFS as data source
GNU Lesser General Public License v3.0

create a preprocessing script #52

Closed carocad closed 7 years ago

carocad commented 7 years ago

Currently the server needs to download, parse and process the OSM file on boot.

The problem with this approach is that the parsing and processing steps are repeated every time that a server boots even if the result of that parsing is the same.

A better approach would be to preprocess the files once and then use the resulting file as the base for every boot.

An example approach is here: https://github.com/Project-OSRM/osrm-backend#using-docker
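
The preprocess-once idea could look roughly like the following Python sketch. It is only an illustration under assumptions: `load_network`, the `parse` callable and the JSON cache file are hypothetical names and choices, not part of kamal.

```python
import json
import os


def load_network(source_path, cache_path, parse):
    """Return the processed network, reusing the cached file when it is fresh.

    `parse` is a hypothetical callable that turns the raw source file into a
    JSON-serialisable structure; it stands in for the expensive OSM processing.
    """
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh:
        # Cheap path: skip parsing and processing entirely.
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)

    # Expensive path: process once, then store the result for the next boot.
    network = parse(source_path)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(network, f)
    return network
```

With this shape the server only pays the processing cost when the source file is newer than the cached result.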

carocad commented 7 years ago

here are the results of some experiments for outputting the network to a file to avoid preprocessing it every time:

write time: ~1.2 seconds
read time (until last element): ~1.2 seconds

here is the code used in dev.clj

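;; write the network to disk as JSON; with-open closes the writer afterwards
(time (with-open [w (clojure.java.io/writer "resources/saarland.json")]
        (cheshire/generate-stream @(:network (:grid system)) w)))

;; read it back, asking for the last element to force the whole stream
(time (with-open [r (clojure.java.io/reader "resources/saarland.json")]
        (last (cheshire/parse-stream r))))

carocad commented 7 years ago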

On a similar matter: Java has native support for Gzip and Zip files. Node.js on the other hand requires a library for processing them.

My point here is that it would be nice to have a file format that works in both environments, so that we could create the file in one environment (like a JS lambda function) and then read it in another (JVM).

The downside is that the files are a bit larger.

Furthermore this would reduce the dependencies of the project, which is never bad ;)
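
To put a number on the size trade-off mentioned above, here is a minimal Python sketch that writes the same data once as plain JSON and once gzip-compressed, using only the standard library. JSON as the format and the helper names `write_plain`, `write_gzip` and `read_gzip` are assumptions made for illustration.

```python
import gzip
import json


def write_plain(data, path):
    """Write data as uncompressed JSON, readable without any extra library."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def write_gzip(data, path):
    """Write the same JSON through a gzip stream."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def read_gzip(path):
    """Read gzip-compressed JSON back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Comparing `os.path.getsize` of both outputs for a real extract would show how much larger the uncompressed file is in practice.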

carocad commented 7 years ago

@mehdisadeghi could you take care of this as well :) ?

After several experiments and some research I came to the conclusion that the best way to remain interoperable, re-use most of our current code and still gain performance is to keep the file in the OSM format.

Problem description:

The idea behind this issue is to tackle all of those problems simultaneously with a simple script (Python?)

I tried a small sketch of this here but it proved quite difficult in Clojure since it requires in-place mutations, which are troublesome. I used a setup variable to configure which attributes and which elements should stay in the file. It doesn't need to be like that initially. I was simply trying to make it flexible :)
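
A setup-driven filter like the one described could be sketched in Python as follows. This is only a rough illustration: the `SETUP` contents, `strip_element` and `strip_osm` are hypothetical, and the sketch builds a new tree instead of mutating the parsed one in place.

```python
import xml.etree.ElementTree as ET

# Hypothetical setup: the elements to keep and, for each, the attributes to keep.
SETUP = {
    "node": {"id", "lat", "lon"},
    "way": {"id"},
    "nd": {"ref"},
    "tag": {"k", "v"},
}


def strip_element(element, setup):
    """Copy an element, keeping only configured attributes and children."""
    allowed = setup[element.tag]
    copy = ET.Element(
        element.tag,
        {k: v for k, v in element.attrib.items() if k in allowed},
    )
    for child in element:
        if child.tag in setup:
            copy.append(strip_element(child, setup))
    return copy


def strip_osm(xml_text, setup=SETUP):
    """Return OSM XML text containing only the configured elements."""
    root = ET.fromstring(xml_text)
    out = ET.Element(root.tag, dict(root.attrib))  # keep the <osm> root as is
    for child in root:
        if child.tag in setup:
            out.append(strip_element(child, setup))
    return ET.tostring(out, encoding="unicode")
```

This version loads the whole document into memory, so a real extract would probably need a streaming parser such as `xml.etree.ElementTree.iterparse` instead.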

mehdisadeghi commented 7 years ago

@carocad I have to give it some thought. Of course we can use Python to do this, but I wonder whether we should try to come up with a more abstract, high-level design to handle preprocessing and feeding data into the routing application. I'll start with the OSM2GTFS and will update with an initial design for this one too.

carocad commented 7 years ago

@mehdisadeghi here are my 2 cents to this discussion :)

After trying several preprocessing options like JSON, Smile and EDN I realized that I was reinventing the wheel.

Certainly it is possible to come up with a design that is more tailored to our specific needs and that is way smaller than the original one. However, in order to do that it would be necessary to have a specification for the shape of the file: which fields are required and which are optional, what to do in case a field is not present, etc.

I studied the approaches of Graphhopper, OSRM and TripPlanner a bit and most of them use their own representation for OSM files, which although valid, leads them to create their own set of tools for tackling the problems that arise whenever a custom format is created.

I would prefer that we avoid duplicating the work of others and also try to maximise the re-usability of our data (for example, if someone wants to plot the input file on a map or perform analytics on it). On that topic I actually found this tool, which I think solves all of our problems :)

boring but efficient solution :D

Let me know your thoughts once you finish the GTFS conversion and are more used to the OSM format.