ncolomer / elasticsearch-osmosis-plugin

An Osmosis plugin that indexes OpenStreetMap data into elasticsearch
Apache License 2.0

Major plugin improvement following the new 0.90.x version #9

Closed ncolomer closed 11 years ago

ncolomer commented 11 years ago

Changelog:

apavillet commented 11 years ago

@ncolomer Looks great, do you need some testing?

ncolomer commented 11 years ago

Hi @apavillet! I was about to tell you about this :)

This branch is now stable, can you run tests/benchmarks and tell me how it goes? I found that query time is really impressive when using the shape filter (<100ms) over a 16-million-doc index (ile-de-france ways and nodes). In addition, you can use geo filter/sort with the new centroid field on both indices!

I am currently working on the documentation and will follow with the implementation of relations (which is a bit more complex given how this type is formatted in OSM data).
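As a rough illustration of the geo filter/sort mentioned above, here is a minimal sketch of a request body in the 0.90.x query DSL. It assumes `centroid` is mapped as a `geo_point`; the helper name `nearby_query` and the sample coordinates are hypothetical, and the body would be sent to the `_search` endpoint of whichever index the plugin created.

```python
import json


def nearby_query(lat, lon, distance="1km", size=10):
    """Build a search body: keep docs whose centroid lies within
    `distance` of (lat, lon), sorted nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "size": size,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # geo_distance filter on the (assumed geo_point) centroid field
                "filter": {"geo_distance": {"distance": distance, "centroid": point}},
            }
        },
        # sort hits by distance from the same point, closest first
        "sort": [{"_geo_distance": {"centroid": point, "order": "asc", "unit": "km"}}],
    }


if __name__ == "__main__":
    # Hypothetical coordinates (central Paris), printed as JSON for use with curl
    print(json.dumps(nearby_query(48.8566, 2.3522), indent=2))
```

The `filtered` query is the 0.90-era form; later elasticsearch versions express the same thing with a `bool` query and a `filter` clause.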

apavillet commented 11 years ago

Hello @ncolomer, I ran the test last night on that same file and everything works great for the creation of the index. The only problem I had was a "too many open files" error on Mac OS X, but this can be solved quickly (http://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1).

I have not tried the queries yet but I'll let you know. From what I see, relations are not in yet, right? How do you plan to implement that (say, for a housenumber node which is part of a road)?

Thanks again, great work!

Edit: after testing, this is really fast (except for the first query, which took 14 seconds). This is really good :)

apavillet commented 11 years ago

Oh yes, the creation of the way type seems to take a lot longer. I guess that's normal since you're calculating stuff like the centroid?

ncolomer commented 11 years ago

> Oh yes, the creation of the way type seems to take a lot longer. I guess that's normal since you're calculating stuff like the centroid?

Computing the centroid or even the perimeter (the length field) does not really consume CPU. Your injector should remain peaceful in terms of CPU load (mine does with 2 CPUs and the default plugin configuration).
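To give an idea of why this is cheap, here is a minimal sketch of both computations. It is not the plugin's actual code: it assumes a way is a list of `(lat, lon)` tuples, uses the haversine formula for length and the planar shoelace formula for the centroid, and all function names are hypothetical. Each function is a single linear pass over the way's points.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def way_length_km(points):
    """Sum of segment lengths along the way (the perimeter for a closed way)."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def ring_centroid(points):
    """Area-weighted centroid of a closed ring (first point == last point),
    treating lat/lon as planar coordinates. Falls back to the mean of the
    vertices when the ring has no area (e.g. an open line)."""
    area2 = cx = cy = 0.0
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        cross = lon1 * lat2 - lon2 * lat1
        area2 += cross
        cx += (lon1 + lon2) * cross
        cy += (lat1 + lat2) * cross
    if area2 == 0.0:
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return (cy / (3.0 * area2), cx / (3.0 * area2))
```

The planar approximation is reasonable for building- or street-sized ways; very large polygons would need a proper geodesic computation.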

The extra time you observe comes from two things:

To give you some figures, it just took me ~1h28m to index the whole ile-de-france dataset (nodes + ways = ~22M docs) on a 6-CPU, 8GB-RAM VM. Final index size on disk is ~4.9GB. You can filter entities using Osmosis to reduce index size and build time. I also aim (later) to process OSM diff files to keep an index up to date.

apavillet commented 11 years ago

That's about the time it took for me too. Concerning the diff files, it's a nice-to-have feature, but I think once the index has been built it's fine to update it only about once a week, and this can be done easily by building another index and then using aliases to switch between them. Even for files like france.osm, that should take about a day, which is fine.
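A minimal sketch of that alias switch, assuming the weekly rebuild writes to a fresh index and clients only ever query the alias. The index and alias names are hypothetical; the returned body is what would be POSTed to elasticsearch's `_aliases` endpoint, which applies all listed actions atomically.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from the old
    index to the freshly built one in a single atomic call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: clients query "osm", rebuilds alternate behind it
    print(json.dumps(alias_swap_actions("osm", "osm_week15", "osm_week16"), indent=2))
```

Once the alias points at the new index, the old one can be deleted to reclaim disk space.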

ncolomer commented 11 years ago

> From what I see, relations are not in yet, right? How do you plan to implement that (say, for a housenumber node which is part of a road)?

Right! Implementation of relations will follow after I have released this branch: this feature is more complex since there are many relation types and thus a lot of specific cases to take into account. Moreover, a relation can be composed of both nodes and ways. I plan to focus first on the multipolygon, route and boundary relation types.

> Concerning the diff files, it's a nice-to-have feature, but I think once the index has been built it's fine to update it only about once a week, and this can be done easily by building another index and then using aliases to switch between them. Even for files like france.osm, that should take about a day, which is fine.

I agree, but don't worry, this feature is deeper in the backlog ;)

FYI, I will present this plugin at the next elasticsearch France Meetup #2; it would be a good opportunity to meet and discuss all this stuff :) Oh, and will you be at Datapero #7 on 23/4? A friend of Margaux told me that she is organizing the event!

apavillet commented 11 years ago

Oh great, I'll be there of course, since Home'n'go (the company Margaux and I cofounded) created this event. It'll be nice seeing you! As for the ES meetup, I don't know yet if I'll be there, but if I am I'll be sure to come say hello as well.

Having toyed a little with OSM relations, I know this is much more complex and not easy to debug :) I came across this tool yesterday that might help visualize things better (http://openstreetmap.us/iD/release)

Concerning the test, I'm in the process of indexing a larger file (france.osm) to see what the performance is like, since last time there was a threshold at which queries started to get much slower.