Closed ncolomer closed 11 years ago
@ncolomer Looks great! Do you need some testing?
Hi @apavillet! I was about to tell you about this :)
This branch is now stable. Can you launch tests/benchmarks and tell me how it goes? I found that query time is really impressive when using the shape filter (<100ms) over a 16-million-doc index (ile-de-france ways and nodes). In addition, you can use geo filter/sort with the new centroid field on both indices!
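For anyone who wants to try the geo filter/sort on the centroid field, here is a minimal sketch of a search body. It assumes the field is named `centroid` and is mapped as a geo point, and it uses the `bool`/`filter` syntax of recent Elasticsearch releases (older versions wrap filters differently). The helper name `nearby_query` and the coordinates are purely illustrative.

```python
import json


def nearby_query(lat, lon, distance="500m", field="centroid"):
    """Build a search body that keeps documents whose `field` lies within
    `distance` of (lat, lon) and sorts them nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                # geo_distance keeps only points inside the given radius
                "filter": {"geo_distance": {"distance": distance, field: point}}
            }
        },
        # _geo_distance sorts hits by distance to the same point
        "sort": [{"_geo_distance": {field: point, "order": "asc", "unit": "m"}}],
    }


# Hypothetical point in central Paris
body = nearby_query(48.8566, 2.3522)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against either index.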
I am currently working on the documentation and will follow with the implementation of relations (which is a bit more complex given how this type is formatted in OSM data).
Hello @ncolomer, I ran the test last night on that same file, and everything works great for the creation of the index. The only problem I had was a "too many open files" error on Mac OS X, but this can be solved quickly (http://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1).
I have not tried the queries yet, but I'll let you know. From what I see, relations are not in yet, right? How do you plan to implement that (for, say, a housenumber node which is part of a road)?
Thanks again, great work!
Edit: after testing, this is really fast (except for the first query, which took 14 seconds). This is really good :)
Oh yes, the creation of the way type seems to take a lot longer. I guess that's normal since you're calculating stuff like the centroid?
> Oh yes, the creation of the way type seems to take a lot longer. I guess that's normal since you're calculating stuff like the centroid?
Computing the centroid or even the perimeter (the length field) does not really consume CPU. Your injector should remain peaceful in terms of CPU load (mine does with 2 CPUs and the default plugin configuration).
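To give an idea of why this is cheap, here is a rough sketch of both computations for a way given as a list of (lat, lon) nodes. This is not the plugin's code: it assumes a haversine distance for the length and a length-weighted average of segment midpoints for the centroid (treating coordinates as planar, which is only reasonable for small ways). All function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def way_length_m(nodes):
    """Sum of segment lengths; for a closed way this is the perimeter."""
    return sum(haversine_m(p, q) for p, q in zip(nodes, nodes[1:]))


def way_centroid(nodes):
    """Length-weighted average of segment midpoints (planar approximation)."""
    total = lat_acc = lon_acc = 0.0
    for p, q in zip(nodes, nodes[1:]):
        w = haversine_m(p, q)
        lat_acc += w * (p[0] + q[0]) / 2
        lon_acc += w * (p[1] + q[1]) / 2
        total += w
    if total == 0:
        return nodes[0]  # degenerate way: single node or repeated node
    return (lat_acc / total, lon_acc / total)


# Hypothetical closed way: a small square near (0, 0)
square = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.001), (0.001, 0.0), (0.0, 0.0)]
print(way_length_m(square), way_centroid(square))
```

Both functions are a single pass over the nodes of the way, so the cost per document is linear in the number of nodes.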
The extra time you observe comes from 2 things:
To give you some figures, it just took me ~1h28m to index the whole ile-de-france dataset (nodes + ways = ~22M docs) on a 6-CPU, 8GB-RAM VM. The final index size on disk is ~4.9GB. You can filter entities using Osmosis to reduce index size and build time. I also aim (later) to process OSM diff files to keep an index up to date.
That's about the time it took for me too. Concerning the diff files, it's a nice-to-have feature, but I think once the index has been built it's fine to update it only about once a week, and this can be done easily using another index and then aliases to switch between them. Even for files like france.osm, that should take about a day, which is fine.
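The alias switch I have in mind can be sketched as follows. It builds the body for Elasticsearch's `_aliases` endpoint, which applies the remove and add actions in a single request. The alias and index names (`osm`, `osm_week1`, `osm_week2`) are made up for the example.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from the old
    index to the freshly built one in a single call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: clients always query "osm", never the dated indices
swap = alias_swap_actions("osm", "osm_week1", "osm_week2")
print(json.dumps(swap, indent=2))
```

Clients keep querying the alias, so the weekly rebuild can take as long as it needs in the background, and the old index can be deleted after the swap.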
> From what I see, relations are not in yet, right? How do you plan to implement that (for, say, a housenumber node which is part of a road)?
Right! Implementation of relations will follow after I have released this branch. This feature is more complex since there are many relation types and thus a lot of specific cases to take into account. Moreover, a relation can be composed of both nodes and ways. I plan to focus first on the multipolygon, route and boundary relation types.
> Concerning the diff files, it's a nice-to-have feature, but I think once the index has been built it's fine to update it only about once a week, and this can be done easily using another index and then aliases to switch between them. Even for files like france.osm, that should take about a day, which is fine.
I agree, but don't worry, this feature is deeper in the backlog ;)
FYI, I will present this plugin at the next elasticsearch France Meetup #2; it would be a good opportunity to meet and discuss all this stuff :) Oh, and will you be at the Datapero #7 on the 23/4? A friend of Margaux told me that she is organizing the event!
Oh great, I'll be there of course, since Home'n'go (the company Margaux and I cofounded) created this event. It'll be nice seeing you! As for the ES meetup, I don't know yet if I'll be there, but if I am, I'll be sure to come as well.
Having toyed a little with OSM relations, I know this is much more complex and not easy to debug :) I came across this tool yesterday that might help visualize things better (http://openstreetmap.us/iD/release)
Concerning the test, I'm in the process of indexing a larger file (france.osm) to see how it performs, since last time there was a threshold at which queries started to get much slower.
Changelog:
geo_point and geo_shape types using entities' centroid and shape fields
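As an illustration of this changelog entry, a mapping along these lines would declare the two fields. This is only a sketch: the field names `centroid` and `shape` come from the discussion above, but the exact mapping the plugin creates (and any extra options it sets) is an assumption.

```python
import json

# Hypothetical mapping fragment: one geo_point and one geo_shape field
osm_mapping = {
    "properties": {
        "centroid": {"type": "geo_point"},  # single lat/lon, usable for distance filter and sort
        "shape": {"type": "geo_shape"},     # full geometry, usable for shape filters
    }
}

print(json.dumps(osm_mapping, indent=2))
```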