Closed ainemitch closed 4 years ago
Hi @ainemitch, thanks for the PR.
From what I can see, this change would allow the schema to use a different variable name for the shape field.
The alternative name you've provided for the field is polygon.
I don't see any functional difference between the two. Is that correct? Is the change purely for the purpose of renaming the field?
Could you please explain a little more about the motivation for the work?
You mentioned that you'd like to import WhosOnFirst data.
I'm not 💯 clear on why changing the name of the field would make any difference, as in both cases they represent a geo-shape type?
I just had a look at pelias/model and the field is incorrectly named polygon there.
Is that the reason? If so, we should change pelias/model because it's not correct; the correct name is shape.
The term 'shape' is purposely vague because it can be a variety of geometry types such as point, linestring, polygon and the multi* variants.
As you might have guessed this field is almost never used, which is why no-one else has noticed in 3 years 😆
Hi Peter @missinglink Thanks for getting back to me.
We are building a point-in-polygon reverse geocoding application using Pelias schema and the whosonfirst modules.
We started by attempting to import the whosonfirst data into Elasticsearch (from whosonfirst-data-admin-us-latest.db) using the schemas as defined, with the geo_shape type set to shape. This did not import the polygon point data into ES, however, so it seems the polygon data is not uploaded by default. After we changed the geo_shape type to polygon (we have done the same in the whosonfirst module and will be creating a pull request for that also), everything worked fine and we were able to execute queries successfully.
For example, executing the request below against ES, with the data loaded with the geo_shape type set to shape, resulted in the following error:
curl --location --request GET 'localhost:9200/pelias_us/_search' --header 'Content-Type: application/json' --data-raw '{ "query":{ "bool": { "must": { "match_all": {} }, "filter": { "geo_shape": { "polygon": { "shape": { "type": "point", "coordinates" : [-97.755469, 30.306845] }, "relation": "contains" } } } } } }'
{"error":{"root_cause":[{"type":"query_shard_exception","reason":"failed to find geo_shape or geo_point field [polygon]","index_uuid":"7IRJvMqHTL6YJ7DJGr1GQA","index":"pelias_us_1.0_aine_test"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"pelias_us_1.0_aine_test","node":"IrY1nK6_TySiND2r-xtfKQ","reason":{"type":"query_shard_exception","reason":"failed to find geo_shape or geo_point field [polygon]","index_uuid":"7IRJvMqHTL6YJ7DJGr1GQA","index":"pelias_us_1.0_aine_test"}}]},"status":400}
We got a successful response back with geo-shape type set to polygon:
{"took":7,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":5,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"pelias_us_1.0_aine11","_type":"_doc","_id":"whosonfirst:county:102081935","_score":1.0,"_source":{"center_point":{"lon":-97.708968,"lat":30.261556},"parent":{"country":["United States"],"country_id":["85633793"],"country_a":["USA"],"county":["Travis County"],"county_id":["102081935"],"county_a":[null],"region":["Texas"],"region_id":["85688753"],"region_a":["TX"]},"bounding_box":"{\"min_lat\":30.024499,\"max_lat\":30.628249,\"min_lon\":-98.172977,\"max_lon\":-97.369539}","name":{"default":["Travis County","Travis"],"ar":["مقاطعة ترافيز، تكساس","مقاطعة ترافيز"],"bg":"Травис","el":"Κομητεία Τράβις","eo":"Kantono Travis","eu":"Travis konderria","fa":["شهرستان تراویس، تگزاس","شهرستان تراویس"],"fr":"comté de Travis","ga":"Contae Travis","hu":"Travis megye","hy":"Թրևիս շրջան","it":"contea di Travis","ja":["トラヴィス","トラヴィス郡"],"ka":"ტრევისის ოლქი","la":"Travis Comitatus","pl":"Hrabstwo Travis","ro":"Comitatul Travis","ru":["Трэвис","Тревис"],"sc":"contea de Travis","sr":"Округ etc...
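For reference, the request body from the curl call above can also be built programmatically. The following is only a minimal sketch, not code from Pelias: the helper name `buildPointInPolygonQuery` and its parameters are hypothetical, and the default field name `polygon` simply mirrors the query shown above.

```javascript
// Hypothetical helper (not part of Pelias): builds the body of a
// geo_shape "contains" query like the one in the curl example.
function buildPointInPolygonQuery(lon, lat, field = 'polygon') {
  return {
    query: {
      bool: {
        must: { match_all: {} },
        filter: {
          geo_shape: {
            // [field] is the name of the geo_shape field in the index mapping
            [field]: {
              // GeoJSON coordinate order is [lon, lat]
              shape: { type: 'point', coordinates: [lon, lat] },
              relation: 'contains'
            }
          }
        }
      }
    }
  };
}

// Same coordinates as the request above
const body = buildPointInPolygonQuery(-97.755469, 30.306845);
console.log(JSON.stringify(body));
```

Passing `'shape'` as the third argument would target a field named shape instead, which is the only difference between the two requests discussed here.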
This is the Pelias schema we used:
{ "logger": { "level": "debug", "timestamp": false, "filename": "./aineerrors.log" }, "logger2": { "level": "debug", "timestamp": false }, "esclient": { "apiVersion": "7.7", "hosts": [ { "host": "localhost" } ] }, "elasticsearch": { "settings": { "index": { "refresh_interval": "10s", "number_of_replicas": "0", "number_of_shards": "1" } } }, "schema":{ "indexName": "pelias_us" }, "imports": { "adminLookup": { "enabled": true }, "whosonfirst": { "datapath": "~/data/us/whosonfirst/", "countryCode": "US", "importPostalcodes": true, "importPlace": [ "85633793" ] } } }
We thought that if others wanted to implement a similar system, it would be handy to have the ability to make this property configurable. If you know of an alternative way of importing the polygon points into ES however please let me know. Thanks again.
> If you know of an alternative way of importing the polygon points into ES however please let me know
I'd be interested in seeing what changes you made to the whosonfirst module; I'm guessing that you're calling setPolygon().
I think maybe the confusion is due to https://github.com/pelias/model/pull/134. I believe the error is in pelias/model and not in pelias/schema, and should be fixed there instead.
As an aside, I've spent a lot of time over the last 6+ years thinking about the problem of Point-in-polygon and how to make it fast and accurate.
We have a need for point-in-polygon to be fast because when building a Pelias index, the coordinates of each address, for instance, are sent to the Point in Polygon service in order to determine the city/country/state etc.
These calculations are done >500 million times for a global index, so a bit of back-of-the-napkin math shows how important the latency requirements are:
500 000 000 * 1 millisecond = 5.78703704 days
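The same estimate can be reproduced for other latencies. This is a rough sketch that assumes the lookups run strictly one after another with no parallelism; the function name `totalDays` is made up for illustration.

```javascript
// Back-of-the-napkin estimate: total wall-clock time for sequential lookups.
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms

// lookups: number of point-in-polygon calls
// latencyMs: assumed time per call, in milliseconds
function totalDays(lookups, latencyMs) {
  return (lookups * latencyMs) / MS_PER_DAY;
}

console.log(totalDays(500000000, 1).toFixed(2));  // 5.79
console.log(totalDays(500000000, 10).toFixed(2)); // 57.87
```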
In the past we've experimented with several methods, including indexing Polygons using the quadtree-based indexing method available in elasticsearch, but found that the performance was extremely slow compared with other methods.
Have a read of https://github.com/pelias/polygon-lookup/issues/46#issuecomment-686332484 for some more detailed info about what we've found effective.
We are currently using PIP Service (which is just a thin wrapper over WOF Admin Lookup), but we're in the process of migrating to a new solution, Pelias Spatial.
Hope that gives some background on PIP in Pelias and why we've never really used the shape field.
Hi Peter,
Thanks for all the information on that, it is very helpful. That was it exactly: we are calling setPolygon in the whosonfirst module. Below is the patch we applied to make the necessary changes.
--- src/peliasDocGenerators.js (revision c54ab228cc3ebe92f000f0bea861789995c74ffd)
+++ src/peliasDocGenerators.js (date 1591631434000)
@@ -71,8 +71,13 @@
// method that extracts the logic for Document creation. `hierarchy` is optional
function setupDocument(record, hierarchy) {
  logger.debug(`Loading '${record.place_type}' with id ${record.id} and name '${record.name}'`);
  var wofDoc = new Document( 'whosonfirst', record.place_type, record.id );
  if (record.shape) {
    wofDoc.setPolygon(record.shape);
  }
  if (record.name) {
    wofDoc.setName('default', record.name);
Should I raise a PR to make that update in the model project so?
Thanks again for all your feedback.
Hi @ainemitch, As mentioned, this PR shouldn't be necessary once we merge https://github.com/pelias/model/pull/134, which will happen any time now.
Thanks for the work you've done, and we'd definitely be happy to review a PR to the whosonfirst importer to see how importing the geometry data looks these days. We've had some issues with Elasticsearch geo functionality in the past: it can be very slow and may not handle large geometries, like the one for New Zealand (over 100MB!), but I think we last tested it in ES5 or 6, so we might as well give it another go.
Thank you for that and the update. We have been using Elasticsearch 7 in our application and have found the performance to be acceptable for our needs thus far when loading data for the entire world. The following post has some information on the improvement in geo_shape indexing in ES7 that may be of interest to you: https://www.elastic.co/blog/bkd-backed-geo-shapes-in-elasticsearch-precision-efficiency-speed
@ainemitch that's great to hear. It looks like we really do need to evaluate indexing shapes again with ES7. If you have a chance, can you open a PR in pelias/whosonfirst with your changes there? :) Thanks!
Yes, I will open the PR once we have completed our testing; it should be done in a few more days. Thanks again for all your feedback.
Hey @ainemitch one final thing worth mentioning is that we used to suffer many lost documents at index time with older versions of elasticsearch when indexing shapes.
What was happening was that the WOF importer was sending each geojson shape to elasticsearch faster than elasticsearch could process them, so requests would get queued up waiting for CPU and eventually, once the queue filled up, it would start rejecting documents.
You'd be able to detect this by getting an expected doc count from WOF and an actual doc count from elastic, and comparing them. The importer probably also tracks errors in the output.
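The comparison itself could be sketched as below. This is an illustration under assumptions, not importer code: `compareDocCounts` is a hypothetical helper, and how the two counts are obtained is left open (the actual count could come from Elasticsearch's `_count` endpoint for the index, for instance).

```javascript
// Hypothetical helper: compares an expected document count (from the WOF
// source) with the actual count reported by Elasticsearch.
function compareDocCounts(expected, actual) {
  // never report a negative number of missing documents
  const missing = Math.max(expected - actual, 0);
  return {
    expected,
    actual,
    missing,
    // fraction of expected documents that never made it into the index
    lossRate: expected > 0 ? missing / expected : 0
  };
}

// Illustrative numbers only, not real measurements
const report = compareDocCounts(1000, 950);
console.log(`missing: ${report.missing}, loss rate: ${report.lossRate}`);
```

A non-zero `missing` value after an import would suggest that documents were rejected along the way and that the importer output is worth checking for errors.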