Closed sevko closed 8 years ago
There are a variety of data errors in the quattro data. Running a full import, you will see a bunch of different errors being emitted, including some geometries which end in [NaN], [undefined,undefined], or something similarly cryptic.
Do you think these issues should be addressed here or in https://github.com/pelias/quattroshapes-patch ?
Would need to investigate what's causing the problem. I'll do that depending on how we resolve #15.
After some digging, it appears that elasticsearch complains whenever a geometry has a repeating point inside of it. Here's a simple example:
$ curl -XPUT http://localhost:9200/pelias/neighborhood/1\?pretty -d '{
"boundaries": {
"type": "Polygon",
"coordinates": [[[0, 0], [2, 0], [2, 1], [3, 1], [2, 0], [5, 0], [3, 3], [0, 0]]]
}
}'
{
"error" : "MapperParsingException[failed to parse [boundaries]]; nested: InvalidShapeException[Ring Self-intersection at or near point (2.0, 0.0, NaN)]; ",
"status" : 400
}
Note the repeating [2, 0] inside that geometry. Here's what that shape looks like:
Without that point, it gets indexed successfully, and looks like:
Turns out that a geometry with repeated points is badly formed. PostGIS, however, provides an ST_MakeValid() function that allegedly fixes invalid geometries, and it seems to work for a simple case like this one (the same geometry that I used in the above comment):
> select st_astext(st_makevalid(ST_GeomFromText('POLYGON((0 0, 2 0, 2 1, 3 1, 2 0, 5 0, 3 3, 0 0))')));
+--------------------------------------------------+
| st_astext                                        |
+--------------------------------------------------+
| POLYGON((2 0,0 0,3 3,5 0,2 0),(2 0,3 1,2 1,2 0)) |
+--------------------------------------------------+
Including that in pelias/quattroshapes-patch might resolve these problems.
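To find the offending features before they reach elasticsearch, the repeated-point condition described above can be checked directly on the GeoJSON coordinates. The following is only a minimal sketch: `find_repeated_points` is a hypothetical helper (not part of Pelias, PostGIS, or elasticsearch), it assumes each ring is a list of `[x, y]` pairs, and it only flags candidates rather than repairing them the way ST_MakeValid() does.

```python
from collections import Counter


def find_repeated_points(ring):
    """Return the vertices that occur more than once in a linear ring.

    Hypothetical helper for illustration. The final vertex is ignored when
    it equals the first one, because a closed ring always repeats its
    starting point there.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        body = ring[:-1]
    else:
        body = ring
    counts = Counter(tuple(point) for point in body)
    return [point for point, seen in counts.items() if seen > 1]


# The outer ring from the curl example earlier in this thread.
ring = [[0, 0], [2, 0], [2, 1], [3, 1], [2, 0], [5, 0], [3, 3], [0, 0]]
print(find_repeated_points(ring))
```

A non-empty result marks a ring worth running through ST_MakeValid() or inspecting by hand. An empty result does not prove the ring is valid, since edges can still cross without sharing a vertex.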
Here's a count of invalid geometries per Quattro layer:
| layer | # invalid |
|---|---|
| admin0 | 3 |
| admin1 | 38 |
| admin2 | 66 |
| localadmin | 122 |
| localities | 1360 |
| neighborhoods | 183 |
After upgrading to elasticsearch 1.5.1, which should contain a bunch of geo-fixes, I began running into:
MapperParsingException[failed to parse [boundaries]]; nested: ElasticsearchParseException[Invalid shape: Hole is not within polygon];
Turns out that it's thrown for polygons with a hole whose first point (and only the first point) lies on the outer ring, like in:
{
"boundaries": {
"type": "Polygon",
"coordinates": [
[[2, 2], [2, 0], [6, 0], [6, 6], [0, 6], [0, 2], [2, 2]],
[[2, 2], [4, 2], [4, 4], [2, 4], [2, 2]]
]
}
}
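A rough way to spot polygons like the one above ahead of an import is to look for hole vertices that are also vertices of the outer ring. This is a sketch under stated assumptions: `holes_touching_shell` is a hypothetical helper, it expects GeoJSON Polygon `coordinates` with closed rings, and it only compares vertices, so a hole that touches the middle of an outer edge would not be reported.

```python
def holes_touching_shell(coordinates):
    """Map each hole's index to the vertices it shares with the outer ring.

    Hypothetical helper for illustration. `coordinates` is a GeoJSON Polygon
    coordinate array: the outer ring first, then zero or more holes.
    """
    shell = {tuple(point) for point in coordinates[0]}
    touching = {}
    for index, hole in enumerate(coordinates[1:], start=1):
        shared = []
        for point in hole[:-1]:  # skip the closing vertex of the hole
            key = tuple(point)
            if key in shell and key not in shared:
                shared.append(key)
        if shared:
            touching[index] = shared
    return touching


# The polygon from the example above.
coordinates = [
    [[2, 2], [2, 0], [6, 0], [6, 6], [0, 6], [0, 2], [2, 2]],
    [[2, 2], [4, 2], [4, 4], [2, 4], [2, 2]],
]
print(holes_touching_shell(coordinates))
```

Holes reported here are candidates for the "Hole is not within polygon" error on 1.5.1. Whether a given shape actually fails still depends on the elasticsearch version in use.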
A fix has landed in elasticsearch, so we should wait until it makes its way into an intermediate 1.5.x bugfix release.
Seems like this has to do with 4Shapes not doing any postprocessing of geometries. We will have to address this in our own polygons.
Closing since we now use Who's on First for geometries, and we haven't seen any issues like this recently.
Polygons are failing to import with traces like the following:
Here's one of them:
@hkrishna reports he's seen this issue with multipolygons with holes.