Closed JesseCrocker closed 7 years ago
What version of osm2pgsql were you running before the update?
osm2pgsql version 0.93.0-dev (64 bit id space)
Im not sure the exact commit, but it was built from master on may 16.
osm2pgsql version 0.93.0-dev (64 bit id space) Im not sure the exact commit, but it was built from master on may 16.
Hmm, this is after the unprojected slim coordinates.
The point is near the edge of web mercator.
Do you have the $CHANGE_FILE?
The area in question is https://www.openstreetmap.org/relation/2823696#map=12/-85.0522/-162.2245
My update script deletes the changes file after a failure, but i can modify it to not do that, do a new import, and grab the file. Should have it on monday, then i'll upload it to s3 and send a link.
Can you also reduce the number of updates you get at once? This should result in a smaller XML file to search through.
I checked the relation in the area with osm2pgsql and it loads fine. I also looked for anything changed recently, and no part of the relation has changed since well before the diff you're on. In case I had transposed coordinates, I checked that area and found no data.
Because it's probably off the edge of web mercator, I don't trust the y coordinate. The point involved is https://www.openstreetmap.org/node/2311150138.
Regardless of what the way is doing, osm2pgsql and libosmium shouldn't produce a geometry that geos rejects. What does select postgis_full_version();
on your database server report?
cc @joto
Osmium only ever uses WGS84 coordinates. If you project areas created by osmium into Mercator they might well be invalid. That's outside the scope of libosmium.
Can you also reduce the number of updates you get at once? This should result in a smaller XML file to search through.
I no longer need the diff, since I can reproduce an error locall. Although I'm not sure why you get an error with COPY and I don't, or why that geom is processed at all, if I download that relation from the API and import it with
osm2pgsql -d gis --slim r2823696.osm -S default.style
I get
gis=# select count(*) from planet_osm_polygon where not st_isvalid(way);
NOTICE: Self-intersection at or near point -18073790.800134856 -20061906.389537029
┌───────┐
│ count │
├───────┤
│ 1 │
└───────┘
This is an error, and is almost certainly the same problem that you're seeing.
If I try it with 0.92.0, the DB has no invalid geoms.
Because this is a critical regression, we're holding up the 0.94.0 release
The original error does not look like it comes from osm2pgsql itself. @JesseCrocker Do you have additional triggers installed on planet_osm_polygon
? Then you can add a validity check there (using ST_IsValid()
.
As for creating invalid geometries: we need to get rid of all points outside the defined area of the chosen projection. This is also what the geos-based version used to do as far as I can tell. The trick is to find the points before sending the lines to libosmium for the polygon creation even though at this point the coordinates are still in WSG84. If we do this, then libosmium can still guarantee that the geometry is still valid. I don't think reprojection can invalidate a geometry if all coordinates are valid in the source and target projection. @imagico, please correct me if I'm wrong.
The interesting question is what to do with the invalid points. We can either drop them completely or move them towards the nearest edge. The former is more likely to leave the geometry usable but it will lead to odd rendering artifacts near the edges. @pnorman opinions?
I just checked and relation 2823696 as assembled by osmium correctly converts from EPSG:4326 to EPSG:3857 without becoming invalid. So whatever happens here is probably the result of some other error in processing and not inherent to the coordinate conversion.
The safest and cleanest way to avoid coordinate conversion problems with web mercator would be to clip the polygons at or slightly beyond the nominal web mercator limit (85.05 degrees latitude) before converting to mercator. This way you'd get rid of any geometries that might really cause trouble (i.e. stuff near the pole). Doing this in a generic way that also works for other projections is not trivial though.
In any case you will have to do a proper clipping, just skipping nodes outside the bounding box will not work (and i assume this is the reason why you have problems here).
There can still be extreme cases of borderline self intersecting polygons that are valid in geographic coordinates and invalid in projected coordinates but this is rare for mercator and ultimately only means you cannot completely rely on the database being free of self intersections. Completely avoiding this is a hard problem and the only reasonable way to do this i could think of would be to subdivide the edges involved in the self intersection until the problem vanishes.
For the vast majority of geometries you would probably - in case of web mercator - also be safe to just convert them without clipping or skipping nodes if they cross the edge like relation 2823696 and skip the whole geometry in case it is fully outside the map area (like a building near the south pole). This will however fail for some geometries like the boundary relations of Antarctic claim sectors which all start at 60 degrees latitude and include the south pole so they cannot be converted to mercator without clipping before reprojection.
The interesting question is what to do with the invalid points. We can either drop them completely or move them towards the nearest edge. The former is more likely to leave the geometry usable but it will lead to odd rendering artifacts near the edges. @pnorman opinions?
We have to do proper clipping.
The original error does not look like it comes from osm2pgsql itself. @JesseCrocker Do you have additional triggers installed on planet_osm_polygon? Then you can add a validity check there (using ST_IsValid().
My guess is a functional index is present.
The two options I see are to require geos again and validate the geometries, or to default to adding a trigger for anything other than 4326.
I just checked and relation 2823696 as assembled by osmium correctly converts from EPSG:4326 to EPSG:3857 without becoming invalid. So whatever happens here is probably the result of some other error in processing and not inherent to the coordinate conversion.
How'd you check this, and what versions of components?
you cannot completely rely on the database being free of self intersections
All geometries being ST_IsValid is an essential constraint.
All geometries being ST_IsValid is an essential constraint.
Then we have to have postgis check the geometries. It is the only way to guarantee that the same validation method are used and rounding and precision constraints are the same.
It should be reasonably easy to find all geometries which are a) completely beyond the 85° boundary and ignore them and b) straddle the boundary. Still need something to do the clipping for these. One option would be wagyu which can do clipping and is a performant header only library that should be reasonably easy to include.
How'd you check this, and what versions of components?
Extracted the relation from overpass api, converted to spatialite file with osmium (built from master from a few weeks back i think), converted to EPSG:3857 using ogr2ogr (GDAL 2.2.1) and checked for ST_IsValid() using spatialite.
I don't think this is an issue though - the geometry contains nothing that is prone to making it invalid upon reprojection and osmium would report and reject if it was invalid originally.
All geometries being ST_IsValid is an essential constraint.
As @lonvia said then you need to check with Postgis and reject all geometries that violate this. Geometries becoming invalid by reprojection is rare with mercator but with a whole planet that is still likely to occur a few times.
Some thoughts on a trigger-less solution that still uses Postgis for validation:
INSERT INTO planet_osm_polygon SELECT * FROM (SELECT <id>,...., <way> as way) w WHERE ST_IsValid(way)
should do the trick.Given that neither of the two comes without a cost, I'm slightly in favour of making something like this configurable. Trying to repair is an option as well, just makes the query a bit more complicated.
Taking a look at this. Just to note, to trigger it on updates I'm doing a normal --create --slim
and then
psql -d gis -c 'TRUNCATE planet_osm_polygon;TRUNCATE planet_osm_nodes;TRUNCATE planet_osm_rels;TRUNCATE planet_osm_ways;' && ./osm2pgsql -d gis ../r2823696.osm -S ../default.style --slim --append && psql -X -d gis -c 'select count(*) from planet_osm_polygon where not st_isvalid(way);'
I'm looking at updates first because initial import is easy - just add a WHERE
to the CREATE TABLE ... AS SELECT ... ORDER BY ...
as @lonvia noted. Because we sort everything in the rendering tables and gazetteer is 4326, we don't need to consider unsorted tables.
After #790 a reasonable solution until proper clipping can be implemented would probably be to limit all coordinates to something like 89.9 degrees so they become re-projectable to Mercator and enter them into the database without clipping. This would probably work for the boundary relations and everything else currently in the database that crosses the web mercator edge.
I have not tested this idea - it is just inferred from my understanding of current behavior of osm2pgsql.
Fixed in #790
@JesseCrocker that PR won't actually fix it for you since it sets stuff up on import, but to do it without a reimport this SQL should work:
DELETE FROM planet_osm_polygon WHERE NOT st_isvalid(way);
CREATE OR REPLACE FUNCTION planet_osm_polygon_osm2pgsql_valid()
RETURNS TRIGGER AS $$
BEGIN
IF ST_IsValid(NEW.way) THEN
RETURN NEW;
END IF;
RETURN NULL;
EN
$$ LANGUAGE plpgsql;
CREATE TRIGGER planet_osm_polygon_osm2pgsql_valid BEFORE INSERT OR UPDATE
ON planet_osm_polygon
FOR EACH ROW EXECUTE PROCEDURE planet_osm_polygon_osm2pgsql_valid();
Edit: Fixted substitutions
Thanks for the fix. the delete statement found 4 invalid polygons.
osm_rendering=> DELETE FROM planet_osm_polygon WHERE NOT st_isvalid(way);
NOTICE: Self-intersection at or near point 19410477.129600767 -20061906.389537029
NOTICE: Self-intersection at or near point 587312.91697521065 6271092.4226212045
NOTICE: Self-intersection at or near point 587312.94425467972 6271092.4616400991
NOTICE: Self-intersection at or near point 11985875.570417937 4115611.1799857486
DELETE 4
I had to make a couple changes to the function/trigger to get it to work.
CREATE OR REPLACE FUNCTION planet_osm_polygon_osm2pgsql_valid()
RETURNS TRIGGER AS $$
BEGIN
IF ST_IsValid(NEW.way) THEN
RETURN NEW;
END IF;
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER planet_osm_polygon_osm2pgsql_valid BEFORE INSERT OR UPDATE
ON planet_osm_polygon
FOR EACH ROW EXECUTE PROCEDURE planet_osm_polygon_osm2pgsql_valid();
osm2pgsql version, built from master @ 21aeaf798c57699ac1574879e7ad9bd31b00e658:
OS: Ubuntu 16.04.3 LTS Database: Amazon RDS, PostgreSQL 9.6.2
Approximately 49 days ago my cron job pulling in OSM updates started failing because of an error running osm2pgsql. 2 days ago I updated osm2pgsql, and the error is still happening.
options im using:
osm2pgsql -a --slim -e9-14 --hstore --number-processes 4 -d $DB -U $DB_USER -H $DB_HOST --style $STYLE --cache $CACHE_SIZE -o expiry.log $CHANGE_FILE
Starting point for updates
start import from seq-nr 2560529, replag is 47 day(s) and 20 hour(s)
osmosis version: 0.44.1-4 installed from ubuntu package
Last part of output from osm2pgsql
Is there a way to force osm2pgsql to ignore the GEOS errors? Maybe this is actually an osmosis issue?