tudelft3d / pprepair

Validation and Automatic Repair of Planar Partitions
GNU General Public License v3.0

(p)prepair fails on its own output polygons (.shp) #35

Closed lucasmation closed 8 years ago

lucasmation commented 8 years ago

As I reported in another issue, the output of pprepair seems to contain some invalid polygons and some minor overlaps (at least that is what we detect when it is imported into PostGIS). So I tried running pprepair on its own output, and also prepair > pprepair on the output of the original prepair. It does not work (see below).

-- 0) Starting in a folder with the shapefiles repaired by pprepair-new
-- 1) Run pprepair-new on the original repaired files (item 0)

mkdir pprepair
for f in *.shp;
do pprepair-new -i "$f" -o ./pprepair/ -r fix;
done
-- this returns several kinds of errors, and does not run for any of the shapefiles. The errors are:
Invalid polygon: duplicate vertices. --> Polygon #260  **(most common error)**
Warning 1: Self-intersection at or near point ...
Warning 1: Ring Self-intersection at or near point ...
Warning 1: Too few points in geometry component at or near point ...
ERROR: Some polygons are (individually) invalid. (our other project 'prepair' can perform automatic repair of single polygons)
Aborted.
...

-- 2) Following the advice in the error message in item 1, run prepair on the original pprepair files (item 0)

mkdir prepair
for f in *.shp;
do prepair-improvements-with-ogr --ogr "$f" --shpOut "./prepair/pr_$f";
done
-- This runs for all files, although it gives these errors:
Creating ./prepair/pr_pr_11SEE250GC_SIR.r.shp...
You are using an exact number types using a Constrained_triangulation_plus_2 class would avoid cascading intersection computation and be much more efficient

-- 3) Finally, I ran pprepair on the files generated in step 2).

cd ./prepair
mkdir pprepair

for f in *.shp;
do pprepair-new -i "$f" -o ./pprepair/ -r fix;
done
-- This returns the following error message for every file:
Reading input dataset: pr_pr_11SEE250GC_SIR.r.shp
    Reading layer #1 (2453 polygons)
    Validating individually every polygon...
Warning 1: Self-intersection at or near point -60.813078443236904 -12.819574178491102
--> Polygon #1005

ERROR: Some polygons are (individually) invalid. (our other project 'prepair' can perform automatic repair of single polygons)
Aborted.

-- 4) Since none of the above worked, we tried importing the original pprepair files (item 0) into PostGIS. There we: made the polygons valid (ST_MakeValid), selected only the parts that were actually polygons (as opposed to lines and points), and regrouped the areas that used to belong to a multipolygon and had been split into single polygons by pprepair. We finally exported the files back to shapefiles and ran steps 1), 2) and 3) on these exported files. We got the same errors as above.

PS: the good news is that we finally managed to run pprepair-new on Linux. Due to our limited git knowledge we were compiling pprepair-master instead of pprepair-new...

hugoledoux commented 8 years ago

These are caused by the fact that vertices that are very close in memory (but still separate vertices) are merged together, or moved a bit, when we go from our internal representation (which has as many decimals as required) to one that is limited by the number of bits of the floating-point representation in the shapefile.
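This collapse can be illustrated in a few lines of pure Python (a sketch, with `Fraction` standing in for an exact number type such as CGAL's; the coordinate value is just one from this dataset's range, not special):

```python
from fractions import Fraction

# A coordinate that is exactly representable as a double.
x = 12.819574178491102

# Two *distinct* exact coordinates, as an exact number type would keep them:
# both sit far less than half a unit-in-the-last-place (~8.9e-16) above x.
a = Fraction(x) + Fraction(1, 10**18)
b = Fraction(x) + Fraction(2, 10**18)

print(a == b)                # False: distinct in the exact representation
print(float(a) == float(b))  # True: both collapse onto the same double
```

Any format that stores coordinates as 64-bit doubles (shapefile, the default PostGIS geometry, GeoJSON readers, etc.) performs exactly this conversion on output.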

We are aware of these issues; they are in almost all cases the cause of problems with pprepair.

The solution is complex, very complex. We plan to work in the future on techniques to ensure, for instance, that the distance between any pair of vertices is at least e. The snap rounding available in CGAL can be used for this, but unfortunately it works only for very small datasets, and is thus not really usable in practice for GIS datasets.

http://doc.cgal.org/latest/Snap_rounding_2/index.html
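The minimum-separation property described above can at least be detected naively in a few lines (a pure-Python sketch; the function name is mine, not part of prepair or pprepair, and a real implementation would use a spatial index rather than an O(n^2) scan):

```python
import math
from itertools import combinations

def min_vertex_separation(vertices):
    """Smallest pairwise distance in a list of (x, y) vertices; O(n^2)."""
    return min(math.dist(p, q) for p, q in combinations(vertices, 2))

# Two vertices ~1e-15 apart: still distinct as doubles, but far below
# any tolerance e that a GIS pipeline can meaningfully honour.
pts = [(0.0, 0.0), (1.0, 1.0), (1.0 + 1e-15, 1.0)]
print(min_vertex_separation(pts))  # on the order of 1e-15
```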

It's explained in Section 5 of our paper: https://3d.bk.tudelft.nl/hledoux/pdfs/14_cgeo_prepair.pdf

So this is not really a "bug" that can be fixed, sorry. It's more a fact that one has to be aware of when dealing with datasets. I am not aware of any repair method that would guarantee that all vertices are at least e units apart.

lucasmation commented 8 years ago

Thanks Hugo, I got it. Is this limitation specific to shapefiles, or does it affect any format? Could PostGIS handle this precision? Is there any way to insert pprepair output into PostGIS that does not suffer from this problem?

If not, this is disappointing. We started using pprepair because we could not run spatial operations (ST_Union, ST_Intersects) due to topological problems. If the rounding down of precision in the pprepair output causes these problems to reoccur (even if at a lower order of magnitude), then my spatial operations still won't run.

In any case, good luck solving this difficult part.

hugoledoux commented 8 years ago

It'll be the same with all formats, yes, including PostGIS.

It's not pprepair; any other repair method would suffer from the same problems, unless it takes care of removing vertices that are too close to each other.

What you can try is to first snap all your polygons to a grid of, say, 1 m × 1 m and then run pprepair. I have had good results with that---but again, it cannot guarantee 100% that an alternative representation (e.g. in SHP) will be perfect in all cases. In most cases, though, it'll help greatly.
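The grid-snapping pre-processing step suggested here can be sketched in a few lines (pure Python; a simplification that handles one ring at a time and ignores rings that collapse entirely, which a real pre-processing step would also have to deal with):

```python
def snap_ring(ring, cell=1.0):
    """Snap each (x, y) vertex of a closed ring to a regular grid of the
    given cell size and drop the consecutive duplicates this creates."""
    snapped = [(round(x / cell) * cell, round(y / cell) * cell)
               for x, y in ring]
    out = [snapped[0]]
    for p in snapped[1:]:
        if p != out[-1]:       # skip vertices merged by the snapping
            out.append(p)
    return out

# A ring with near-duplicate vertices, snapped to a 1x1 grid:
ring = [(0.0, 0.0), (0.4, 0.1), (10.2, 0.0), (10.0, 9.8),
        (0.0, 10.1), (0.0, 0.0)]
print(snap_ring(ring))
# [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 0.0)]
```

After snapping, any two distinct vertices are at least one grid cell apart, which is the minimum-separation guarantee discussed above.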

Hope that helps, Hugo

lucasmation commented 8 years ago

@hugoledoux,

I just read the paper and learnt a lot. I have 3 further comments on this:

1) Let me see if I got it right: a floating-point double can represent ~15 significant digits. Discounting the 2 digits of the whole part of the decimal degrees, we still have 13 decimal places in a double variable. That seems a lot of precision, nanometre scale at the equator (according to this thread). It seems very unlikely that there would be points so close at this scale in the original file.

2) The constrained triangulation does not move the original vertices. Thus, those need not be rounded, as they were imported from a data format with double precision. So rounding is only needed for the new vertices created at "non-noded" intersections of original segments. And the constrained triangulation tells you the segments (sides) that are related to this new point. The CT also restricts the list of other segments that may cross the same cell as the new point: those would be the other segments of the triangles the new point belongs to. So the rounding would, in most cases, be applied to a smaller number of points, and the slower performance (or lack of scalability) of CGAL's Snap_rounding_2 would not matter that much.

3) If the rounding is performed just after the creation of the CT, then any "errors" created by the rounding can be handled by the rules of pprepair.
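The point about vertices created at non-noded intersections can be made concrete: even when all segment endpoints are exact doubles, the intersection point generally is not, so it must be rounded on output (a pure-Python sketch with `Fraction` standing in for an exact number type; this is not pprepair's actual code):

```python
from fractions import Fraction

def exact_intersection(p1, p2, p3, p4):
    """Exact intersection point of segments p1p2 and p3p4 (assumed
    non-parallel), computed with rational arithmetic: no rounding anywhere."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [
        (Fraction(x), Fraction(y)) for x, y in (p1, p2, p3, p4)]
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# All four endpoints are exact doubles, yet the intersection is (2/3, 2/3):
ix, iy = exact_intersection((0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 0.0))
print(ix, iy)                      # 2/3 2/3
print(Fraction(float(ix)) == ix)   # False: writing it to a shapefile rounds it
```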

happy holidays

hugoledoux commented 8 years ago

--> 1) Your original points were perhaps not meant to be so close, but if there are self-intersections, for instance because of arcs or inner rings, then these can create very small polygons. For instance, some real-world examples I've gathered over the years:

https://www.dropbox.com/s/hf8tmjbfmcn9aqi/pg_arcsproblems.pdf?dl=0
https://www.dropbox.com/s/tg6i8n7dctyy5an/vlaardigen1.png?dl=0
https://www.dropbox.com/s/lnr68fevkdeh7dm/vlaardigen2.png?dl=0
https://www.dropbox.com/s/u65lief7yrykkd1/selfintersec.pdf?dl=0

Also, if 2 polygons intersect, that creates new vertices, and it's these that can be very close to the original ones.

--> 2) Hmmmm, while what you write makes sense, the problem is more complex than this. We can't just move some points: snap rounding is applied before the CT is constructed. Applying it afterwards is perhaps one option, but moving vertices is tricky. It's one problem we want to work on this year, but it's a complex one that takes time, and we don't have money from research grants or others to work on this, so it's always a side issue that is left for when other problems are solved.

--> 3) I thought so too, but in practice it wasn't so simple. As I said, hopefully we can get some time to work on this in 2016. If you want to join forces, we would be happy to share our results!

lucasmation commented 8 years ago

Hey Hugo, I never answered this. I would like to contribute in a more concrete way, i.e. with code, instead of just filing issues. However, I lack the CS background and the infrastructure to test code, etc. I mean, we can handle most "statistical programming" software (such as R, or Postgres/PostGIS) with proficiency, but I don't know how to develop in C, apart from very basic programming, or how to implement ideas in the fast/efficient way you seem to be concerned about. (From a user's perspective, efficiency should be a second-order consideration. Topological problems are so prevalent, and pprepair's idea so great, that usability (compatibility with shapefiles, PostGIS) and robustness to errors should be the priorities. I wouldn't care if pprepair ran in 10 hrs instead of the current 20 min on our 316k polygons.)

Let me know if you can think of ways we can contribute more directly, and/or how hard it would be to start developing in C constructively.