tudelft3d / pprepair

Validation and Automatic Repair of Planar Partitions
GNU General Public License v3.0
57 stars 22 forks source link

pprepair output: invalid polygons returns points #27

Closed lucasmation closed 8 years ago

lucasmation commented 8 years ago

pprepair is returning invalid polygons when we use it on the Brazilian enumeration districts shapefiles. In pacticular, some polygons returned are indeed points.

This hapens, for example, on the state of Ceara (CE, code 23, data available at ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/ce/ce_setores_censitarios.zip ). After pprepair, the polygon cd_geocodi='230040810000012' becomes two elements of the pprepair output file: a regular corrected polygon and a point. The same also hapens with cd_geocodi='230160415000004'.

The script bellow describes steps to reproduce the errors. After downloading the CE data, we ran prepair and pprepair :

prepair --ogr 23SEE250GC_SIR.shp --shpOut 23SEE250GC_SIR_1.shp
pprepair -i "23SEE250GC_SIR_1.shp" -r fix -o 

Then we imported the output shapefile into PostGis and ran some tests (we tryed runing test directly using prepair on the shapefile outputed by pprepair. However it did not work becuase multipolygons get divided into different lines, making prepair err).

shp2pgsql -s 4674:4326 -I -W LATIN1 -g geom 23SEE250GC_SIR_1.r  SC_table | psql -h server -d database 

At first, all elments of the shapefile are "multipolygons" but PostGis indicates this is an invalid shapefile.

select count(*), GeometryType(geom) from SC_table group by GeometryType(geom) 
# all are multiplolygons
select count(*), IsValid(geom) from SC_table group by IsValid(geom)
# we find 248 invialid polygons

We then run ST_makeValid() on the polygons and count the frequencies of greometry types again:

select count(*), GeometryType(ST_MakeValid(geom)) from SC_table
group by GeometryType(ST_MakeValid(geom))
# we now have MULTIPOINT ou um MULTILINESTRING elements

Above, ST_makeValid() coerces some elements to points and lines. For example:

select * from SC_table where (cd_geocodi='230040810000012' or cd_geocodi='230160415000004') 

We are using a Linux machine, prepair-improvements-with-ogr and pprepair-new

kenohori commented 8 years ago

I just checked the two polygons that you mention and I don't think there are any points returned. For me, the result is a single polygon in both cases. Just to be safe, I checked it with prepair, which also shows it to be a single valid polygon.

In fact, I don't think we can return any points. The shapefile spec doesn't allow us to return a mix of points/lines/polygons. Perhaps this is caused by shp2pgsql or ST_MakeValid?

I have uploaded a fixed dataset for Ceará together with the others. Could you test if you still get points with this one?

lucasmation commented 8 years ago

The output from pprepair differs from yours and ours version on the Ceará file. Looking manually at the dbf file (so we control for import error), our's had two lines with cd_geocodi='230040810000012'. On your file there is only one. So that is good (although it makes me woried of all the other files we ran).

So we repeated the baterry of tests (import to postgis and run ST_valid(), etc) on the Ceará file you sent us. We now find other polygons to be invalid. this table sumarizes the number of polygons considered invalid by PostGis:

#table code:
select count(*),GeometryType(ST_MakeValid(geom)),reason(ST_IsValidDetail(geom)) from setores_censitario_ce_autor
where IsValid(geom) = FALSE
group by GeometryType(ST_MakeValid(geom)),reason(ST_IsValidDetail(geom))
geometrytype num_of_polys reason
GeometryColection 84 Self-Intersection
Multipolygon 6 Self-Intersectoin
Multipoint 1 Too few points in geometry component
Multipolygon 1 Ring Self-intersection
Multilinestring 26 Too few points in geometry component

The Multipoint is cd_geocodi="230950805000021" The Multilinestring is cd_geocodi="230030933000002" (there are 26, this is just one of them)

I haven´t investigated the ring-self intersections yet

kenohori commented 8 years ago

Are you using the same CRS from the shapefiles (SIRGAS 2000) in PostGIS? That could result in small shifts that create these errors.

lucasmation commented 8 years ago

is CRS the same as SRID? Upon importing we were reprojecting from SIRGAS 2000 (SRID=4674) to WGS84 (SRID=4326), that was done by the -s 4674:4326 switch in shp2pgsql. I now reimported CE file you prepared with only with -s 4674, which only declares the SIRD explicitly (without reprojection) and also without the -s switch. Then re-ran the tests. In both cases I get the same table of invalid polygons as above.

lucasmation commented 8 years ago

@kenohori, here is the table of errors, updated with a few examples in case you have time to check them yourself

geometrytype N reason Example_cd_geocodi
GEOMETRYCOLLECTION 87 Self-intersection "230400405000013", "230100017000001", "230015005000009", "230860915000003"
MULTIPOLYGON 6 Self-intersection "230380850000003", "230205705000013", "230870835000005", "230240407000002"
MULTIPOINT 1 Too few points in geometry component "230950805000021"
MULTIPOLYGON 1 Ring Self-intersection "230725405000005"
MULTILINESTRING 26 Too few points in geometry component "230030933000002", "230020005000020", "230240405000056", "230560505000021"
hugoledoux commented 8 years ago

same as #35