pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License

Control the error handling behaviour #113

Closed rastermanden closed 7 years ago

rastermanden commented 8 years ago

I have a datasource defined something like this:

DROP SERVER IF EXISTS polys CASCADE;
CREATE SERVER polys
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (datasource '/vsizip//vsicurl/ftp://myhost/ogrfdw/shapes.zip/shapes/polys.shp', format 'ESRI Shapefile');

-- Foreign table reading straight from the OGR datasource
CREATE FOREIGN TABLE test.polys_fdw (
  fid integer,
  geom geometry(Polygon, 25832)
)
  SERVER polys
  OPTIONS (layer 'polys');

-- Local snapshot of the foreign table
DROP MATERIALIZED VIEW IF EXISTS test.polys CASCADE;
CREATE MATERIALIZED VIEW test.polys AS
SELECT * FROM test.polys_fdw;

--REFRESH MATERIALIZED VIEW test.polys;

My problem is that this FTP endpoint is rather unstable (I guess), because relatively often I get errors like this:

ERROR 1: In file cpl_vsil_gzip.cpp, at line 607, return -1
ERROR 1: In file cpl_vsil_gzip.cpp, at line 753, return -1
ERROR 1: Error in fread() reading object of size 440 at offset 408975792 from .shp file
ERROR 1: In file cpl_vsil_gzip.cpp, at line 607, return -1
ERROR 1: In file cpl_vsil_gzip.cpp, at line 753, return -1
ERROR 1: Error in fread() reading object of size 248 at offset 408976232 from .shp file
More than 1000 errors or warnings have been reported. No more will be reported from now.

In my use case, I only want to refresh my data (REFRESH MATERIALIZED VIEW test.polys) if GDAL can read all the features successfully.

Is it possible to control the behaviour of ogr_fdw when it fails to read features from the datasource?

ogr2ogr has a -skipfailures flag to control this. Maybe something similar could be implemented in ogr_fdw?
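
In the meantime, a rough sketch of the kind of guard I have in mind (the foreign table name test.polys_fdw and the row-count threshold are placeholders, and I haven't tested this against a real failure) is to count the rows coming through the FDW and only refresh when the count looks sane:

DO $$
DECLARE
  n bigint;
BEGIN
  -- Force a full read through ogr_fdw; a hard read failure should raise an error here.
  SELECT count(*) INTO n FROM test.polys_fdw;

  -- Placeholder threshold: refuse to refresh if far fewer features than expected came through.
  IF n < 100000 THEN
    RAISE EXCEPTION 'only % features read from the OGR datasource, skipping refresh', n;
  END IF;

  REFRESH MATERIALIZED VIEW test.polys;
END
$$;

That only helps if a failed read actually surfaces as an error or a short row count, of course.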

rastermanden commented 8 years ago

Extra info: I tried to emulate the presumably shaky FTP connection in the above example by uploading a shapefile to Amazon S3 and corrupting the .shx file so that some features were not readable by GDAL. With this setup, ogr_fdw returns rows with NULL values for the corrupted features.
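
If the corrupted features reliably show up as NULL rows like that, a simple pre-flight check before refreshing might already be enough. A minimal sketch, assuming the foreign table is exposed as test.polys_fdw as above:

-- Count features that ogr_fdw could not read properly; only refresh if this is zero.
SELECT count(*) AS unreadable_features
FROM test.polys_fdw
WHERE fid IS NULL OR geom IS NULL;

Though this obviously can't distinguish a genuinely NULL attribute from a read failure.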

pramsey commented 7 years ago

What does ogr2ogr do when it tries to read your corrupt data with -skipfailures on? Nicely omit them?

rastermanden commented 7 years ago

I just experienced the same behaviour with ogr2ogr as I did with ogr_fdw, without using the -skipfailures flag, on a local SQLite database.

The command was:

ogr2ogr -f "postgresql" pg:"dbname=udsigt active_schema=myschema" bygning.sqlite bygning -nln bygning  --config PG_USE_COPY YES

The first run gave me a feature count of 3359999, the next run (the day after) gave me a feature count of 4622360, and no errors were reported by ogr2ogr.

It's hard to reproduce, but I'll try.

Maybe this is strictly a GDAL issue. Maybe @rouault has an explanation?

rouault commented 7 years ago

Maybe this is strictly a GDAL issue. Maybe @rouault has an explanation?

So you import a SQLite DB into PG with ogr2ogr, and you don't get the same results between two runs, without modifying the SQLite database? That sounds like a bug (or you have hard disk issues).

rastermanden commented 7 years ago

Yes, different results, and no modification of the SQLite file.

Initially, I experienced this using ogr_fdw with a remote shapefile (on an FTP server), as described above. I figured a shaky connection to the FTP server caused the differing feature counts.

So I was a bit surprised to see this behaviour again on a local SQLite file.

rouault commented 7 years ago

@rastermanden You should consider running "valgrind ogr2ogr ..." and seeing if anything strange shows up.

rastermanden commented 7 years ago

This is the output from Valgrind: output

rouault commented 7 years ago

It looks like Valgrind has issues running on your Mac, so its output is not meaningful. Somehow I always assume people use Linux :-)