pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License
239 stars 34 forks source link

Problems connecting FDW table to DB2 #143

Closed dwadler closed 6 years ago

dwadler commented 7 years ago

I'm trying to diagnose why this isn't working properly. Attempting to insert into a FDW table connected to DB2 via the GDAL/OGR DB2ODBC support fails with a DB2 SQL error that no value is specified for a column defined as NOT NULL. This happens to be the PRIMARY KEY column of the DB2 table.

A possible symptom of the problem is that ogrinfo on the DB2 table reports FID column = OBJECTID while ogr_fdw_info on the DB2 table (via DB2ODBC) creates a FOREIGN TABLE definition with fid bigint.

From looking at ogr_fdw_common.c, it appears that 'fid bigint' is hardcoded, not using the ogrinfo reported FID column.

However, I don't know that this is actually related to the problem.

When the output is to DB2, the DB2 driver checks whether the layer has a defined FID column and whether poFeature->GetFID() != OGRNullFID is true when deciding to provide a FID value to be inserted. It appears that this test is failing so DB2 doesn't provide a FID value and the SQL fails due to the NOT NULL key column definition.

I'm assuming that the FDW code is creating OGRFeature objects which are then supposed to be written by the DB2 driver.

If a statement like INSERT INTO fdw.mytable(fid, name) VALUES(3, 'Fred') is executed, should the fid value be populated into the OGRFeature object? Does the layer definition have "fid" as the OGR FID column name?

pramsey commented 7 years ago

"fid" is not a column on the OGR feature, it's a property of the OGR feature, that I choose to expose to the ogr_fdw user under the name "fid". When the insert/update/delete statements are executed, I grab the value in the "fid" column and use the OGR OGR_F_SetFID() function to apply it to the feature before executing the OGR insert/update/delete functions. The fact that it's named "fid" on the PgSQL side has no bearing on what happens on the other side of the driver.

dwadler commented 7 years ago

Paul, thank you for the quick response. Your explanation makes sense so I need to figure out why the FID value doesn't seem to be showing up on the DB2 driver side.

I've been working with Nikolai to resolve the problem that he had encountered.

In the meantime, I'm setting up a PostGis environment on my Ubuntu 16.04 environment.

dwadler commented 7 years ago

When using FDW and psql to insert into a foreign table, is there any way to get an OGR trace?

pramsey commented 7 years ago

You can attach a debugger (gdb -p #) to the relevant PgSQL backend process (select pg_backend_pid() to get the process number) and put a breakpoint somewhere you care about.

dwadler commented 7 years ago

The OGR DB2 driver has a lot of debugging calls to CPLDebug - it would be a lot easier to just see these instead of firing up a source debugger. Is there any way to capture the output from the backend process?

pramsey commented 7 years ago

No, nothing easy. I looked into this a little after prior issue, and didn't achieve a satisfactory result https://github.com/pramsey/pgsql-ogr-fdw/compare/gdal-errs

pramsey commented 7 years ago

You might be able to get GDAL to drop error message into stderr in the current state of the code, by passing config options in your CREATE SERVER statement.

CREATE SERVER myserver_latin1
 FOREIGN DATA WRAPPER ogr_fdw
 OPTIONS (
   datasource '/tmp/test',
   format 'ESRI Shapefile',
   config_options 'SHAPE_ENCODING=LATIN1 CPL_DEBUG=ON CPL_LOG_ERRORS=ON' 
);
pramsey commented 7 years ago

If you feel like building and testing #144, that would also be cool. (See the bottom of the README in that branch for instructions).

dwadler commented 7 years ago

For DB2, FDW doesn't seem to like config_options. Or is there something obvious wrong with my syntax? I will try #144.

CREATE SERVER myserver FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource 'DB2ODBC:database=ostest;DSN=OSTEST;Tables=osuser.geojson1', format 'DB2ODBC' ); CREATE FOREIGN TABLE osuser_geojson1 ( fid bigint, shape Geometry(MultiPolygon,4326) ) SERVER myserver OPTIONS (layer 'OSUSER.GEOJSON1', config_options 'CPL_DEBUG=ON CPL_LOG_ERRORS=ON ' ); gisdb=# \i fdw-geojson1.sql psql:fdw-geojson1.sql:6: ERROR: server "myserver" already exists psql:fdw-geojson1.sql:15: ERROR: invalid option "config_options" HINT: Valid options in this context are: layer, updateable

pramsey commented 7 years ago

You perhaps have an ogr_fdw version that predates the 'config_options' option. It's possible you can still finesse it by passing setting up CPL_DEBUG=ON CPL_LOG_ERRORS=ON in the environment you're starting PostgreSQL within.

dwadler commented 7 years ago

It looks like this is in the source code. I'll poke around and see if I can figure anything out. Is it possible that it is picking up FDW executable different from what I built from the GIT source?

user@ubuntu:~/sources/pgsql_ogr_fdw$ grep "config_options" grep: data: Is a directory grep: expected: Is a directory grep: input: Is a directory ogr_fdw.c:#define OPT_CONFIG_OPTIONS "config_options" ogr_fdw.c: const char config_options, const char *open_options) ogr_fdw.c: if ( config_options ) ogr_fdw.c: char *option_list = CSLTokenizeString(config_options); ogr_fdw.c: elog(ERROR, "bad config option string '%s'", config_options); ogr_fdw.c: ogr.config_options = defGetString(def); ogr_fdw.c: ogr.ds = ogrGetDataSource(ogr.ds_str, ogr.dr_str, updateable, ogr.config_options, ogr.open_options); ogr_fdw.c: const char config_options = NULL, open_options = NULL; ogr_fdw.c: config_options = defGetString(def); ogr_fdw.c: ogr_ds = ogrGetDataSource(source, driver, updateable, config_options, open_options); Binary file ogr_fdw_deparse.o matches ogr_fdw.h: char config_options; / GDAL config options / Binary file ogr_fdw.o matches Binary file ogr_fdw.so matches grep: output: Is a directory README.md:The behavior of your GDAL/OGR connection can be altered by passing GDAL config_options to the connection when you set up the server. Most GDAL/OGR drivers have some specific behaviours that are controlled by configuration options. For example, the "ESRI Shapefile" driver includes a SHAPE_ENCODING option that controls the character encoding applied to text data. README.md: config_options 'SHAPE_ENCODING=LATIN1' );

pramsey commented 7 years ago

Ah, no, you applied the config_options to the CREATE FOREIGN TABLE statement, not the CREATE SERVER statement. Config options go on the server.

robe2 commented 7 years ago

@dwadler on your DB2 what datatype is your OBJECTID column , and also which version of GDAL are you using. I recall there was an issue with GDAL 2.1 and below with MSSpatial in that if the primary was say a text field (not an integer), GDAL would recognize it as a primary key FID = the key, but the FID has to be a integer so it did something cooky with it that made it unusable.

Though as I recall the issue didn't exist with the ODBC driver because the ODBC driver was too stupid to figure out primary key so GDAL always generated a fake FID key column. Then again I haven't used the ODBC driver against anything but MSAccess, SQL Server, and Oracle so may behave differently with DB2.

I should also add that the GDAL driver for ODBC for GDAL < 2.0 was impoverished in other ways that it would choke if you happened to have a compound primary key. Though that doesn't seem to be your issue.

dwadler commented 7 years ago

GDAL is 2.2.2 and the OBJECTID is INTEGER.

Inserting using FDW to DB2 using the unixodbc driver is actually working fine in the tests that I've been running but I don't know why it is failing for Nikolai. That is the main reason I want to get the GDAL CPLDebug messages to see why it is failing in his case.

When it is working in my environment, clearly the fid value is being passed as part of the feature structure and it is inserted correctly to the OBJECTID column so there is no error related to NOT NULL primary key problem.

robe2 commented 7 years ago

@dwadler Who or what is Nikolai?

pramsey commented 7 years ago

Original reporter, on #142 I think.

robe2 commented 7 years ago

You know thinking about this more, it does sound like the issue I ran into using the MSSpatial driver which is why I stopped using it and opted for the ODBC driver which is conveniently stupid.

In the case of MSSPatial the FID got mapped to the primary key of the table, so my primary key ended up coming thru ogrFDW via the FID, which was annoying because then I lost the name of the primary key. I never tried updating with it to see if it failed updating.

However if I used the ODBC driver, the ODBC driver was dumb enough not to know what a primary key is, so the FID ended up being a dummy integer (as @pramsey mentioned) and my real primary key came thru with it's real name.

robe2 commented 7 years ago

I should note, if I use the MSSpatial driver, unlike the ODBC one, it does map the primary key to FID.

So my notices with CPL_DEBUG=ON look like

DEBUG: [0] OGR_MSSQLSpatial: Using column ID as FID for table dbo.Edit_Reports. DEBUG: [0] MSSQL: Ignoring FID column vtID as it is of non integer type DEBUG: [0] OGR_MSSQLSpatial: Table Edit_Coding has multiple primary key fields, ignoring them all. DEBUG: [0] OGR_MSSQLSpatial: Using column LIID as FID for table dbo.Edit_LineItems.

and if I do an update such as

UPDATE sspatial.dbo_Edit_lineItems SET chargecode = 'abc' WHERE fid = 194085797;

Seems to work okay:

Query returned successfully: one row affected, 930 msec execution time.

and I see my change when I reselect the table.

So I'm guessing it probably is a bug with the db2odbc driver. MSSpatial works as expected.

robe2 commented 7 years ago

Nah insert works fine for me. Sorry so doing this:

INSERT INTO sspatial.dbo_Edit_LineItems(chargecode, editid, edidetid, chargeamt) VALUES ('abcd', '00002020000000876203', 2417584, 5) RETURNING *

Query returned successfully: one row affected, 430 msec execution time.

Though my fid shows as -1 which is wrong cause if I requery the table, it correctly filled in an fid (presumably generated by SQL Server since the LIID field is an autonumber in SQL Server, but ogrfdw doesn't have the chops to get back the generated id.

I can query the records I inserted fine with

SELECT * from sspatial.dbo_Edit_LineItems where editid = '00002020000000876203'

and I can see fid was assigned for the new record.

pramsey commented 7 years ago

I'm pretty sure I'm doing the right thing: when I don't know the FID, I use OGRNullFID in OGR_F_SetFID(). It may not be a coincidence that OGRNullFID == -1. After the feature is written to the driver, I go back to the feature to see if the driver updated the FID value, and read that back for the return tuple.

https://github.com/pramsey/pgsql-ogr-fdw/blob/c88bdeb0421d3dd4f7cf0cefc3f31bc6b65c4c9d/ogr_fdw.c#L2388-L2397

So, if you go

INSERT INTO foobar (this) VALUES ('that') RETURNING *;

you should see (hopefully) the new fid value in the fid column.

robe2 commented 7 years ago

* will always return the whole row. I can get the subset of the fields by explicitly asking for them with this so at least I can see other fields written but not the fid value autogenerated by SQL Server.

INSERT INTO sspatial.dbo_Edit_LineItems(chargecode, editid, edidetid, chargeamt)
VALUES ('regina123', '00002020000000876203', 2417584, 5)
RETURNING chargecode, editid;

BTW this exercise taught me that the MSSPatial driver is insert/updatable/deletable but the ODBC one evidentally is not or at least not for SQL Server. I guess that's probably expected though puzzled why it just doesn't say doesn't support updates. With the ODBC I get

> DEBUG:  ogrReadColumnData matched 1 FID, 1 GEOM, 8 FIELDS out of 10 PGSQL COLUMNS
DEBUG:  ogrExecForeignInsert

ERROR:  failure writing OGR feature
********** Error **********

ERROR: failure writing OGR feature
SQL state: HV000

and I try to delete say with

delete from ss.dbo_Edit_LineItems 
where editid =  '00002020000000876203' and chargecode = 'regina123';

It for some reason returns 0 rows affected.

I run the same with the msspatial

where editid =  '00002020000000876203' and chargecode = 'regina123'

and I get

> Query returned successfully: 2 rows affected, 800 msec execution time.

as expected.

So I guess if I need to write back data, I'll need to use the MSSPatial driver instead of ODBC. I wonder if that's because ODBC can't figure out the primary key field.

pramsey commented 6 years ago

No news is good news.