pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License
239 stars 34 forks source link

Trailing blanks are trimmed in varchar #208

Closed noefingway closed 3 years ago

noefingway commented 3 years ago

I have a situation where we use the pgsql-ogr-fdw to connect thru ODBC to an Intersystems CACHE database. Both databases are running on the same Redhat Linux machine, using unixODBC. Some of the records we retrieve have a trailing space which is apparently being trimmed off by pgsql-ogr-fdw.

I verified this by using the isql cmd line tool and connecting to each database and running the same query. In the case of connecting to PG, the query is against an fdw table that is connected to CACHE thru ogr-fdw. Here is the query run thru ODBC directly to CACHE:

[root@chaz-database-machine-prod usr]# isql -v CACHE
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select ('-|' || annotation_xml || '|-') from image_annotation_data WHERE image_ien='26792' AND ien='19'
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Expression_1                                                                                                                                                                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -|<ARTPage>^\r\n^<Orientation>1</Orientation>^\r\n^<ZOrder></ZOrder>^\r\n^</ARTPage>^\r\n^</ARTDocument>^\r\n^</Page><Page number="4" marks="0"><ARTDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema" |-        

You will note that there is whitespace immediately before the |-.

If I run the same query thru ODBC to PG which then queries CACHE thru the FDW driver I get this:

SQL> select ('-|' || annotation_xml || '|-') from image_annotation_data WHERE image_ien='26792' AND ien='19'
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ?column?                                                                                                                                                                                                                                                                                                    |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -|<ARTPage>^\r\n^<Orientation>1</Orientation>^\r\n^<ZOrder></ZOrder>^\r\n^</ARTPage>^\r\n^</ARTDocument>^\r\n^</Page><Page number="4" marks="0"><ARTDocument xmlns:xsd="http://www.w3.org/2001/XMLSchema"|-       

You will note the whitespace has been trimmed from the end of the string.

In order to correctly reassemble the XML, we need to have the trailing whitespace preserved in the varchar columns when transferred from the source.

pramsey commented 3 years ago

So, first question: can you demonstrate un-trimmed strings coming through just the GDAL/OGR part of the software stack, by doing a query using only OGR? (use ogr2ogr).

noefingway commented 3 years ago

ran this query to get generate a csv file,

ogr2ogr -sql "select ('-|' || annotation_xml || '|-') from cerner_image_annotation_data WHERE image_ien='26792' AND ien='19'"  -f CSV foo.csv "ODBC:CACHE"

got this output:

Expression_1,
"-|<ARTPage>^\r\n^<Orientation>1</Orientation>^\r\n^<ZOrder></ZOrder>^\r\n^</ARTPage>^\r\n^</ARTDocument>^\r\n^</Page><Page number=""4"" marks=""0""><ARTDocument xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" |-"

so it looks like the whitespace at the end is coming thru

pramsey commented 3 years ago

Well, damn.

noefingway commented 3 years ago

my sentiments exactly

pramsey commented 3 years ago

So the trick is going to be finding something I can reproduce and run in a debugger. There's nothing in the code that is obviously trimming. And unfortunately there's nothing in a standard OGR pathway that does it either. This works fine, for example:

CREATE table str (s text);
INSERT INTO str (s) VALUES ('fourafter    ');
INSERT INTO str (s) VALUES ('oneafter ');

CREATE SERVER wraparound
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'Pg:dbname=ogr_fdw user=postgres',
    format 'PostgreSQL' );

CREATE FOREIGN TABLE str_fdw (
  s text
)
SERVER wraparound
  OPTIONS (layer 'str');

SELECT '|'||s||'|' FROM str_fdw;

All trailing spaces intact.

    ?column?     
-----------------
 |fourafter    |
 |oneafter |
(2 rows)
noefingway commented 3 years ago

Understood. We spent a good bit of time on this trying to figure out what was going on. Anything I can do to help?

Charles Brault chaz@brault.com

On Jan 22, 2021, at 5:28 PM, Paul Ramsey notifications@github.com wrote:

So the trick is going to be finding something I can reproduce and run in a debugger. There's nothing in the code that is obviously trimming. And unfortunately there's nothing in a standard OGR pathway that does it either. This works fine, for example:

CREATE table str (s text );

INSERT INTO str (s) VALUES ('fourafter ' );

INSERT INTO str (s) VALUES ('oneafter ' );

CREATE SERVER wraparound FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource 'Pg:dbname=ogr_fdw user=postgres' , format 'PostgreSQL' );

CREATE FOREIGN TABLE str_fdw ( s text

) SERVER wraparound OPTIONS (layer 'str' );

SELECT '|'||s||'|' FROM str_fdw; All trailing spaces intact.

?column?     

|fourafter | |oneafter | (2 rows)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pramsey commented 3 years ago

Well, you already have access to a UNIXodbc install, so can you see if a wraparound situation that goes via that way has the trimming behaviour? If it does, that's something I could reproduce here.

noefingway commented 3 years ago

Ok - I'll set that up and let you know. Probably won't be until tomorrow.

Thx!

Charles Brault chaz@brault.com

On Jan 22, 2021, at 6:50 PM, Paul Ramsey notifications@github.com wrote:

Well, you already have access to a UNIXodbc install, so can you see if a wraparound situation that goes via that way has the trimming behaviour? If it does, that's something I could reproduce here.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pramsey commented 3 years ago

NP, I've got my fingers crossed that it "works" (aka doesn't work).

noefingway commented 3 years ago

Ok finally have a setup that works.

I'm running this on the same server as the problem occurs, to make sure I'm not introducing something new.

I installed sqlite3 and an odbc driver for it, which I had to build.

Here is the source for the driver: wget http://www.ch-werner.de/sqliteodbc/sqliteodbc-0.9998.tar.gz, unpack somewhere, configure,make install (you know the drill)

Setup the odbcinst.ini:

[SQLite] Description=SQLite ODBC Driver Driver=/usr/local/lib/libsqlite3odbc.so Setup=/usr/local/lib/libsqlite3odbc.so Threading=2

and odbc.ini

[mysqlitedb] Driver=SQLite Description=SQLite test database Database=/home/vagrant/databases/ogrtest.db

optional lock timeout in milliseconds

Timeout=2000 TraceFile = /tmp/sql3.log

I created a database and test table with sqlite3:

[vagrant@chaz-database-machine-prod databases]$ sqlite3 ogrtest.db SQLite version 3.7.17 2013-05-20 00:56:22 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> CREATE table str (s text); sqlite> INSERT INTO str (s) VALUES ('fourafter '); sqlite> INSERT INTO str (s) VALUES ('oneafter '); sqlite> select * from str; fourafter
oneafter sqlite> select '|' || s || '|' from str; |fourafter | |oneafter |

Also made sure the ogrtest.db and path to it were world readable.

Setup the test db in postgres and ran the test:

psql (9.6.20) Type "help" for help.

postgres=# \c ogr_test You are now connected to database "ogr_test" as user "postgres". ogr_test=# \d No relations found. ogr_test=# create server sqlite_fdw foreign data wrapper ogr_fdw options (datasource 'ODBC:mysqlitedb', format 'ODBC'); CREATE SERVER ogr_test=# CREATE FOREIGN TABLE str_fdw ( ogr_test(# s text ogr_test(# ) ogr_test-# SERVER sqlite ogr_test-# OPTIONS (layer 'str'); ERROR: server "sqlite" does not exist ogr_test=# CREATE FOREIGN TABLE str_fdw ( s text ) SERVER sqlite_fdw OPTIONS (layer 'str'); CREATE FOREIGN TABLE ogr_test=# SELECT '|'||s||'|' FROM str_fdw; ?column?

|fourafter| |oneafter| (2 rows)

Note - I tried to use the wraparound but postgres kept croaking with a shared memory error. I assume caused by unixODBC, but I didn't bother to go down that rabbit hole.

Hope this helps!

Thx!

Charles Brault chaz@brault.com

On Jan 22, 2021, at 6:54 PM, Paul Ramsey notifications@github.com wrote:

NP, I've got my fingers crossed that it "works" (aka doesn't work).

On Jan 22, 2021, at 3:53 PM, noefingway notifications@github.com wrote:

Ok - I'll set that up and let you know. Probably won't be until tomorrow.

Thx!

Charles Brault chaz@brault.com

On Jan 22, 2021, at 6:50 PM, Paul Ramsey notifications@github.com wrote:

Well, you already have access to a UNIXodbc install, so can you see if a wraparound situation that goes via that way has the trimming behaviour? If it does, that's something I could reproduce here.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pramsey commented 3 years ago

So, getting into a debugger and starting to drill into it, the wrong string value exists down at the OGR level...

  String = 0x00007f8855c0c590 "fourafter"

So question is, what is different between a direct use of the OGR API and whatever you're doing to get a good result from the commandline...

pramsey commented 3 years ago

Aha, skipping the middleman and just debugging

ogrinfo -al "ODBC:thesqlitedb" str  

If I break in OGRFeature::GetFieldAsString and then check the value being read, I see the same thing as I see in ogr_fdw:

  String = 0x0000000106214850 "fourafter"

But when I run

ogrinfo -sql "select s||'p' from str"  "ODBC:thesqlitedb" str

the result I get back is correct.

  s||'p' (String) = fourafter    p

I think the reason for the difference is that in the -sql case, the SQL is being sent all the way to the backend so that the SQLite database is doing the concatenation step, which means there's no trailing space to get trimmed off later. The remaining question is where in the stack is the trimming happening? below generic OGR, I think so probably in either the OGR ODBC driver or in the unixODCB code? @rouault have you seen this before?

pramsey commented 3 years ago

So, I think things are OK up to the point of leaving the unixODBC, because I can demonstrate spaces on the strings in isql, without any functions being applied to the SQL.

% isql thesqlitedb -w 
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> select s from str;
<table BORDER>
<tr BGCOLOR=#000099>
<td>
<font face=Arial,Helvetica><font color=#FFFFFF>
s
</font></font>
</td>
</tr>
<tr>
<td>
<font face=Arial,Helvetica>
fourafter    </font>
</td>
</tr>
<tr>
<td>
<font face=Arial,Helvetica>
oneafter </font>
</td>
</tr>
</table>
pramsey commented 3 years ago

Found it, in the string handling of the OGR ODBC system. It's not optional, it just goes ahead and does it.

https://github.com/OSGeo/gdal/blob/687cfeab298df9524a36846de83ca6b5d8494c6d/gdal/port/cpl_odbc.cpp#L1136

noefingway commented 3 years ago

Wow. Good find - so i didn't plow deeply into that code, but is that trimming CHAR's as well as VARCHAR's?

Charles Brault chaz@brault.com

On Jan 24, 2021, at 6:20 PM, Paul Ramsey notifications@github.com wrote:

Found it, in the string handling of the OGR ODBC system. Doesn't seem like an option, it just goes ahead and does it.

https://github.com/OSGeo/gdal/blob/687cfeab298df9524a36846de83ca6b5d8494c6d/gdal/port/cpl_odbc.cpp#L1136

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pramsey commented 3 years ago

It doesn't distinguish. All the text types get mapped into SQL_C_CHAR as for handling.

        case SQL_CHAR:
        case SQL_VARCHAR:
        case SQL_LONGVARCHAR:
            return SQL_C_CHAR;
noefingway commented 3 years ago

Interesting. So what's the next step?

Charles Brault chaz@brault.com

On Jan 24, 2021, at 6:37 PM, Paul Ramsey notifications@github.com wrote:

It doesn't distinguish. All the text types get mapped into SQL_C_CHAR as for handling.

    case SQL_CHAR:
    case SQL_VARCHAR:
    case SQL_LONGVARCHAR:
        return SQL_C_CHAR;

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

pramsey commented 3 years ago

Well, I'd like to know if @rouault has any feelings about the behaviour. It may well be 20 years old, and nobody knows why it's there. It might be critical to some other thing somewhere else. If it's not, we can either either (a) strip it out or (b) add Yet Another Option to GDAL to make it something folks can turn off at run-time.

In the meanwhile, If you're running a from-source build of GDAL you could do some basic testing just by commenting that block out and making sure all your processes still work as expected.

noefingway commented 3 years ago

Ok - For now I'll deal with it in a source-build. It's more important to our app that we have everything working thru the FDW, adding another source build won't be much of an issue.

Thx for tracking this down. Nice work!

Charles Brault chaz@brault.com

On Jan 24, 2021, at 6:48 PM, Paul Ramsey notifications@github.com wrote:

Well, I'd like to know if @rouault has any feelings about the behaviour. It may well be 20 years old, and nobody knows why it's there. It might be critical to some other thing somewhere else. If it's not, we can either either (a) strip it out or (b) add Yet Another Option to GDAL to make it something folks can turn off at run-time.

In the meanwhile, If you're running a from-source build of GDAL you could do some basic testing just by commenting that block out and making sure all your processes still work as expected.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

rouault commented 3 years ago

Well, I'd like to know if @rouault has any feelings about the behaviour. It may well be 20 years old, and nobody knows why it's there

I've no idea why this trimming is down (did you try git blame just in case ?). I'd say just remove that and issue a PR and see if that breaks something in the CI

pramsey commented 3 years ago

I've started this process, here. https://github.com/OSGeo/gdal/pull/3412

pramsey commented 3 years ago

Well, the PR passes all CI tests. Not sure what that really tells us. I guess it tells us that if there's a real problem it solves, it's a problem that pre-dated regular use of the test suite to backstop bug fixes.

pramsey commented 3 years ago

This is committed to the main of GDAL, so I'm closing out this ticket.

noefingway commented 3 years ago

Thanks Paul. We'll update our build for the latest gdal.

Charles Brault chaz@brault.com

On Jan 31, 2021, at 3:40 PM, Paul Ramsey notifications@github.com wrote:

Closed #208.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.