pramsey / pgsql-ogr-fdw

PostgreSQL foreign data wrapper for OGR
MIT License
237 stars 34 forks source link

Encode strings to destination encoding #193

Closed bowguy closed 3 years ago

bowguy commented 3 years ago

With Issue 101 resolved, need to encode strings to database encoding. This might be slow calling GetDatabaseEncoding() every time. Still have issues with extended characters. dest_encoding.txt

bowguy commented 3 years ago

New patch. dest_encoding.txt

bowguy commented 3 years ago

This will not work with wchar_t strings. See https://github.com/pramsey/pgsql-ogr-fdw/issues/101

robe2 commented 3 years ago

Okay I think I got the gdal libiconv sorted out. I still don't know why it didn't work but building a new libiconv (1.16) and new curl (1.73.0) fixed both my missing libiconv and curl dependency. and voila winnie is happy again.

https://winnie.postgis.net:444/job/OGRFDW_PGVersion/606/console


============== running regression test queries        ==============

test file                         ... ok          609 ms

test import                       ... ok         3806 ms

=====================
 All 2 tests passed. 
=====================
bowguy commented 3 years ago

After a lot of digging, I found iconv is extremely helpful but not required. The function I need is CPLRecode() to convert from LATIN1 to UTF8 This is possible without iconv but only has 3 encodings in the GDAL function CPLRecodeStub(). If you compile with libiconv, every encoding is possible that is supported by iconv --list. If I can get the source encoding specified in the create server command via OPTIONS (datasource 'foo@bar', format 'ODBC', open_options 'ENCODING=LATIN1') ,then I need to get the database encoding in GDAL format and conversion is done via: char CPLRecode( const char pszSource, const char pszSrcEncoding, const char pszDstEncoding ); I have it hardcoded now and it is working perfectly.

bowguy commented 3 years ago

Much better solution. When you create the FDW server this will read the options section and look for an ENCODING parameter, i.e.: CREATE SERVER xxx FOREIGN DATA WRAPPER ogr_fdw OPTIONS (datasource 'ODBC:foo@bar', format 'ODBC', open_options 'ENCODING=LATIN1');

Then it will encode the string from the source to the destination encoding. Tested on PG12/GDAL 2.4 To do: Right now the destination is hard coded to UTF8. No error checking for a valid encoding parameter. CPLGetConfigOption("ENCODING", "") is called all the time should be stored in a global to increase speed. encoding_options.txt

bowguy commented 3 years ago

Huge memory leak in the above patch. Looking into it...

pramsey commented 3 years ago

So, I'm not sure that open_options 'ENCODING=LATIN1' does what you think it does, and you're pushing open options into config options too... the open options get passed to the GDAL open routine, and then GDAL does what it needs to do. And if you go searching for that option in the GDAL code base, you'll find a handful of drivers that actually do something with it.

Basically you're just using the open option in GDAL to hold what should be information local the OGR FDW, which means really it should be passed as an FDW option and held in the FDW state, not passed into GDAL which has its own ideas about what to do with encodings.

pramsey commented 3 years ago

Check out #197, @bowguy