Open GeoffMontee opened 10 years ago
From the FreeTDS localization page:
"To learn what character set the client wants, FreeTDS prefers the applicable freetds.conf client charset property. If that is not set, it parses the LANG environment variable. In either case, the found string is passed to iconv(3) (or its built-in replacement). [1]. If neither is found, UCS-2 data are converted to ISO 8859-1."
One user said that setting "client charset" in freetds.conf to UTF-8 worked.
Today I got a foreign table returning error related to charset below:
NOTE: DB-Library notice: Msg #: 5701, Msg state: 2, Msg: Changed database context to 'SIARM'., Server: WINDOWS-0CMIFUI, Process: , Line: 1, Level: 0
NOTE: DB-Library notice: Msg #: 5703, Msg state: 1, Msg: Changed language setting to us_english., Server: WINDOWS-0CMIFUI, Process: , Line: 1, Level: 0
ERRO: DB-Library error: DB #: 2403, DB Msg: Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?'), OS #: 0, OS Msg: Sucess, Level: 4
********** Error **********
ERRO: DB-Library error: DB #: 2403, DB Msg: Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?'), OS #: 0, OS Msg: Sucess, Level: 4
SQL state: HV00L
Tried put configuration client charset = UTF-8 on freetds.conf, first on global session then on server session but still same issue. Using tds version = 7.0
Does not message pointing that itself is just an warning? It says that will replace unconvertable chars to question marks.
Hi @sumariva,
I believe that client charset = UTF-8
in freetds.conf should work, as long as tds version = 7.0
(note: not tds vrsion
, which is what you typed up above).
Can you please show the full contents of your freetds.conf?
Yes, this is my freetds.conf
# $Id: freetds.conf,v 1.12 2007/12/25 06:02:36 jklowden Exp $
#
# This file is installed by FreeTDS if no file by the same
# name is found in the installation directory.
#
# For information about the layout of this file and its settings,
# see the freetds.conf manpage "man freetds.conf".
# Global settings are overridden by those in a database
# server specific section
[global]
# TDS protocol version
; tds version = 4.2
# Whether to write a TDSDUMP file for diagnostic purposes
# (setting this to /tmp is insecure on a multi-user system)
; dump file = /tmp/freetds.log
; debug flags = 0xffff
# Command and connection timeouts
; timeout = 10
; connect timeout = 10
# If you get out-of-memory errors, it may mean that your client
# is trying to allocate a huge buffer for a TEXT field.
# Try setting 'text size' to a more reasonable limit
text size = 64512
client charset = UTF-8
# A typical Sybase server
[egServer50]
host = symachine.domain.com
port = 5000
tds version = 5.0
# A typical Microsoft server
[duquedecaxias]
host = 192.168.2.30
port = 1433
tds version = 7.0
client charset = UTF-8
Thanks @sumariva!
The definition of duquedecaxias
looks OK to me.
Can you show me the definition of your foreign server and foreign table please?
Relative to postgresql conversions, there are iconv like functions on strings: http://www.postgresql.org/docs/9.3/static/functions-string.html#FUNCTIONS-STRING-OTHER convert_from will convert to database encoding.
The character set conversions are done by FreeTDS using the iconv library before PostgreSQL sees the strings, so those functions shouldn't be necessary for this.
The configured server
CREATE SERVER siarm
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (servername '192.168.2.30',port '1433');
The table mapping system_information tables, here I also noted a weird caracter ASCII(2) at end of a column
CREATE FOREIGN TABLE sgm.information_schema_columns_tds (
table_catalog varchar
, table_schema varchar
, table_name varchar
, column_name varchar
, ordinal_position integer
, column_default varchar
, is_nullable varchar
, data_type varchar
, character_maximum_length integer
, character_octect_length integer
, numeric_precision integer
, numeric_precision_radix integer
, numeric_scale integer
, datetime_precision integer
, interval_type varchar
, interval_precision integer
, character_set_catalog varchar
, character_set_schema varchar
, character_set_name varchar
, collation_catalog varchar
, collation_schema varchar
, collation_name varchar
, domain_catalog varchar
, domain_schema varchar
, domain_name varchar
, udt_catalog varchar
, udt_schema varchar
, udt_name varchar
, scope_catalog varchar
, scope_schema varchar
, scope_name varchar
, maximun_cardinality integer
, dtd_identifier varchar
/*
, is_self_referencing varchar
, is_identity varchar
, identity_generation varchar
, identity_start varchar
, identity_increment varchar
, identity_maximum varchar
, identity_minimum varchar
, identity_cycle varchar
, is_generated varchar
, generation_expression varchar
, is_updatable varchar */
)
SERVER siarm
OPTIONS ( query 'SELECT * FROM information_schema.columns ORDER BY table_schema, table_name, ordinal_position' )
The table as harvested using the system information table
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos"
(
"Inscricao" integer NOT NULL,
"ID" smallint NOT NULL,
"Processo" character varying(12) ,
"Historico" text NOT NULL,
"Data" timestamp without time zone ,
"Departamento" smallint ,
"Data_Atualizacao" timestamp without time zone
)
SERVER siarm
OPTIONS (table 'MV_IPTU_Contribuintes_CadastroHistoricos' );
@sumariva,
Your foreign server isn't using the duquedecaxias
server definition in freetds.conf because you have servername
set to 192.168.2.30
. If you want it to use this definition, set servername
to duquedecaxias
.
Otherwise, you should set tds version = 7.0
and client charset = UTF-8
in the [global]
section of freetds.conf. That way, those settings will apply to all connections.
Humm, updated the freetds.conf as suggested
[global]
# TDS protocol version
tds version = 7.0
client charset = UTF-8
[duquedecaxias]
host = 192.168.2.30
port = 1433
tds version = 7.0
client charset = UTF-8
but still with same message. I also checked postgresql client_encoding
SHOW client_encoding
UNICODE
Did you restart the PostgreSQL server after making the change to freetds.conf?
Yes, the postgresql server has been restarted after each change since I do not known when those parameters are reread by the fdw_tds driver. I trying to discover the 2403 error code meaning at mssql documentation.
Maybe your configuration is being read correctly, but one of your rows has non-UTF-8 data.
The table mapping system_information tables, here I also noted a weird caracter ASCII(2) at end of a column
What do you mean by this? Are you saying that ASCII(2)
is the data type of a column on the MS SQL Server side? Or something else?
When I executed a select statement on that system information table(this is a database internal table) I got the following result in psql
table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length | character_octect_length | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | domain_catalog | domain_schema | domain_name | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximun_cardinality | dtd_identifier
---------------+--------------+--------------------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+-------------------------+-------------------+-------------------------+---------------+--------------------+---------------+--------------------+-----------------------+----------------------+--------------------+-------------------+------------------+----------------+----------------+---------------+-------------+-------------+------------+----------+---------------+--------------+------------+---------------------+----------------
SIARM_CAXIAS | dbo | MV_IPTU_Contribuintes_Cadastro | Inscricao | 1 | | NO | int | | | 10 | 10 | 0 | | | | | | | | | | | | | | | | | | | | Y\x02
Note the the \x02 caracter reported at last column named dtd_identifier, what I called ASCII(2). Unfortunely I do not have a mssql client to issue the same query to see if that unicode \x02 is reported also.
Found 2403 error code defined at freetds as an constant called SYBEICONVI define in file https://github.com/FreeTDS/freetds/blob/master/include/sybdb.h. Do not need to catch this "error code" as an warning and allow normal execution?
As far as I can tell, the column dtd_identifier
doesn't exist in information_schema.columns in MS SQL Server:
https://msdn.microsoft.com/en-us/library/ms188348.aspx
This probably means that this column is being populated with whatever random bytes are already in memory when tds_fdw is fetching results. This looks like a new bug, so I've submitted that here:
https://github.com/GeoffMontee/tds_fdw/issues/25
Either way, this seems completely unrelated to any character set issues you are experiencing.
It still sounds to me like you may have data in one of your rows that is incompatible with UTF-8. Maybe try to find out which column is failing? (e.g. maybe try fetching only one varchar column at a time?)
Also, what version of MS SQL Server are you using? If you are using 2008 or later, maybe try tds version = 7.3
:
@sumariva,
I don't know if this warning should be treated in a non-fatal manner without knowing exactly what is going wrong here for you. Can you please try to find out what data is failing? You should be able to find out which varchar
column is failing by creating two new foreign tables:
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Processo"
(
"Processo" character varying(12)
)
SERVER siarm
OPTIONS (query 'SELECT Processo FROM MV_IPTU_Contribuintes_CadastroHistoricos' );
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico"
(
"Historico" text NOT NULL
)
SERVER siarm
OPTIONS (query 'SELECT Historico FROM MV_IPTU_Contribuintes_CadastroHistoricos' );
Then you could figure out which column is failing conversion with:
SELECT * FROM cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Processo";
SELECT * FROM cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico";
I followed yours sugestions and found that the failing column is on second query
SELECT * FROM cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico";
beyond 10000 rows fetched.
Digging into MSSQL information_schema.columns table, I found that the type used on column is text(max??) using iso_1 and COLLATION Latin1_General_CI_AI.
The following attempt to translate the character to a binary reprentation failed with an error reported on footer
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_Historico"
(
"Historico" character varying
)
SERVER siarm
OPTIONS (query 'Select CAST( CAST( Historico AS varchar(max) ) AS varbinary(max) ) AS Historico FROM MV_IPTU_Contribuintes_CadastroHistoricos' );
======= Backtrace: =========
/lib64/libc.so.6(+0x7d4e6)[0x7fa5b5ba54e6]
/usr/lib64/libsybdb.so.5(dbconvert+0x7ae)[0x7fa5b1107fce]
/usr/lib64/postgresql-9.2/lib64/tds_fdw.so(+0x1eaa)[0x7fa5b1361eaa]
If I understood the documentation, when text type is selected, the database stores data on the configured database codepage, that could be an single byte not UCS-2(unicode). Did you known how to tell mssql to convert character on a different codepage on query before freetds receive it?
You might want to try downloading the most recent commit of tds_fdw and then compiling tds_fdw with DEBUG
defined. Then tds_fdw will give you a lot of information about what's going on.
You might also want to consider enabling a FreeTDS log file.
Just for reference I record the server version used:
Microsoft SQL Server 2014 - 12.0.4100.1 (X64)
Apr 20 2015 17:29:27
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: )
Currently trying read of column "Historico"(history) as binary
CREATE FOREIGN TABLE cliente."MV_IPTU_Contribuintes_CadastroHistoricos_HistoricoHex"
("Historico" text NOT NULL)
SERVER siarm
OPTIONS (query 'SELECT convert(varbinary(8000), convert(varchar(8000),Historico), 0) AS Historico FROM MV_IPTU_Contribuintes_CadastroHistoricos');
A simple select on postgresql showed that data has arrived.
select
char_length( "Historico" ), octet_length( "Historico" ), "Historico", encode( "Historico"::bytea, 'hex' )
--, convert_to( "Historico"::bytea, 'iso_8859_1') mssql_varchar_from_varbinary
from
cliente."MV_IPTU_Contribuintes_CadastroHistoricos_HistoricoHex"
limit 100
I had the same problem.
What happened: I'm configured my freetds.conf
with default options and imported from a SQL Server the foreign schema into postgresql.
If I fetch some integer columns,freetds works well but, if I fetch some text columns, the error 2403 happens.
To find a solution, I've tried different charsets and, finally, the client_charset
to CP1252
works fine. Seems a very similar problem that reported by @sumariva
Follow my freetds.conf
(the read-only intent
option was used because my postgresql database reads and refresh some materialized views, I'm not writing on SQL server database.
[totvs]
host = <ip>
port = 1433
tds version = 7.4
database = <dabatabe>
client charset = CP1252
read-only intent yes
It seems that the sql server has data registered with "CP1252" format, so iconv returns an error because the command can't understand the some charset definitions in database. Using "CP1252" the conversion was done correctly and without errors.
Setup info SQL Server version: 2017 - Encoding LATIN1 - Win Postgresql: 10 - Encoding UTF-8 (default) - Ubuntu 18.04 LTS
Currently, tds_fdw can't set the character set for the connection.
The FreeTDS implementation of DB-Library does have the DBSETLCHARSET macro to set the character set:
http://www.freetds.org/reference/a00284.html#ga114
However, it says it doesn't work on TDS 7.0+ connections. This makes it less useful for most versions of MS SQL Server, since those try to use 7.0+ by default:
http://www.freetds.org/userguide/choosingtdsprotocol.htm
The reason it is not supported for 7.0+ is described here:
http://www.freetds.org/userguide/localization.htm
"It is also worth clarifying that TDS 7.0 and above do not accept any specified character set during login, as 4.2 does. A TDS 7.0 login packet uses UCS-2."
There's also the dbsetdefcharset function:
http://www.freetds.org/reference/a00284.html#ga86
But it looks like that might be unimplemented in the current versions of FreeTDS.
Character sets can be set in freetds.conf:
http://www.freetds.org/userguide/freetdsconf.htm
It is probably possible to change character sets with DB-Library in a way that is compatible with versions 7.0+ of TDS also. I should figure out how.
On the PostgreSQL side, I also wonder how BuildTupleFromCStrings will handle character sets with multi-byte characters. I might need to find another way to build tuples.