vibhorkum / pg_background

pg_background
GNU General Public License v3.0
136 stars 35 forks source link

Problem with client encoding for exception messages #37

Closed mclei-asw closed 2 years ago

mclei-asw commented 2 years ago

Using PostgreSQL 13.6 and current pg_background worker, all on Debian 10.

There is a problem with handling exception messages with special characters.

I have database in UTF-8 and I am using Czech special characters. It works when all my clients uses also UTF-8 encoding. But when client uses different encoding and conversion from UTF-8 must be taken, it fails with: ERROR: invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

Testing using:

create or replace function test_bg(a text) returns text
language plpgsql as $$
begin
  raise exception 'This is ěščřžýáíé';
  return 'x';
end; $$;

If I run client with different encoding, then it fails:

#> LANG=cs_CZ.ISO88592 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

The correct behavior is:

#> LANG=cs_CZ.UTF-8 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is ěščřžýáíé
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 21871
vibhorkum commented 2 years ago

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

mclei-asw commented 2 years ago

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

What exactly do you mean by this? I have proper client encoding. Should I setup anything special when using pg_background?

The texts in SQL are in UTF-8. PostgreSQL is responsible for converting it into specified client encoding. It works well for normal connection, but fails when used from inside pg_background.

rjuju commented 2 years ago

I don't think that's the expected behavior, as it's not what happens if calling the function without pg_background. For instance:

=# set client_encoding to 'ISO88592';
SET

=# select test_bg(null);
ERROR:  P0001: This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
LOCATION:  exec_stmt_raise, pl_exec.c:3898

=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8
LOCATION:  report_invalid_encoding, mbutils.c:1669

AFAICS the error message is converted to the client encoding in the bgworker, and then the converted data is converted again when re-throwing it, which definitely cannot work.

vibhorkum commented 2 years ago

No, this is expected behavior. pg_background worker starts a process with all GUCs settings of the current session and keeps the process attached to the session in which the module was called. So, if your session has client_encoding set to ISO88592 background worker will take that same client_encoding for the launched process. This is intentional, see below:

https://github.com/vibhorkum/pg_background/blob/3278c63fb4bd67d6d05108edbee4651684dab7c6/pg_background.c#L822

mclei-asw commented 2 years ago

Sorry, I don't understand you. Expected behavior is to use wrong encoding when run through pg_background and right encoding when run directly? Something smells here.

I think rjuju is right.

vibhorkum commented 2 years ago

No. Expected behavior is to use encoding/GUCs defined in your main session. For example, if you have started a session with client_encoding ISO88592 and you call pg_background, the module will create a process with the same client_encoding ISO88592.

If you want to avoid this, you could either set the server-side encoding in your main session or use the following command: ALTER FUNCTION public.test_bg SET client_encoding TO '<server side encoding>'; to set at the function level Below is an example from your sample code:

edb=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)
edb=# set client_encoding to 'ISO88592';
SET
edb=# select test_bg(null);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8

edb=# ALTER FUNCTION public.test_bg SET client_encoding TO 'UTF8';
ALTER FUNCTION

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123735
edb=#

Or you could do:

edb=# select result from pg_background_result(pg_background_launch($$set client_encoding TO 'UTF-8'; select test_bg(null)$$)) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123750
vibhorkum commented 2 years ago

Closing this issue. If needed, it can be reopened.

mclei-asw commented 2 years ago

Your solution is still only a workaround. The real problem is with double encoding into client charset.

Unfortunately your workaround does not work, when we want to use procedures with transaction control (commit/rollback), because "SET" clause block this.

From doc: a SET clause is attached to a procedure, then that procedure cannot execute transaction control statements (for example, COMMIT and ROLLBACK, depending on the language). We have also tested the second variant, and it does not work either.