Problem with client encoding for exception messages

mclei-asw commented 2 years ago

Using PostgreSQL 13.6 and current pg_background worker, all on Debian 10.

There is a problem with handling exception messages with special characters.

I have database in UTF-8 and I am using Czech special characters. It works when all my clients uses also UTF-8 encoding. But when client uses different encoding and conversion from UTF-8 must be taken, it fails with: ERROR: invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

Testing using:

create or replace function test_bg(a text) returns text
language plpgsql as $$
begin
  raise exception 'This is ěščřžýáíé';
  return 'x';
end; $$;

If I run client with different encoding, then it fails:

#> LANG=cs_CZ.ISO88592 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61

The correct behavior is:

#> LANG=cs_CZ.UTF-8 psql mydatabase
mydatabase=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is ěščřžýáíé
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 21871

vibhorkum commented 2 years ago

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

mclei-asw commented 2 years ago

This is expected behavior. To avoid the issue, you need to have a proper client encoding.

What exactly do you mean by this? I have proper client encoding. Should I setup anything special when using pg_background?

The texts in SQL are in UTF-8. PostgreSQL is responsible for converting it into specified client encoding. It works well for normal connection, but fails when used from inside pg_background.

rjuju commented 2 years ago

I don't think that's the expected behavior, as it's not what happens if calling the function without pg_background. For instance:

=# set client_encoding to 'ISO88592';
SET

=# select test_bg(null);
ERROR:  P0001: This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
LOCATION:  exec_stmt_raise, pl_exec.c:3898

=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8
LOCATION:  report_invalid_encoding, mbutils.c:1669

AFAICS the error message is converted to the client encoding in the bgworker, and then the converted data is converted again when re-throwing it, which definitely cannot work.

vibhorkum commented 2 years ago

No, this is expected behavior. pg_background worker starts a process with all GUCs settings of the current session and keeps the process attached to the session in which the module was called. So, if your session has client_encoding set to ISO88592 background worker will take that same client_encoding for the launched process. This is intentional, see below:

https://github.com/vibhorkum/pg_background/blob/3278c63fb4bd67d6d05108edbee4651684dab7c6/pg_background.c#L822

mclei-asw commented 2 years ago

Sorry, I don't understand you. Expected behavior is to use wrong encoding when run through pg_background and right encoding when run directly? Something smells here.

I think rjuju is right.

vibhorkum commented 2 years ago

No. Expected behavior is to use encoding/GUCs defined in your main session. For example, if you have started a session with client_encoding ISO88592 and you call pg_background, the module will create a process with the same client_encoding ISO88592.

If you want to avoid this, you could either set the server-side encoding in your main session or use the following command: ALTER FUNCTION public.test_bg SET client_encoding TO '<server side encoding>'; to set at the function level Below is an example from your sample code:

edb=# show server_encoding ;
 server_encoding
-----------------
 UTF8
(1 row)
edb=# set client_encoding to 'ISO88592';
SET
edb=# select test_bg(null);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8

edb=# ALTER FUNCTION public.test_bg SET client_encoding TO 'UTF8';
ALTER FUNCTION

edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123735
edb=#

Or you could do:

edb=# select result from pg_background_result(pg_background_launch($$set client_encoding TO 'UTF-8'; select test_bg(null)$$)) as (result text);
ERROR:  This is �������
CONTEXT:  PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123750

vibhorkum commented 2 years ago

Closing this issue. If needed, it can be reopened.

mclei-asw commented 2 years ago

Your solution is still only a workaround. The real problem is with double encoding into client charset.

Unfortunately your workaround does not work, when we want to use procedures with transaction control (commit/rollback), because "SET" clause block this.

From doc: a SET clause is attached to a procedure, then that procedure cannot execute transaction control statements (for example, COMMIT and ROLLBACK, depending on the language). We have also tested the second variant, and it does not work either.

vibhorkum / pg_background

Problem with client encoding for exception messages #37