Closed mclei-asw closed 2 years ago
This is expected behavior. To avoid the issue, you need to have a proper client encoding.
What exactly do you mean by this? I have a proper client encoding. Should I set up anything special when using pg_background?
The text in SQL is in UTF-8. PostgreSQL is responsible for converting it into the specified client encoding. This works well for a normal connection, but fails when used from inside pg_background.
I don't think that's the expected behavior, as it's not what happens when calling the function without pg_background. For instance:
=# set client_encoding to 'ISO88592';
SET
=# select test_bg(null);
ERROR: P0001: This is �������
CONTEXT: PL/pgSQL function test_bg(text) line 3 at RAISE
LOCATION: exec_stmt_raise, pl_exec.c:3898
=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR: 22021: invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8
LOCATION: report_invalid_encoding, mbutils.c:1669
AFAICS the error message is converted to the client encoding in the bgworker, and then the converted data is converted again when re-throwing it, which definitely cannot work.
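The double conversion described above can be reproduced outside PostgreSQL. The following is only a sketch of the byte-level effect, not pg_background code: the message text "This is ěšč" is a hypothetical stand-in (the real body of test_bg is not shown in this thread), and Python's codecs stand in for PostgreSQL's conversion routines.

```python
# Sketch only: mimics the encoding round trip, not the actual pg_background code path.
SERVER_ENCODING = "utf-8"       # corresponds to server_encoding = UTF8
CLIENT_ENCODING = "iso8859-2"   # corresponds to client_encoding = ISO88592

# Hypothetical error message; the real text raised by test_bg is an assumption.
message = "This is ěšč"

# Step 1: the worker converts the message from server to client encoding.
wire_bytes = message.encode(CLIENT_ENCODING)

# Step 2: on re-throw, the already converted bytes are treated as
# server-encoded text again, so they must pass UTF-8 validation.
try:
    wire_bytes.decode(SERVER_ENCODING)
    outcome = "ok"
except UnicodeDecodeError as exc:
    bad = wire_bytes[exc.start:exc.start + 3]
    outcome = "invalid byte sequence: " + " ".join(f"0x{b:02x}" for b in bad)

print(outcome)
```

With this stand-in message the script prints "invalid byte sequence: 0xec 0xb9 0xe8", the same three bytes reported in the transcript above. A pure ASCII message would pass both steps unchanged, which would explain why the problem only shows up with non-ASCII characters.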
No, this is expected behavior. The pg_background worker starts a process with all the GUC settings of the current session and keeps the process attached to the session in which the module was called. So, if your session has client_encoding set to ISO88592, the background worker will take that same client_encoding for the launched process. This is intentional, see below:
Sorry, I don't understand you. Is the expected behavior to use the wrong encoding when run through pg_background and the right encoding when run directly? Something smells here.
I think rjuju is right.
No. The expected behavior is to use the encoding/GUCs defined in your main session. For example, if you have started a session with client_encoding ISO88592 and you call pg_background, the module will create a process with the same client_encoding ISO88592.
If you want to avoid this, you could either set client_encoding to the server-side encoding in your main session, or set it at the function level with the following command:
ALTER FUNCTION public.test_bg SET client_encoding TO '<server side encoding>';
Below is an example from your sample code:
edb=# show server_encoding ;
server_encoding
-----------------
UTF8
(1 row)
edb=# set client_encoding to 'ISO88592';
SET
edb=# select test_bg(null);
ERROR: This is �������
CONTEXT: PL/pgSQL function test_bg(text) line 3 at RAISE
edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR: invalid byte sequence for encoding "UTF8": 0xec 0xb9 0xe8
edb=# ALTER FUNCTION public.test_bg SET client_encoding TO 'UTF8';
ALTER FUNCTION
edb=# select result from pg_background_result(pg_background_launch('select test_bg(null)')) as (result text);
ERROR: This is �������
CONTEXT: PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123735
edb=#
Or you could do:
edb=# select result from pg_background_result(pg_background_launch($$set client_encoding TO 'UTF-8'; select test_bg(null)$$)) as (result text);
ERROR: This is �������
CONTEXT: PL/pgSQL function test_bg(text) line 3 at RAISE
background worker, pid 123750
Closing this issue. If needed, it can be reopened.
Your solution is still only a workaround. The real problem is the double conversion into the client character set.
Unfortunately, your workaround does not work when we want to use procedures with transaction control (COMMIT/ROLLBACK), because the SET clause blocks this.
From the docs: if a SET clause is attached to a procedure, then that procedure cannot execute transaction control statements (for example, COMMIT and ROLLBACK, depending on the language). We have also tested the second variant, and it does not work either.
Using PostgreSQL 13.6 and current pg_background worker, all on Debian 10.
There is a problem with handling exception messages with special characters.
I have a database in UTF-8 and I am using Czech special characters. It works when all my clients also use UTF-8 encoding. But when a client uses a different encoding and a conversion from UTF-8 is required, it fails with:
ERROR: invalid byte sequence for encoding "UTF8": 0xe8 0x6b 0x61
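The bytes in such an error can be checked by hand to see which encoding they actually belong to. The helper below is a hypothetical diagnostic sketch (the name describe and the list of candidate encodings are assumptions, not part of PostgreSQL or pg_background); it decodes the reported bytes under each candidate encoding.

```python
def describe(hex_bytes: str) -> dict:
    """Decode bytes copied from an 'invalid byte sequence' error message.

    Returns a mapping of encoding name to decoded text, or None where the
    bytes are not valid in that encoding.
    """
    # int("0xe8", 16) accepts the 0x prefix, so tokens can be pasted as-is.
    raw = bytes(int(token, 16) for token in hex_bytes.split())
    result = {}
    for encoding in ("utf-8", "iso8859-2"):
        try:
            result[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            result[encoding] = None
    return result


report = describe("0xe8 0x6b 0x61")
print(report)
```

For the bytes above, the UTF-8 entry is None and the ISO-8859-2 entry is "čka", which is consistent with text that was already converted to the client encoding being validated as UTF8.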
Testing using:
If I run the client with a different encoding, it fails:
The correct behavior is: