risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

bug: failed to parse characters: Invalid UTF-8 sequence #19439

Open fuyufjh opened 3 days ago

fuyufjh commented 3 days ago

Describe the bug

As we discussed on the call, this is one of the issues I am facing: when inserting characters such as æ or ø into a table, I get the following error.

Error message/log

Caused by these errors (recent errors listed first):
  1: Invalid UTF-8 sequence
  2: invalid utf-8 sequence of 1 bytes from index 76

To Reproduce

CREATE TABLE t1
  (
     id      INT PRIMARY KEY,
     name    VARCHAR,
     address TEXT
  ); 

INSERT INTO t1 (id,name,address) VALUES(1,'Thømas', 'Vallanbæk Way');
But if I run this instead,
INSERT INTO t1 (id,name,address) VALUES(1,'Thømas', 'Vallanbk Strand'); 

with æ removed from the value, the insertion succeeds. However, select * from t1 returns 'o' instead of 'ø'.

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

2.0

Additional context

No response

xiangjinwu commented 3 days ago

Cannot reproduce with RisingWave v2.0.1 as server and psql 16.0 as client.

It is likely the client is not sending in UTF-8. Waiting on user response.

dev=> CREATE TABLE t1
dev->   (
dev(>      id      INT PRIMARY KEY,
dev(>      name    VARCHAR,
dev(>      address TEXT
dev(>   ); 
CREATE_TABLE

dev=> INSERT INTO t1 (id,name,address) VALUES(1,'Thømas', 'Vallanbæk Way');
INSERT 0 1
dev=> select * from t1;
 id |  name  |    address    
----+--------+---------------
  1 | Thømas | Vallanbæk Way
(1 row)

dev=> INSERT INTO t1 (id,name,address) VALUES(1,'Thømas', 'Vallanbk Strand'); 
INSERT 0 1
dev=> select * from t1;
 id |  name  |     address     
----+--------+-----------------
  1 | Thømas | Vallanbk Strand
(1 row)
```
select convert_from('\xc3b8c3a6'::bytea, 'utf8');
```
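
To illustrate the "client is not sending in UTF-8" hypothesis: the second line of the reported error matches the message format of Rust's `std::str::Utf8Error`. The sketch below is not RisingWave code. It assumes, purely for illustration, that the client encoded the text as Latin-1 (ISO-8859-1), which this thread has not confirmed; `decode` is a hypothetical helper. It decodes the UTF-8 bytes for ø and æ (`c3 b8 c3 a6`) successfully, then fails on a Latin-1 encoded "Thømas" with the same kind of message.

```rust
use std::str;

/// Hypothetical helper: decode raw bytes as UTF-8,
/// returning the error text on failure.
fn decode(bytes: &[u8]) -> Result<String, String> {
    str::from_utf8(bytes)
        .map(|s| s.to_owned())
        .map_err(|e| e.to_string())
}

fn main() {
    // 'ø' and 'æ' encoded as UTF-8: two bytes each (c3 b8, c3 a6).
    let utf8: [u8; 4] = [0xC3, 0xB8, 0xC3, 0xA6];
    println!("{:?}", decode(&utf8));
    // prints: Ok("øæ")

    // "Thømas" as a Latin-1 client would send it (assumption):
    // 'ø' becomes the single byte 0xF8, which is not valid UTF-8.
    let latin1 = b"Th\xF8mas";
    println!("{:?}", decode(latin1));
    // prints: Err("invalid utf-8 sequence of 1 bytes from index 2")
}
```

If the client side is the cause, checking the terminal or driver encoding (for psql, `SHOW client_encoding;`) would be a reasonable next step.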