risingwavelabs / risingwave


bug: panic on schema mismatch when sink into table with array of struct #19454

Open xiangjinwu opened 1 day ago

xiangjinwu commented 1 day ago

Describe the bug

Refer to the reproduction steps and error message below. Still investigating.

Specifically, the table's column data types change after executing CREATE SINK ... INTO on that table.

Error message/log

thread 'rw-streaming' panicked at src/stream/src/executor/wrapper/schema_check.rs:48:29:
schema check failed on ExecutorInfo { schema: Schema { fields: [id:Varchar, parameters:Struct(StructType { field_names: ["name", "string"], field_types: [Varchar, List(Struct(StructType { field_names: ["value"], field_types: [Varchar] }))] }), _row_id:Serial] }, pk_indices: [], identity: "Project 2C00000000" }: column type mismatched at position 1: expected Some(Struct(StructType { field_names: ["name", "string"], field_types: [Varchar, List(Struct(StructType { field_names: ["value"], field_types: [Varchar] }))] })), found Some("List")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

*** await tree context of current task ***

Actor 44: `CREATE TABLE t ("id" CHARACTER VARYING, "parameters" STRUCT<"name" CHARACTER VARYING, "string" STRUCT<"value" CHARACTER VARYING>[]>)` [67.554s]
  Epoch 7521805286637568 [534.547ms]
    Materialize 2C00000005 [534.547ms]
      RowIdGen 2C00000004 [534.547ms]
        Union 2C00000003 [534.547ms]  <== current
          Merge 2C00000002 [534.547ms]

To Reproduce

CREATE SOURCE s (raw bytea) WITH (connector = 'kafka', kafka.brokers = '127.0.0.1:51349', topic = 'json-any') FORMAT PLAIN ENCODE BYTES;

CREATE MATERIALIZED VIEW parsed AS (SELECT CONVERT_FROM(raw, 'utf8')::JSONB AS payload FROM s);

CREATE TABLE t (
    "id" VARCHAR
    , "parameters" STRUCT<
        "name" VARCHAR
        , "string" STRUCT<
            "value" VARCHAR
        >
    >[]
);

SELECT definition FROM rw_tables;

CREATE SINK sk INTO t AS (SELECT
        payload ->> 'id' AS "id",
        (jsonb_populate_record(NULL::STRUCT<
                parameters STRUCT<
                        "name" CHARACTER VARYING,
                        "string" STRUCT<"value" CHARACTER VARYING>
                >[]
        >, payload)).parameters AS "parameters"
        FROM parsed)
with (
        type='append-only',
);

SELECT definition FROM rw_tables;

Expected behavior

It should NOT panic. The table definition should NOT change as follows:

- CREATE TABLE t ("id" CHARACTER VARYING, "parameters" STRUCT<"name" CHARACTER VARYING, "string" STRUCT<"value" CHARACTER VARYING>>[])
+ CREATE TABLE t ("id" CHARACTER VARYING, "parameters" STRUCT<"name" CHARACTER VARYING, "string" STRUCT<"value" CHARACTER VARYING>[]>)

How did you deploy RisingWave?

./risedev p

The version of RisingWave

v2.0.1

Additional context

#14063

xiangjinwu commented 2 hours ago

Seems to be a sqlparser bug on struct & array:

echo "CREATE TABLE t (\"id\" CHARACTER VARYING, \"parameters\" STRUCT<\"name\" CHARACTER VARYING, \"string\" STRUCT<\"value\" CHARACTER VARYING>>[])" | cargo run --bin sqlparser
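
To make the difference between the two definitions concrete, here is a minimal Rust sketch of a round-trip check. It is not sqlparser's implementation or API: `Ty`, `Parser`, and `parse` are hypothetical names, and the toy grammar covers only the three types that appear in this issue. It assumes each `>` closes exactly one STRUCT, so the `[]` in `>>[]` wraps the outer struct.

```rust
use std::fmt;

// Toy model of the types in this issue (hypothetical, not RisingWave's DataType).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Varchar,
    Struct(Vec<(String, Ty)>),
    List(Box<Ty>),
}

impl fmt::Display for Ty {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Ty::Varchar => write!(f, "CHARACTER VARYING"),
            Ty::List(inner) => write!(f, "{inner}[]"),
            Ty::Struct(fields) => {
                write!(f, "STRUCT<")?;
                for (i, (name, ty)) in fields.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "\"{name}\" {ty}")?;
                }
                write!(f, ">")
            }
        }
    }
}

// Hypothetical recursive-descent parser working directly on the input text.
struct Parser<'a> {
    rest: &'a str,
}

impl<'a> Parser<'a> {
    // Consume `tok` (after skipping leading whitespace) if it comes next.
    fn eat(&mut self, tok: &str) -> bool {
        let rest: &'a str = self.rest;
        match rest.trim_start().strip_prefix(tok) {
            Some(r) => {
                self.rest = r;
                true
            }
            None => false,
        }
    }

    // Parse a double-quoted identifier such as "name".
    fn ident(&mut self) -> Option<String> {
        if !self.eat("\"") {
            return None;
        }
        let end = self.rest.find('"')?;
        let name = self.rest[..end].to_string();
        self.rest = &self.rest[end + 1..];
        Some(name)
    }

    fn ty(&mut self) -> Option<Ty> {
        let mut t = if self.eat("CHARACTER VARYING") {
            Ty::Varchar
        } else if self.eat("STRUCT<") {
            let mut fields = Vec::new();
            loop {
                let name = self.ident()?;
                fields.push((name, self.ty()?));
                if !self.eat(",") {
                    break;
                }
            }
            // One `>` closes exactly one STRUCT.
            if !self.eat(">") {
                return None;
            }
            Ty::Struct(fields)
        } else {
            return None;
        };
        // Any `[]` suffix wraps the type that was just completed.
        while self.eat("[]") {
            t = Ty::List(Box::new(t));
        }
        Some(t)
    }
}

fn parse(s: &str) -> Option<Ty> {
    let mut p = Parser { rest: s };
    let t = p.ty()?;
    p.rest.trim().is_empty().then_some(t)
}

fn main() {
    let expected = r#"STRUCT<"name" CHARACTER VARYING, "string" STRUCT<"value" CHARACTER VARYING>>[]"#;
    let observed = r#"STRUCT<"name" CHARACTER VARYING, "string" STRUCT<"value" CHARACTER VARYING>[]>"#;
    let (a, b) = (parse(expected).unwrap(), parse(observed).unwrap());
    // Round trip: printing the parsed type must reproduce the input text.
    assert_eq!(a.to_string(), expected);
    assert_eq!(b.to_string(), observed);
    // The two definitions are structurally different types.
    assert!(matches!(a, Ty::List(_)));
    assert!(matches!(b, Ty::Struct(_)));
}
```

A regression test of this shape (parse, print, compare with the input) would catch the definition change shown in the diff above, because the array-of-struct column must not print back as a struct with an array field.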