risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
Apache License 2.0

iceberg sink: struct type with metadata doesn't work #16545

Closed: xxchan closed this issue 1 month ago

xxchan commented 6 months ago

Hi, I've recently been thinking about support for the Struct type in the Iceberg sink, since I'm testing whether I can use RisingWave at work and this functionality is a necessity. As of now, when someone tries to sink struct data to an Iceberg catalog, they receive this error:

`Field response's type not compatible, risingwave converted data type Struct([Field { name: "responseStatus", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "statusCode", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), iceberg's data type: Struct([Field { name: "responseStatus", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"PARQUET:field_id": "17", "column_id": "17"} }, Field { name: "statusCode", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"PARQUET:field_id": "18", "column_id": "18"} }])`

Looking at this, the only mismatch seems to be the `metadata` of each Field: it is empty on the RisingWave side and carries `PARQUET:field_id` and `column_id` on the Iceberg side. The code just does a `left == right` comparison: https://github.com/risingwavelabs/risingwave/blob/main/src/connector/src/sink/iceberg/mod.rs#L1046

Is Struct support in the Iceberg sink just a matter of fixing this comparison, or is there more context to it?
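To make the mismatch concrete, here is a minimal, self-contained Rust sketch. `DataType` and `Field` below are simplified stand-ins written for illustration, not the real Arrow types, and the helpers `field`, `risingwave_side`, and `iceberg_side` are hypothetical. It only assumes that equality on a field also covers its metadata map, which is what the error message above suggests.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Arrow's `DataType` and `Field` (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Int32,
    Struct(Vec<Field>),
}

// Derived `PartialEq` compares every member, including `metadata`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

// Hypothetical helper: build a nullable field, optionally tagged with a field id
// the way the Iceberg side of the error message is.
fn field(name: &str, data_type: DataType, field_id: Option<&str>) -> Field {
    let mut metadata = BTreeMap::new();
    if let Some(id) = field_id {
        metadata.insert("PARQUET:field_id".to_string(), id.to_string());
        metadata.insert("column_id".to_string(), id.to_string());
    }
    Field {
        name: name.to_string(),
        data_type,
        nullable: true,
        metadata,
    }
}

// The struct type as converted by RisingWave: no field ids in the metadata.
fn risingwave_side() -> DataType {
    DataType::Struct(vec![
        field("responseStatus", DataType::Utf8, None),
        field("statusCode", DataType::Int32, None),
    ])
}

// The struct type as reported by Iceberg: same shape, plus field ids 17 and 18.
fn iceberg_side() -> DataType {
    DataType::Struct(vec![
        field("responseStatus", DataType::Utf8, Some("17")),
        field("statusCode", DataType::Int32, Some("18")),
    ])
}
```

With these definitions, `risingwave_side() == iceberg_side()` evaluates to `false` even though names, types, and nullability agree, because the nested metadata maps differ.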

Slack Message

fuyufjh commented 5 months ago

cc. @chenzl25

chenzl25 commented 5 months ago

@ZENOTME Could you please check whether we support the struct type in the iceberg sink? IIUC, after PR #16567, we could support it directly.

ZENOTME commented 5 months ago

> @ZENOTME Could you please check whether we support struct type in iceberg sink? IIUC, after this PR #16567, we could support it directly.

Sure, I'll test it later. BTW, there is also no test for the struct type in icelake, so I am not sure whether it's supported.

xxchan commented 1 month ago

This is not supported because comparing the two struct types also compares each nested field's metadata. On the Iceberg side, metadata such as {"PARQUET:field_id": "18"} is always present, but in RW we don't have the field id. Even if we did, the id might not match the Iceberg one.
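One possible direction, sketched here under assumptions rather than taken from the actual fix, is a structural comparison that recurses into struct fields and checks name, nullability, and data type while skipping metadata. The `DataType` and `Field` types are simplified stand-ins for the Arrow ones, and `field` and `types_match_ignoring_metadata` are hypothetical names.

```rust
use std::collections::HashMap;

// Simplified stand-ins for Arrow's `DataType` and `Field` (illustration only).
#[derive(Debug, Clone)]
enum DataType {
    Utf8,
    Int32,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    // Deliberately ignored by the comparison below.
    metadata: HashMap<String, String>,
}

// Hypothetical helper: build a nullable field, optionally carrying a Parquet field id.
fn field(name: &str, data_type: DataType, field_id: Option<&str>) -> Field {
    let mut metadata = HashMap::new();
    if let Some(id) = field_id {
        metadata.insert("PARQUET:field_id".to_string(), id.to_string());
    }
    Field {
        name: name.to_string(),
        data_type,
        nullable: true,
        metadata,
    }
}

// Returns true when both types have the same shape: same variant, and for
// structs the same field names, nullability, and (recursively) field types,
// in the same order. Field metadata is never inspected.
fn types_match_ignoring_metadata(left: &DataType, right: &DataType) -> bool {
    match (left, right) {
        (DataType::Utf8, DataType::Utf8) | (DataType::Int32, DataType::Int32) => true,
        (DataType::Struct(l), DataType::Struct(r)) => {
            l.len() == r.len()
                && l.iter().zip(r).all(|(lf, rf)| {
                    lf.name == rf.name
                        && lf.nullable == rf.nullable
                        && types_match_ignoring_metadata(&lf.data_type, &rf.data_type)
                })
        }
        _ => false,
    }
}
```

Under this sketch, a struct without field ids matches its Iceberg counterpart with ids 17 and 18, while a genuine difference in a nested field's type is still rejected. Whether nullability and field order should be compared this strictly is a design choice the thread does not settle.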