risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
6.87k stars 569 forks source link

Refactor: rewrite CDC connector validation to Rust #16902

Open StrikeW opened 4 months ago

StrikeW commented 4 months ago

Currently the validation logic is written in Java, and rely on message passing via protobuf. The interpretation of .proto is different in Rust and Java, causing burden to maintain the connector.

StrikeW commented 4 months ago

For newly supported connector, e.g. #16178 , I suggest we write the validation logic in Rust. @KeXiangWang

BugenZhao commented 4 months ago

The interpretation of .proto is different in Rust and Java, causing burden to maintain the connector.

Can you elaborate more?

StrikeW commented 4 months ago

The interpretation of .proto is different in Rust and Java, causing burden to maintain the connector.

Can you elaborate more?

The oneof field in Rust can be set to None, but after passing to Java it doesn't interpret to Null.

BugenZhao commented 4 months ago

The oneof field in Rust can be set to None, but after passing to Java it doesn't interpret to Null.

Do you mean that calling get on a mismatched case will return the default value? This should be resolved by checking the "case".

https://protobuf.dev/reference/java/java-generated/#oneof-fields

ChoiceCase getChoiceCase(): Returns the enum indicating which field is set. Returns CHOICE_NOT_SET if none of them is set.

StrikeW commented 4 months ago

The oneof field in Rust can be set to None, but after passing to Java it doesn't interpret to Null.

Do you mean that calling get on a mismatched case will return the default value? This should be resolved by checking the "case".

https://protobuf.dev/reference/java/java-generated/#oneof-fields

ChoiceCase getChoiceCase(): Returns the enum indicating which field is set. Returns CHOICE_NOT_SET if none of them is set.

Yeah, the different convention causes burden which is unnecessary. The validation doesn't need to go through the “connector node". But the validation logic usually doesn't need to change after it is stable, so the refactor is in low priority.

StrikeW commented 4 months ago

There is one drawback if we use Rust to write the validate logic: AFAIK Rust doesn't have a unified interface likes the JDBC to access database, so we need to use different Rust driver to interact with different databases. Considering this drawback, it's not really worth the effort to refactor in current stage.

xiangjinwu commented 4 months ago

AFAIK Rust doesn't have a unified interface likes the JDBC to access database

Sort of true ... 😢 It is double-edged rather than a pure drawback in my view. The JDBC unified interface hides certain DB-specific details, leading to us underestimate the complexity in initial implementation and then add continuous patches via its most generic getObject and getColumnTypeName.

Not leaning towards any option. Maybe we need more time to see which one is better.

BugenZhao commented 4 months ago

AFAIK Rust doesn't have a unified interface likes the JDBC to access database

What about sqlx? 🥺

xiangjinwu commented 4 months ago

AFAIK Rust doesn't have a unified interface likes the JDBC to access database

What about sqlx? 🥺

May worth a look but listing something to consider:

github-actions[bot] commented 2 months ago

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean. Don't worry if you think the issue is still valuable to continue in the future. It's searchable and can be reopened when it's time. 😄