risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

sqlsmith: Generate multiple input formats: Protobuf, AVRO, JSON #6970

Open kwannoel opened 1 year ago

kwannoel commented 1 year ago

Motivation

Basic Requirements

(Thanks @waruto210 for clarifying with me.)

Full Requirements

Background

Originally posted here: https://github.com/risingwavelabs/risingwave/issues/5164 by @neverchanje:

Source reading/parsing

The parsing part will be more deterministic than the rest. We need to generate random data in a specific format, with some probability of producing malformed data (which the parser is expected to drop), and verify that the parsed output is correct.
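To make the idea concrete, here is a minimal sketch of such a parsing check. It uses JSON with Python's standard library only; it is not sqlsmith or RisingWave code, and every name in it (`gen_record`, `gen_messages`, `parse_or_drop`, `check`, the `p_bad` parameter) is hypothetical. Malformed input is simulated by cutting off the closing brace of an otherwise valid payload.

```python
import json
import random


def gen_record(rng):
    """Generate one random flat record (hypothetical schema: id, name, score)."""
    return {
        "id": rng.randint(0, 1000),
        "name": "".join(rng.choice("abc") for _ in range(5)),
        "score": rng.randint(0, 100),
    }


def gen_messages(rng, n, p_bad):
    """Return (payload, expected) pairs; expected is None for malformed payloads."""
    messages = []
    for _ in range(n):
        record = gen_record(rng)
        payload = json.dumps(record)
        if rng.random() < p_bad:
            # Drop the closing brace so the payload is guaranteed to be invalid JSON.
            messages.append((payload[:-1], None))
        else:
            messages.append((payload, record))
    return messages


def parse_or_drop(payload):
    """Stand-in for the parser under test: return the row, or None if it is dropped."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def check(messages):
    """Valid payloads must round-trip exactly; malformed ones must be dropped."""
    parsed = [parse_or_drop(payload) for payload, _ in messages]
    expected = [exp for _, exp in messages]
    return parsed == expected
```

A test run would seed the generator, for example `check(gen_messages(random.Random(0), 200, 0.3))`, so that any failure can be reproduced from the seed.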

Offline discussion with @neverchanje:

The goal is to stabilize support for Protobuf and Avro, which typically have complex nested schemas. For testing, we need to ensure that a Protobuf file with multiple levels of nesting and a mix of data types (including arrays) is parsed correctly.
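As a rough illustration of the nested-schema requirement, the sketch below generates a random schema with structs and arrays, generates a value for it, and checks that the value still conforms after an encode/decode round trip. It uses JSON as a stand-in encoding to stay dependency-free; a real test would swap in Protobuf or Avro encoders. The schema representation and all function names (`gen_schema`, `gen_value`, `conforms`) are assumptions made for this example, not existing APIs.

```python
import json
import random

SCALARS = ["int", "string", "bool"]


def gen_schema(rng, depth):
    """Generate a random schema with at most `depth` levels of nesting."""
    if depth == 0 or rng.random() < 0.3:
        return {"type": rng.choice(SCALARS)}
    if rng.random() < 0.5:
        return {"type": "array", "items": gen_schema(rng, depth - 1)}
    n_fields = rng.randint(1, 3)
    fields = {f"f{i}": gen_schema(rng, depth - 1) for i in range(n_fields)}
    return {"type": "struct", "fields": fields}


def gen_value(rng, schema):
    """Generate a random value that matches `schema`."""
    t = schema["type"]
    if t == "int":
        return rng.randint(-1000, 1000)
    if t == "string":
        return "".join(rng.choice("abcdef") for _ in range(rng.randint(0, 8)))
    if t == "bool":
        return rng.random() < 0.5
    if t == "array":
        return [gen_value(rng, schema["items"]) for _ in range(rng.randint(0, 3))]
    return {name: gen_value(rng, sub) for name, sub in schema["fields"].items()}


def conforms(value, schema):
    """Check recursively that `value` has the shape described by `schema`."""
    t = schema["type"]
    if t == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if t == "string":
        return isinstance(value, str)
    if t == "bool":
        return isinstance(value, bool)
    if t == "array":
        return isinstance(value, list) and all(
            conforms(v, schema["items"]) for v in value
        )
    fields = schema["fields"]
    return (
        isinstance(value, dict)
        and value.keys() == fields.keys()
        and all(conforms(value[k], sub) for k, sub in fields.items())
    )
```

Looping over many seeds and asserting `conforms(json.loads(json.dumps(value)), schema)` for each gives a basic property test; the depth limit keeps generated schemas bounded.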

kwannoel commented 1 year ago

~Not sure if right idea... Re-open when have more details.~

kwannoel commented 1 year ago

cc @jon-chuang @neverchanje @tabVersion @waruto210

waruto210 commented 1 year ago

I propose to make datagen connector support generating multiple formats including native chunk, while nexmark only supports native.

kwannoel commented 1 year ago

> I propose to make datagen connector support generating multiple formats including native chunk, while nexmark only supports native.

+1 for this.

kwannoel commented 1 year ago

FYI we can do this in next-release. Don't want to block https://github.com/risingwavelabs/risingwave/pull/7612.

fuyufjh commented 1 year ago

cc. @tabVersion

tabVersion commented 1 year ago

> FYI we can do this in next-release. Don't want to block https://github.com/risingwavelabs/risingwave/pull/7612.

time to start?

neverchanje commented 1 year ago

Testing is always hard 😢

kwannoel commented 1 year ago

Have quite a bit of backlog at the moment. Don't plan to work on this anytime soon 🤢 maybe after this quarter.