redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

datalake: write bad data as binary records #24016

Closed. andrwng closed this pull request 1 week ago.

andrwng commented 1 week ago

I have an upcoming change that plugs schema management into the datalake workers. This change is a prerequisite for it: bad records must be routed as binary records. Once schema management is in place, it becomes very easy for data that happens to start with a 0 byte (the Confluent wire-format magic byte) to land, and for Redpanda to mistake it for Confluent serde.
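A minimal sketch of the fallback described above, written in Python rather than Redpanda's C++. It assumes the Confluent wire format framing (a `0x00` magic byte followed by a 4-byte big-endian schema ID, then the payload). The names `translate_record`, `StructuredRecord`, `BinaryRecord`, and the `decoders` map are hypothetical illustrations, not Redpanda's API. Any record that does not fit that framing, references an unknown schema ID, or fails to decode is kept as a binary record instead of being rejected.

```python
import json
import struct
from dataclasses import dataclass
from typing import Callable, Dict, Union

MAGIC_BYTE = 0x00  # Confluent wire format: leading magic byte
HEADER_LEN = 5     # 1 magic byte + 4-byte big-endian schema ID

@dataclass
class StructuredRecord:
    schema_id: int
    value: object

@dataclass
class BinaryRecord:
    raw: bytes

# Hypothetical decoder type: parses payload bytes or raises ValueError.
Decoder = Callable[[bytes], object]

def translate_record(
    raw: bytes, decoders: Dict[int, Decoder]
) -> Union[StructuredRecord, BinaryRecord]:
    # Too short for the header, or no magic byte: plain binary data.
    if len(raw) < HEADER_LEN or raw[0] != MAGIC_BYTE:
        return BinaryRecord(raw)
    (schema_id,) = struct.unpack(">I", raw[1:HEADER_LEN])
    decoder = decoders.get(schema_id)
    if decoder is None:
        # The leading 0 byte was likely coincidental: no such schema.
        return BinaryRecord(raw)
    try:
        return StructuredRecord(schema_id, decoder(raw[HEADER_LEN:]))
    except ValueError:
        # Header looked valid but the payload did not decode: keep as binary.
        return BinaryRecord(raw)

# Illustrative usage: schema ID 7 is "registered" with a JSON decoder.
decoders = {7: json.loads}
ok = translate_record(b"\x00" + struct.pack(">I", 7) + b'{"a": 1}', decoders)
```

The design choice in this sketch is that a bad record never fails the write; it only loses its structure, which matches the stop-gap behavior the PR describes.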

Longer term, we should make the expectation of Confluent serde more explicit, and perhaps route invalid data to a separate table. For now, this is a stop-gap that lets schema support work through the same code path that already handles unstructured data.

Backports Required

Release Notes

vbotbuildovich commented 1 week ago

Non-flaky failures in https://buildkite.com/redpanda/redpanda/builds/57630#0192fe45-a4e8-4454-8876-5d832052bf7d:

"rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.topic_delete_installed_snapshots_test"

vbotbuildovich commented 1 week ago

Retry command for Build#57630

Please wait until all jobs are finished before running the slash command.

/ci-repeat 1
tests/rptest/tests/topic_delete_test.py::TopicDeleteCloudStorageTest.topic_delete_installed_snapshots_test