redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

redpanda: tooling for serialized blob inventory validation #3200

Open dotnwat opened 2 years ago

dotnwat commented 2 years ago

Reliable tests for changes to serialized formats need to account for common errors in code refactoring, so they should rely on physical versions of old data, rather than data regenerated by the current code, when verifying that backwards compatibility is maintained. This requires:

  1. some tooling for collecting a small sample of binary blobs corresponding to each release
  2. some interfaces that can be used by tests to load older versions to verify compatibility
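As a rough illustration of the two items above, here is a minimal sketch of a blob inventory, written in Python for brevity. Everything in it is an assumption made for illustration, not an existing Redpanda interface: the directory layout (`<root>/<release>/<type_name>.bin`), the `manifest.json` checksum file, and the helper names `add_blob` and `load_blobs`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def add_blob(root: Path, release: str, type_name: str, data: bytes) -> Path:
    """Store one serialized sample for a release and record its checksum.

    Hypothetical layout: <root>/<release>/<type_name>.bin plus manifest.json.
    """
    release_dir = root / release
    release_dir.mkdir(parents=True, exist_ok=True)
    blob_path = release_dir / f"{type_name}.bin"
    blob_path.write_bytes(data)

    # The manifest maps type name -> SHA-256, so a damaged sample is detected
    # before a test blames the decoder for it.
    manifest_path = release_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[type_name] = hashlib.sha256(data).hexdigest()
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return blob_path


def load_blobs(root: Path, type_name: str):
    """Yield (release, bytes) for every release holding a sample of type_name.

    Releases are visited in lexical directory order, not semantic version order.
    """
    for release_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        manifest_path = release_dir / "manifest.json"
        if not manifest_path.exists():
            continue
        manifest = json.loads(manifest_path.read_text())
        if type_name not in manifest:
            continue
        data = (release_dir / f"{type_name}.bin").read_bytes()
        if hashlib.sha256(data).hexdigest() != manifest[type_name]:
            raise ValueError(f"corrupt blob for {type_name} in {release_dir.name}")
        yield release_dir.name, data


if __name__ == "__main__":
    # Release and type names below are made up for the demo.
    root = Path(tempfile.mkdtemp())
    add_blob(root, "v21.11.1", "example_type", b"\x01\x02")
    add_blob(root, "v21.11.2", "example_type", b"\x01\x03")
    for release, blob in load_blobs(root, "example_type"):
        print(release, blob.hex())
```

A compatibility test would then iterate over `load_blobs(root, "some_type")` and feed each blob to the current decoder, so the bytes under test were produced by the old release rather than by today's code.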

JIRA Link: CORE-796

jcsp commented 2 years ago

Generating useful test cases is probably the hardest part (the actual list-of-type, list-of-blobs test is probably straightforward with a macro/template or two).

One example of generating test data is Ceph, where there are generate_test_instances methods for each serializable type (e.g. https://github.com/ceph/ceph/blob/master/src/mds/mdstypes.cc#L192), which was a bit onerous but did at least make it straightforward to generate tests. Might not be so bad if we add those one-by-one while converting types to serde?
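To make the pattern concrete, here is a small sketch of the per-type "generate test instances" idea, in Python rather than C++ to keep it short. The type `ExampleRecord`, its two-version byte layout, and the helper `check_round_trip` are all hypothetical. This is not Ceph's or Redpanda's actual code, only the general shape of the approach.

```python
import struct
from dataclasses import dataclass


@dataclass
class ExampleRecord:
    """Hypothetical persisted type: v1 stored only `offset`, v2 added `term`."""

    offset: int
    term: int = 0

    def encode(self) -> bytes:
        # Current (v2) layout: version byte, then two little-endian int64 fields.
        return struct.pack("<Bqq", 2, self.offset, self.term)

    @classmethod
    def decode(cls, buf: bytes) -> "ExampleRecord":
        version = buf[0]
        if version == 1:
            # Old layout: `term` did not exist, so it takes its default.
            (offset,) = struct.unpack_from("<q", buf, 1)
            return cls(offset=offset)
        if version == 2:
            offset, term = struct.unpack_from("<qq", buf, 1)
            return cls(offset=offset, term=term)
        raise ValueError(f"unsupported version {version}")

    @classmethod
    def generate_test_instances(cls):
        # Each serializable type supplies its own representative values,
        # including boundary cases.
        return [cls(0), cls(42, 7), cls(2**63 - 1, -1)]


def check_round_trip(serializable_type) -> int:
    """Encode and decode every generated instance; return how many were checked."""
    checked = 0
    for instance in serializable_type.generate_test_instances():
        assert serializable_type.decode(instance.encode()) == instance
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_round_trip(ExampleRecord))
    # A blob as a (hypothetical) v1 release would have written it.
    print(ExampleRecord.decode(struct.pack("<Bq", 1, 5)))
```

The same generated instances could also be encoded at release time and stored as blobs, so a later release can check that it still decodes them.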

If we need to reduce the workload, we could narrow the scope of the blob library to just persisted types, as RP potentially has to read back data written long ago, whereas for network messages it only has to understand one version back (and that compat should be covered by the general upgrade testing).

dotnwat commented 2 years ago

@jcsp I started a separate discussion with some details about a potential framework for this https://github.com/vectorizedio/redpanda/discussions/3221