tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap

Schema Evolution #125

Open dominiklohmann opened 7 months ago

dominiklohmann commented 7 months ago

Schemas change over time. Especially with schema inference, we often end up with multiple schemas of the same name that are actually different under the hood.

We can filter by schema ID today, but that just shifts the burden onto the user. Instead, when a partition is accessed, we want to transparently cast its events to a superset schema that covers all schemas of the same name.
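To make the idea concrete, here is a minimal Python sketch of what casting to a superset schema could look like. It is not how Tenzir implements it: the flat `Schema` mapping, the `superset_schema` and `cast_event` helpers, and the field names are all hypothetical, and conflicting field types are rejected rather than resolved.

```python
from typing import Any

# Hypothetical model: a schema is a flat mapping of field name -> type name.
Schema = dict[str, str]


def superset_schema(schemas: list[Schema]) -> Schema:
    """Union the fields of all schemas, keeping first-seen field order."""
    merged: Schema = {}
    for schema in schemas:
        for field, type_name in schema.items():
            known = merged.setdefault(field, type_name)
            if known != type_name:
                # Assumption: type conflicts are out of scope for this sketch.
                raise TypeError(
                    f"conflicting types for {field!r}: {known} vs {type_name}"
                )
    return merged


def cast_event(event: dict[str, Any], target: Schema) -> dict[str, Any]:
    """Project an event onto the target schema; absent fields become None."""
    return {field: event.get(field) for field in target}


# Two schemas with the same name that differ in one field.
flow_a: Schema = {"src_ip": "ip", "dest_ip": "ip"}
flow_b: Schema = {"src_ip": "ip", "dest_ip": "ip", "app_proto": "string"}

target = superset_schema([flow_a, flow_b])
row = cast_event({"src_ip": "10.0.0.1", "dest_ip": "10.0.0.2"}, target)
```

With this, every event of the same schema name has the same shape on access, and fields a given event never had show up as null.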

### Definition of Done
- [ ] Agree on the approach
- [ ] Implement the required changes

dominiklohmann commented 7 months ago

This was (indirectly) requested by a customer alongside tenzir/public-roadmap#123. Because of schema inference in the JSON parser, they have a lot of ever so slightly different schemas for Suricata, which on their end is configured not to write null values. This makes it impossible to merge partitions, which leaves partitions at suboptimal sizes and slows down exports from the node.
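As a rough illustration of the effect, the following Python sketch infers a schema per event from JSON lines in which null fields are omitted. The `infer_schema` helper and the sample events are made up for this example and do not reflect the actual JSON parser.

```python
import json


def infer_schema(event: dict) -> tuple:
    """Infer a flat schema as a sorted tuple of (field, type name) pairs."""
    return tuple(sorted((key, type(value).__name__) for key, value in event.items()))


# Illustrative Suricata-like events; fields that would be null are left out.
lines = [
    '{"event_type": "flow", "src_ip": "10.0.0.1", "app_proto": "dns"}',
    '{"event_type": "flow", "src_ip": "10.0.0.2"}',
    '{"event_type": "flow", "src_ip": "10.0.0.3", "vlan": 7}',
]

# Every distinct combination of present fields yields its own schema.
distinct = {infer_schema(json.loads(line)) for line in lines}
print(len(distinct))
```

All three events are flow events, yet each one infers a different schema because a different set of optional fields is present.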