influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

feat: v3 write API with series key #25066

Closed hiltontj closed 3 months ago

hiltontj commented 3 months ago

Closes #25033

Summary

Introduce the experimental series key feature to monolith, along with the new /api/v3/write API which accepts the new line protocol to write to tables containing a series key.

Series key

The series key is supported in the schema::Schema type by the addition of a metadata entry that stores the series key members in their correct order. Every write to a v3 table must use the same series key.
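As a rough illustration of the idea (not the code in this PR), the sketch below stores an ordered series key in a string metadata map and rejects a write whose series key differs. The metadata key name, the `/` separator, and all function names are assumptions made for the example; the real schema crate may encode this differently.

```rust
use std::collections::HashMap;

/// Hypothetical metadata key; the real key used by the schema crate may differ.
const SERIES_KEY_METADATA_KEY: &str = "series_key";

/// Store the series key members, in order, as one metadata entry.
/// Assumes column names never contain the '/' separator.
fn store_series_key(metadata: &mut HashMap<String, String>, series_key: &[&str]) {
    metadata.insert(SERIES_KEY_METADATA_KEY.to_string(), series_key.join("/"));
}

/// Read the series key members back in their stored order.
fn load_series_key(metadata: &HashMap<String, String>) -> Option<Vec<String>> {
    metadata.get(SERIES_KEY_METADATA_KEY).map(|v| {
        if v.is_empty() {
            Vec::new()
        } else {
            v.split('/').map(str::to_string).collect()
        }
    })
}

/// Reject a write whose series key differs in membership or order.
fn check_series_key(
    metadata: &HashMap<String, String>,
    incoming: &[&str],
) -> Result<(), String> {
    match load_series_key(metadata) {
        Some(expected) if expected.iter().map(String::as_str).eq(incoming.iter().copied()) => {
            Ok(())
        }
        Some(expected) => Err(format!(
            "series key mismatch: expected {:?}, got {:?}",
            expected, incoming
        )),
        None => Err("table has no series key".to_string()),
    }
}
```

Because the comparison is element by element, a write that lists the same columns in a different order is treated as a mismatch.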

Series key columns are NOT NULL

Nullability of columns is enforced in the core schema crate based on a column's membership in the series key. So, when building a schema::Schema using schema::SchemaBuilder, the arrow Fields that are injected into the schema will have nullable set to false for columns that are part of the series key, as well as the time column.
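The nullability rule can be sketched without pulling in arrow. `FieldSketch` below is a simplified stand-in for an arrow Field, and `build_fields` is a hypothetical helper, not the actual schema::SchemaBuilder API:

```rust
/// Simplified stand-in for an arrow field: just a name and a nullable flag.
#[derive(Debug, PartialEq)]
struct FieldSketch {
    name: String,
    nullable: bool,
}

/// Name of the time column, assumed here to be "time".
const TIME_COLUMN: &str = "time";

/// Build fields so that series key members and the time column are
/// non-nullable, and every other column is nullable.
fn build_fields(columns: &[&str], series_key: &[&str]) -> Vec<FieldSketch> {
    columns
        .iter()
        .map(|&name| FieldSketch {
            name: name.to_string(),
            nullable: !(name == TIME_COLUMN || series_key.contains(&name)),
        })
        .collect()
}
```

With columns `region`, `host`, `usage`, `time` and a series key of `region`, `host`, only `usage` comes out nullable.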

The NOT NULL constraint, if you can call it that, is enforced in the buffer (see here) by ensuring there are no gaps in data buffered for series key columns.
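One way to picture the "no gaps" check is a per-row test that every series key column has a value before the row is buffered. This is a simplified sketch under that assumption; the row representation and the function name are made up for the example, and the actual buffer works on columnar builders rather than string maps.

```rust
use std::collections::HashMap;

/// Return the series key columns that have no value in `row`.
/// An empty result means the row can be buffered without leaving a gap.
fn missing_series_key_columns<'a>(
    row: &HashMap<String, String>,
    series_key: &[&'a str],
) -> Vec<&'a str> {
    series_key
        .iter()
        .copied()
        .filter(|col| !row.contains_key(*col))
        .collect()
}
```

A caller would reject the write, or the individual line, whenever the returned list is non-empty.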

Series key columns are still tags

Columns in the series key are annotated as tags in the arrow schema, which for now means that they are stored as Dictionaries. This was done to avoid having to support a new column type for series key columns.

New write API

This PR introduces the new write API, /api/v3/write, which accepts the new v3 line protocol. Currently, the only part of the new line protocol proposed in https://github.com/influxdata/influxdb/issues/24979 that is supported is the series key. New data types are not yet supported for fields.

Split write paths

To support the existing write path alongside the new write path, a new module was set up to perform validation in the influxdb3_write crate (write_buffer/validator.rs). This re-uses the existing write validation logic and replicates it with the changes needed for the new API. I refactored the validation code to use a state machine instead of a series of nested function calls, to help distinguish the fallible validation/update steps from the infallible conversion steps.
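For readers unfamiliar with the pattern, here is a minimal typestate sketch of a validator that separates a fallible step from an infallible one. The state names, method names, and the placeholder check are assumptions for illustration and do not mirror the contents of validator.rs.

```rust
/// State: raw lines received, nothing checked yet.
struct Received {
    lines: Vec<String>,
}

/// State: every line has passed validation.
struct Validated {
    lines: Vec<String>,
}

/// The validator is generic over its current state, so each method is only
/// callable at the right point in the sequence.
struct WriteValidator<S> {
    state: S,
}

impl WriteValidator<Received> {
    fn new(body: &str) -> Self {
        let lines = body
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty())
            .map(str::to_string)
            .collect();
        Self { state: Received { lines } }
    }

    /// Fallible step: returns the next state only if every line is accepted.
    fn validate(self) -> Result<WriteValidator<Validated>, String> {
        for (i, line) in self.state.lines.iter().enumerate() {
            // Placeholder check, not real line protocol parsing.
            if !line.contains(' ') {
                return Err(format!("line {}: expected a space before the fields", i + 1));
            }
        }
        Ok(WriteValidator { state: Validated { lines: self.state.lines } })
    }
}

impl WriteValidator<Validated> {
    /// Infallible step: conversion cannot fail once validation has succeeded.
    fn into_buffer_rows(self) -> Vec<String> {
        self.state.lines
    }
}
```

The compiler rejects calling `into_buffer_rows` on a validator that has not been through `validate`, which is the property that makes the fallible and infallible steps easy to tell apart.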

The code in that module could potentially be refactored to reduce code duplication.