vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.58k stars 1.54k forks source link

`#[serde(deny_unknown_fields)]` and nested `#[serde(flatten)]` don't work fine #20234

Open triadium opened 6 months ago

triadium commented 6 months ago

A note for the community

Problem

#[serde(deny_unknown_fields)] helps to avoid errors, but has a big disadvantage for complex cases of extending structures while maintaining backward compatibility.

#[serde(deny_unknown_fields)] and nested #[serde(flatten)] don't work fine. And it should not according to the documentation flatten

Only one level works (?) normally, and on other levels with #[serde(flatten)] does not work.

For example, AmqpSourceConfig has #[serde(deny_unknown_fields)], and AmqpConfig has #[serde(flatten)] .

This combination works until we add an additional flatten structure to expand the settings while maintaining the current definition of the connection string. Configurations with the old definition and with the new one do not work until we comment out #[serde(deny_unknown_fields)] for AmqpSourceConfig

pub(crate) struct AmqpConfig {
...
    // remove the line:  pub(crate) connection_string: String,
    // and add this:
    #[configurable(derived)]
    #[serde(flatten)]
    pub(crate) connection_params: ConnectionParams,
...
}
/// params for amqp connection
#[configurable_component]
#[derive(Clone, Debug)]
#[serde(untagged)]
pub(crate) enum ConnectionParams {
    ConnectionString {
        /// connection_string with all params.
        #[configurable(metadata(
            docs::examples = "amqp://user:password@127.0.0.1:5672/%2f?timeout=10",
        ))]
        connection_string: String,
    },
    ConnectionData {
        #[configurable(derived)]
        connection: ConnectionData,
    },
}

_Is it possible in the next release to remove #[serde(deny_unknown_fields)] for the basic source structures that use the connection string?_

We have a need to extend these structures with fields without prior percent-encoding, in order to then form the correct connection string from them.

On the contrary, you can leave #[serde(deny_unknown_fields)] everywhere and remove #[serde(flatten)] everywhere, breaking backward compatibility now before release 1.0.0 and making configurations more nested, but the structure will never contain unknown fields and will always correspond to structures in the code.

Configuration

### Connection string params must be URL encoded!
[sources.amqp_string]
  type = "amqp"
  queue = "connector"
  auto_reconnect = 5 # retry time in seconds  
  [sources.amqp_string.connection]
    connection_string = "amqp://admin:password$localhost:5672/%2f?timeout=10"
  [sources.amqp_string.queue_auto_create] # new structure
    enabled = true
    exchange = "events"
    routing_key = "storage"
    auto_delete = true
    durable = true

[sources.amqp_struct]
  type = "amqp"
  queue = "connector"
  auto_reconnect = 5 # retry time in seconds
  [sources.amqp_struct.connection] # new structure
    user = "admin"
    password = "password"
    host = "localhost"
    port = 5672
    vhost = "/"
    timeout = 10
  [sources.amqp_struct.queue_auto_create] # new structure
    enabled = true
    exchange = "events"
    routing_key = "storage"
    auto_delete = true
    durable = true

Version

0.37

Debug Output

No response

Example Data

No response

Additional Context

No response

References

Related: #12342

jszwedko commented 5 months ago

Thanks @triadium . I agree we should remove usages of serde(flatten) where possible. It seems fairly buggy when combined with other serde features; in particular deny_unknown_fields which is valuable to provide feedback to users if they make a typo.