mehcode / config-rs

⚙️ Layered configuration system for Rust applications (with strong support for 12-factor applications).
Apache License 2.0
2.43k stars 206 forks source link

Incorrect deserialization with yaml feature and keys containing periods #417

Open Halkcyon opened 1 year ago

Halkcyon commented 1 year ago

When using yaml as a config format, the deserialization incorrectly stops at periods in keys. I do not observe this behavior when manually deserializing the string with serde_yaml or other deserializers (such as ruamel.yaml or pyyaml in Python). As a result, I am getting errors about keys not being present in the mapped value even though they definitely exist.

MCVE:

192.168.1.1:
  an_arr:
    - 1
    - 2
    - 3
  another_value: 10
#[derive(Debug, serde::Deserialize)]
struct Conf {
  an_arr: Vec<u32>,
  another_value: u32,
}

fn main() {
  let settings = config::Config::builder()
    .add_source(config::File::from("config.yml"))
    .build()
    .unwrap();

  println!(
    "{:?}",
    settings
      .try_deserialize::<HashMap<String, Conf>>()
      .unwrap()
  );
}

Result:

The application panicked (crashed).
Message:  called `Result::unwrap()` on an `Err` value: missing field `another_value`
Location: src/main.rs:51

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 9 frames hidden ⋮                               
  10: core::result::Result<T,E>::unwrap::hdf9dbc5c9c26ec5d
      at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/result.rs:1113
  11: ui::main::he2f965d0d6a736c9
      at /home/redacted/src/main.rs:49
        47 │         println!(
        48 │             "{:?}",
        49 >             settings
        50 │                 .try_deserialize::<HashMap<String, Conf>>()
        51 │                 .unwrap()
matthiasbeyer commented 1 year ago

Hi! Thanks for your report. Would you mind filing a patch showcasing this? I would be especially interested in a comparison of config-rs and the plain use of the yaml crate in use by config-rs (serde_yaml IIRC).

Can all be in one testcase, in one file in /tests.

If you cannot spare the time for this, just ping me and I'll do it myself.

Halkcyon commented 1 year ago

I expected serde_yaml, too, but it appears config-rs is using yaml-rust @matthiasbeyer

https://github.com/mehcode/config-rs/blob/f12b93fb949557dd41cbe870179eb9afbb4c3769/Cargo.toml#L20-L35

matthiasbeyer commented 1 year ago

Ah yeah, then that crate of course.

Sorry, I maintain so many projects that I lose track sometimes and I didn't look it up. My bad.

Halkcyon commented 1 year ago

Digging into this deeper, it also appears yaml-rust has been unmaintained since 2020, so it's unlikely to be fixed upstream. I'll submit a patch test a bit later today.

matthiasbeyer commented 1 year ago

Awesome!

So it seems that we should switch YAML implementations at some point. Maybe it is worth doing this even though there's a rewrite ongoing (although slowly, #376) where we could use serde_yaml from the get-go.

... just thinking loud here.

Halkcyon commented 1 year ago

Alright, repro added to tests with rust-yaml vs. serde_yaml @matthiasbeyer

polarathene commented 8 months ago

For reference, the source of the problem has been identified: https://github.com/mehcode/config-rs/pull/418#issuecomment-1770378307

Switching YAML parser won't resolve this.

UPDATE: You can use keys with dots/periods if they're not defined at top-level. Although as referenced below, it's considered a bug (even though it'd seem necessary for the serde rename feature to work correctly to avoid config-rs internally splitting the key into nested tables).


This issue is also a duplicate where the cause was explained:

#[derive(Debug, Deserialize)]
struct Test {
  toplevel: Nested,
}

#[derive(Debug, Deserialize)]
struct Nested {
  #[serde(rename = "192.168.1.1")]
  ip_key: String,
  #[serde(rename = "hello.world")]
  hello_world: String,
  nested: String,
}

Input:

toplevel:
  hello.world: example
  192.168.1.1: a string value
toplevel.nested: valid

Outputs:

Test {
    toplevel: Nested {
        ip_key: "a string value",
        hello_world: "example",
        nested: "valid",
    },
}