Open yannham opened 2 months ago
(For the record, the explicitly nested version foo = {bar = {baz = "qux"}}
parses correctly, so it really seems to be around dotted field syntax)
Do you have an isolated reproduction case for this?
I can try to make one, yes.
Here's a smallish reproduction, which crashes with "invalid type: string \"bar\", expected a borrowed string". Removing the Spanned
on line 16 makes it succeed.
#!/usr/bin/env -S cargo -Zscript
---cargo
[dependencies]
serde = { version = "1", features = ["derive"] }
serde-untagged = "0.1.6"
toml = "0.8.19"
---

use serde_untagged::UntaggedEnumVisitor;
use serde::de::{Deserializer, MapAccess};
use toml::Spanned;

#[derive(Debug)]
pub enum SpannedValue {
    String(String),
    Map(Vec<(String, Spanned<SpannedValue>)>)
}

impl<'de> serde::Deserialize<'de> for SpannedValue {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let data = UntaggedEnumVisitor::new()
            .string(|str| Ok(SpannedValue::String(str.into())))
            .map(|mut map| {
                let mut result = Vec::new();

                while let Some((k, v)) = map.next_entry()? {
                    result.push((k, v));
                }

                Ok(SpannedValue::Map(result))
            })
            .deserialize(deserializer)?;

        Ok(data)
    }
}

const INPUT: &str = r#"
[foo.bar]
baz = "qux"
"#;

fn main() {
    let val: SpannedValue = toml::from_str(INPUT).unwrap();
    dbg!(val);
}
Thanks for the reproduction case!
We have tests for Spanned being used in arrays, keys, and values, but not in recursive data structures like this. It appears that untagged enums, whether using serde_untagged or #[serde(untagged)], aren't supported at this time.
serde is a bit of a mess to dig into to support cases like this. I personally will likely not get to this for a bit but would be happy with any help on this.
In the meantime, if anyone has the same issue and is looking for a way out, our current work-around is to use the lower-level toml-edit crate, which works: https://github.com/tweag/nickel/pull/2074.
I'm using Spanned to deserialize TOML to Nickel (a configuration language) while preserving spans as much as possible, as Nickel adds validation capabilities and we'd like to link validation errors back to the precise piece of TOML data that failed.
To do so, we define a bespoke data structure that is more or less like a TOML value but with Spanned appropriately sprinkled, and write a custom deserializer using serde_untagged. You can find the type definition and the deserializer here: https://github.com/tweag/nickel/blob/927ee23993747b7851e51bcfe3eb3e685ba4ebb1/core/src/serialize.rs#L491-L582
However, when deserializing the following file:
This gives the following surprising error:
It's surprising because we never try to deserialize a borrowed string: all strings, both as terminal values and keys, are owned in SpannedValue (NickelString is a simple wrapper around String). Also, any TOML file without dotted notation is parsed fine. After some experimentation, it seems that this happens when trying to deserialize the value (and not the key) of the outer map, that is, the value associated to foo.
I suspect that there are some shenanigans around getting the location of the nested map {bar = {baz = "qux"}}. It seems that the spanned deserializer of toml-rs tries to deserialize markers as borrowed strings (https://github.com/toml-rs/toml/blob/b05e8c489be8ebfc0acacc1ec3556d95cd8d2198/crates/serde_spanned/src/spanned.rs#L161) but it also expects a very precise structure, so I'm not entirely sure what's going on here.
The issue is that I don't see any easy work-around: once we've tried to deserialize the content of a map as spanned (which is entirely legit for files that don't have the dotted notation), there doesn't seem to be any way to retry the same deserialization at a different type.
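For anyone puzzling over how an "expected a borrowed string" error can show up when no user type borrows anything, here is a rough, std-only model of the visitor mechanics that could produce it. It is a sketch and not serde's or toml-rs's actual code: the names Visitor, BorrowedStrVisitor, feed_borrowed and feed_buffered are hypothetical stand-ins. The assumption is that something on the Spanned path asks for a string that borrows from the input, while the data source in the failing case can only hand over a short-lived or owned string.

```rust
// Toy model of a serde-style visitor. All names here are hypothetical;
// this is NOT the serde API, only an imitation of its default-method layout.
trait Visitor<'de>: Sized {
    type Value;

    /// Describes what this visitor wants; used in error messages.
    fn expecting(&self) -> &'static str;

    /// Called when the string is only valid for the duration of the call.
    /// By default a visitor rejects it.
    fn visit_str(self, v: &str) -> Result<Self::Value, String> {
        Err(format!(
            "invalid type: string {:?}, expected {}",
            v,
            self.expecting()
        ))
    }

    /// Called when the string borrows from the input and outlives the call.
    /// By default this falls back to `visit_str`.
    fn visit_borrowed_str(self, v: &'de str) -> Result<Self::Value, String> {
        self.visit_str(v)
    }
}

/// A visitor that only accepts strings borrowed from the input.
struct BorrowedStrVisitor;

impl<'de> Visitor<'de> for BorrowedStrVisitor {
    type Value = &'de str;

    fn expecting(&self) -> &'static str {
        "a borrowed string"
    }

    fn visit_borrowed_str(self, v: &'de str) -> Result<&'de str, String> {
        Ok(v)
    }
}

/// A source that can hand out slices of the original input.
fn feed_borrowed<'de, V: Visitor<'de>>(input: &'de str, visitor: V) -> Result<V::Value, String> {
    visitor.visit_borrowed_str(input)
}

/// A source that copies the data into an owned buffer first, so it can
/// only offer a short-lived `&str` to the visitor.
fn feed_buffered<'de, V: Visitor<'de>>(input: &str, visitor: V) -> Result<V::Value, String> {
    let owned: String = input.to_owned();
    visitor.visit_str(&owned)
}

fn main() {
    // Borrowed path: the visitor gets what it asked for.
    println!("{:?}", feed_borrowed("bar", BorrowedStrVisitor));
    // Buffered path: the visitor falls through to the default error.
    println!("{:?}", feed_buffered("bar", BorrowedStrVisitor));
}
```

In this model the buffered path fails with the same wording as the error in this issue, even though the caller's own types are all owned. Whether that is what actually happens inside toml-rs for dotted keys is not established here.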