toml-rs / toml

Rust TOML Parser
https://docs.rs/toml
Apache License 2.0
730 stars 107 forks source link

`toml_edit`: improve heuristic when parsing decor #791

Closed teythoon closed 1 month ago

teythoon commented 1 month ago

I noticed that all the decor seems to be ending up in prefix and top-level trailing. This unfortunately messes up the document when removing sections. Consider:

use toml_edit::{DocumentMut, value};

fn main() {
    let toml = r#"
# This is a comment for the file.

# This is a comment for section foo.
[foo]
#bar = "baz" # an example entry for section foo

# This is a comment for section bar.
[bar]
#baz = "bamboozle" # an example entry for section bar
"#;

    let doc = toml.parse::<DocumentMut>().expect("invalid doc");
    eprintln!("{:#?}", doc);

    let mut no_foo = doc.clone();
    no_foo.as_table_mut().remove("foo").unwrap();
    eprintln!("no_foo:\n~~~snip~~~\n{}\n~~~snap~~~", no_foo.to_string());

    let mut no_bar = doc.clone();
    no_bar.as_table_mut().remove("bar").unwrap();
    eprintln!("no_bar:\n~~~snip~~~\n{}\n~~~snap~~~", no_bar.to_string());
}

This prints:

DocumentMut {
    root: Table(
        Table {
            decor: Decor {
                prefix: "default",
                suffix: "default",
            },
            implicit: false,
            dotted: false,
            doc_position: None,
            span: None,
            items: {
                Key {
                    key: "foo",
                    repr: Some(
                        "foo",
                    ),
                    leaf_decor: Decor {
                        prefix: empty,
                        suffix: empty,
                    },
                    dotted_decor: Decor {
                        prefix: empty,
                        suffix: empty,
                    },
                }: Table(
                    Table {
                        decor: Decor {
                            prefix: "\n# This is a comment for the file.\n\n# This is a comment for section foo.\n",
                            suffix: empty,
                        },
                        implicit: false,
                        dotted: false,
                        doc_position: Some(
                            1,
                        ),
                        span: None,
                        items: {},
                    },
                ),
                Key {
                    key: "bar",
                    repr: Some(
                        "bar",
                    ),
                    leaf_decor: Decor {
                        prefix: empty,
                        suffix: empty,
                    },
                    dotted_decor: Decor {
                        prefix: empty,
                        suffix: empty,
                    },
                }: Table(
                    Table {
                        decor: Decor {
                            prefix: "#bar = \"baz\" # an example entry for section foo\n\n# This is a comment for section bar.\n",
                            suffix: empty,
                        },
                        implicit: false,
                        dotted: false,
                        doc_position: Some(
                            2,
                        ),
                        span: None,
                        items: {},
                    },
                ),
            },
        },
    ),
    trailing: "#baz = \"bamboozle\" # an example entry for section bar\n",
}
no_foo:
~~~snip~~~
#bar = "baz" # an example entry for section foo

# This is a comment for section bar.
[bar]
#baz = "bamboozle" # an example entry for section bar

~~~snap~~~
no_bar:
~~~snip~~~

# This is a comment for the file.

# This is a comment for section foo.
[foo]
#baz = "bamboozle" # an example entry for section bar

~~~snap~~~

To be explicit:

I think this could be improved by introducing a heuristic that splits the decor at paragraphs (i.e. \n\n), and associates parts of it with other nodes in the document tree.

epage commented 1 month ago

At this time, I think I'd rather avoid heuristics as we can easily get things wrong. You are welcome to explore these ideas in helpers built on top and we can see what the ecosystem ends up settling on.

teythoon commented 1 month ago

I don't understand. Heuristics can be wrong, but the status quo is always wrong.

epage commented 1 month ago

Heuristics can be hard to predict and don't work as well in APIs.