toml-rs / toml

Rust TOML Parser
https://docs.rs/toml
Apache License 2.0
725 stars 107 forks source link

Feature Request: get raw repr of key/value when deserializing #761

Closed abonander closed 3 months ago

abonander commented 3 months ago

Background

I'm creating a TOML config parser for SQLx to override various aspects of the library's operation, namely the query macros, as well as migrations. (https://github.com/launchbadge/sqlx/pull/3383)

One big need is to provide a way to override or augment the default SQL -> Rust type mappings in the macros, so I'm adding a section like this to the TOML format:

[macros.type_overrides]
# SQL type `foo` will map to `crate::types::Foo`
foo = "crate::types::Foo"

This is going to be most useful with PostgreSQL, which has a rich type system supporting user-created types (as well as a dynamically loaded extension system which can also add types).

Making it easier to use custom types is one of the main goals of this feature.

Type names in Postgres can be schema-qualified, and to support this in TOML, I'm just allowing type names to be arbitrary paths (using a type with a custom Deserialize implementation to read nested tables):

[macros.type_overrides]
foo.foo = "crate::types::Foo"

This would also make it easier to define multiple types for the same schema, as the user can just create a new table:

[macros.type_overrides.foo]
# foo.foo
foo = "crate::types::Foo"
# foo.bar
bar = "crate::types::Bar"

Problem

In Postgres, both the name of the type as well as the name of the schema can be quoted identifiers, which are case-sensitive whereas normal unquoted identifiers are not. Thus, it is important to distinguish between the two.

From my experience, I know with some certainty that a typical user's first instinct will be to just do this:

[macros.type_overrides]
# *NOT* `foo."foo"` in SQLx (parses as `foo.foo`)
foo."foo" = "crate::types::Foo"

But this will not result in the expected behavior because toml will parse out the quotes, and SQLx won't know the difference.

This is an issue with or without a schema qualifier:

[macros.type_overrides]
# *NOT* `"foo"` in SQLx (parses as `foo`)
"foo" = "crate::types::Foo"

Proposed Solution

I think the most versatile solution would be to add a type analogous to serde_json::value::RawValue.

When parsing, I could use something like this to get the raw key, then decide whether to parse it or use it raw.

A simpler solution would be to just add a specific analogue just for keys, maybe implementing Deserialize for toml_edit::Key itself. That would satisfy my use-case but little else.

I'm willing to put some work into a PR if we agree on a solution here.

Alternatives

Fix It in ~Post~ Documentation

This could easily be written off as a simple documentation issue on SQLx's part. If the key is wrapped in a second pair of quotes, SQLx will see the inner quotes:

[macros.type_overrides]
# `foo."foo"` in SQLx
foo.'"foo"' = "crate::types::Foo"

This is what I'm going with for now so I'm not blocked on this proposal. However, it's not a satisfying solution on its own because there's always users who don't fully read documentation and just go with their gut, then open an issue when things don't work as expected. I'd prefer SQLx to be able to handle this intelligently.

Spanned

While reading the source, I did notice the support for serde_spanned::Spanned which would theoretically allow me to implement this myself.

I could deserialize a key with Spanned<IgnoredAny>, then go back to the TOML string and get the raw key from the given span.

The problem is, how do I thread the TOML string that deep in the call tree?

Also, is Spanned support even part of toml/toml_edit's SemVer guarantees?

Deserialize using the parsed TOML structure

This is a non-starter for the same reasons DeserializeSeed is.

Lint the TOML or Two-Pass Deserializtion

These are basically the same thing and have mostly the same problems.

Alongside the documentation alternative, I could lint for improperly quoted names in the TOML by parsing to a toml_edit::DocumentMut, inspecting the structure and then deserializing the config structure if I don't find any problems.

Alternatively, I could parse to a toml_edit::DocumentMut, deserialize the config structure, then traverse the TOML structure and go back and fix-up quoted names.

Either way, it requires duplicating some of the work that deserialization is doing and having two different places in the code where semantics are defined, which is not great for maintainability.

epage commented 3 months ago

Making the quoting significant makes this no longer TOML but a custom format that looks like TOML. As the use case is something we intentionally do not want to support, I think we'll pass on this feature.

abonander commented 3 months ago

That's disappointing, I was hoping for at least some back-and-forth.

It wouldn't have to be in toml, it could be in toml_edit, the whole point of which, from my perspective, is to treat all syntax as significant.

But you're the boss.

abonander commented 3 months ago

This could also be used, like that in serde_json, to extract a subsection of a TOML file and re-serialized it verbatim. I could see that as being very useful.

abonander commented 3 months ago

@epage could you at least please answer this question?

is Spanned support even part of toml/toml_edit's SemVer guarantees?

epage commented 3 months ago

Yes, it is