toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Add a none type #921

Closed Artemis21 closed 1 year ago

Artemis21 commented 1 year ago

This is another issue asking for a null/nil/none/empty/nothing type. Summary of prior discussion:

  1. #30

    Arguments for:

    • Distinguish between an empty table and a table with empty keys.
    • Express "select where x = null or x = 1" (via x = [null, 1]) as opposed to "select where x = 1" (via x = [1]).
    • Specify null for a config option even when the default value is not null.

    Arguments against:

    • In a configuration context, you can just leave the keys out.
    • You can use an empty table ({}) or boolean false instead.
  2. #146

    Arguments for:

    • Ability to have self documenting config files which list out every option, even null ones.
    • Overrides where an unset value means "do not override" but null means "override with null".

    Arguments against:

    • Null as a valid value for any type breaks guarantees - an integer is now an integer or null.
    • Static languages find it hard to differentiate between non-existence and null.
    • Null as the billion dollar mistake.

    Other proposals:

    • Explicit syntax for not setting a key, equivalent to leaving it out.
  3. #802

    Little new ground covered.

  4. #803

    Arguments for:

    • Inheritance/layered environments, similar to overrides above.
    • Compatibility with 3rd party options which support null.

    Arguments against:

    • As above, sentinel values can be used.
    • TOML isn't meant to deal with legacy apps.

I agree that a null value, which would be valid for any type, is a bad idea. It is a bad idea because it would create a new option for every existing type, break existing guarantees and complicate implementations.

However, a new type would not do this. Defining a new unit type, called none, where the only valid value is none, would not break any existing guarantees. Implementations would just have to handle one more very simple type.

In order to express nullable options, a union/sum type of none and another type would be used. Union types are already possible in TOML since it is not statically typed - in fact, the type of any TOML value is the sum of all types by default.

This solution comes from functional languages, and would be familiar to users of those languages, as well as Rust (functional inspired) and Python (although not statically typed, Python's None is a type of its own, and not a value of any other type). The TOML syntax would not be visually much different to null as used in other languages.
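As a rough illustration of the "unit type" idea, here is a minimal Python sketch. It only shows how Python's own None behaves; the Setting alias and the describe helper are hypothetical names chosen for this example and are not part of TOML or any TOML library.

```python
from typing import Union

# A nullable option modelled as an explicit union of int and NoneType.
Setting = Union[int, None]  # hypothetical alias, for illustration only


def describe(value: Setting) -> str:
    """Report which member of the union a value belongs to."""
    if value is None:
        return "none"
    return f"int({value})"


# None is the only value of its own type; it is not an int.
print(type(None).__name__)    # NoneType
print(isinstance(None, int))  # False
print(describe(None), describe(5))
```

The point of the sketch is that an option typed as plain int never admits None; only an option that explicitly opts in to the union does.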

For me the most compelling use cases are inheritance/layered environments (where a child might set to none what a parent set to something else), and the ability to provide sensible non-null defaults while still allowing the value to be set null in the config file.

Points this does not address:

eksortso commented 1 year ago

First of all, you have done plenty of good research into the realm of null values and why we have rejected them for TOML in the past. You're coming into this from a good place, and I admire that.

That said, I am definitely biased against them. NULLs are not welcome here.

So I too have been tempted to suggest an explicit null value type in TOML consisting of a singleton NULL (in all caps, to highlight the terrible mistake that we'd be making), which would be distinct from all other values and equivalent to None in Python or the NULL pointer in C. But as you have read elsewhere, allowing for the instantiation of NULL brings more trouble than it's worth. (And null values bring even more trouble in SQL, because absolutely nothing can = NULL in a good SQL database, not even NULL.)

The only thing I can think of that a NULL would be truly useful for is to emulate JSON's null, and even then, what's the point of serializing an explicitly acknowledged absence of data when TOML is not a serialization language? Maybe an explicit return_blanks = true or override = true would be more sensible for a human-readable configuration. Or maybe you can make it simpler and more obvious with the use of NULL in well-documented places.

So to save time, I implore you to impress us further, to make this effort worth our while. Provide us with a detailed use case or two that cannot be handled more simply than with an explicit and distinct null value type. You are raising this topic anew, so if null values in TOML would really make the concepts that you deal with a whole lot simpler, then show us what you're dealing with, and how it can be done better. Maybe putting NULL in TOML would be worth all its troubles.

But expect some pushback. No bad idea gets adopted without a fight.

Artemis21 commented 1 year ago

For example, say I am configuring a web server. There is an option called max_upload_size, with some sensible, safe default. But I know that my web server will only be accessible to trusted clients, or I have a reverse proxy in front of it and the upload limit is set there. So, I want to disable the upload limit in the web server. To me, max_upload_size = none is the most obvious way of writing this.

marzer commented 1 year ago

max_upload_size = 0 would be equally sensible.

eksortso commented 1 year ago

# max_upload_size = disabled by default

Artemis21 commented 1 year ago

max_upload_size = 0 would be equally sensible.

It is always possible to find a sentinel value, I just don't think it's an especially tidy solution. I appreciate that the work involved in updating all TOML parsers may outweigh "not especially tidy".

# max_upload_size = disabled by default

I don't really consider this an option, unlimited upload sizes cannot be the default, things should be secure by default.

eksortso commented 1 year ago

max_upload_size = 0 would be equally sensible.

It is always possible to find a sentinel value, I just don't think it's an especially tidy solution.

It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.

Skipping ahead...

# max_upload_size = disabled by default

I don't really consider this an option, unlimited upload sizes cannot be the default, things should be secure by default.

If an option is not set (or is only commented out), then the config's consumer must handle its absence, no matter what. So the server cannot allow unlimited upload sizes by default (and I should not have been so naive as to accept unlimited as a default). Nevertheless, a default exists. A large default, but still significant.

But default max values can be exceeded. max_upload_size = 9_223_372_036_854_775_807 means effectively the same thing as disabling the max. Any ridiculously large value would mean the same thing.
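The sentinel convention argued for here could be handled on the consumer side roughly as follows. This is a sketch under stated assumptions: DEFAULT_MAX_UPLOAD and effective_limit are hypothetical names, the 10 MiB default is invented for the example, and the config is assumed to be an already-parsed dictionary.

```python
from typing import Optional

DEFAULT_MAX_UPLOAD = 10 * 1024 * 1024  # hypothetical safe default: 10 MiB


def effective_limit(config: dict) -> Optional[int]:
    """Return the upload limit in bytes, or None for "no limit".

    Assumed convention (it would have to be documented for users):
      - key absent -> fall back to the safe default
      - value <= 0 -> sentinel meaning "limit disabled"
      - value > 0  -> the limit itself
    """
    if "max_upload_size" not in config:
        return DEFAULT_MAX_UPLOAD
    value = config["max_upload_size"]
    if value <= 0:
        return None
    return value


print(effective_limit({}))
print(effective_limit({"max_upload_size": 0}))
print(effective_limit({"max_upload_size": 500}))
```

Note that the consumer still ends up with a "no limit" value internally; the sentinel only changes how the file spells it.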

So:

That last one is the only viable excuse for introducing a null value. And like all non-numerics, it would require special handling, by both the parser and the consumer, should we allow it.

Which brings us back to this:

I appreciate that the work involved in updating all TOML parsers may outweigh "not especially tidy".

This is going to be a consideration that must be made, no matter what. It cannot be ignored. And so far, it still cannot be justified.

Artemis21 commented 1 year ago

It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.

I disagree, the purpose of zero is not obvious - it could mean "disallow all uploads" for example. A ridiculously large value would be less ambiguous, but feels like a clear symptom that something is missing.

NULL (I refuse to call it "none")

I use "none" to distinguish the concept of "a single value of a separate unit type" from the concept of "a single value considered part of any type" ("null"). I am not attached to the naming.

arp242 commented 1 year ago

It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.

I disagree, the purpose of zero is not obvious - it could mean "disallow all uploads" for example. A ridiculously large value would be less ambiguous, but feels like a clear symptom that something is missing.

This is kind of the problem with None/Null values: it only signals one sentinel value and you can often have multiple "obvious" ones (here: "disallow uploads" and "no limit on upload size" both seem obvious). Plus it's not really obvious to me what "none" would mean here either; it's definitely something you'd have to document and the difference between documenting "none" or "-1" is essentially zero.

Since TOML keys aren't typed, it seems to me that using a string value would be best if you want it to be obvious without documentation.


Speaking purely from an implementation point of view: implementing None/Null in Go is kind of a pain since there is no value to represent them (so you need a custom type).

ChristianSi commented 1 year ago

Fortunately, TOML has already a perfect solution for the use case in question:

max_upload_size = inf

No new, error-prone, type required.

Artemis21 commented 1 year ago

inf is a float not an int so it would be another sentinel value of a separate type. But the matter of whether sentinel values are good enough to make null/none unnecessary is eventually just opinion, and there seem to be plenty of people who believe that it is good enough here.

pradyunsg commented 1 year ago

So... This entire discussion so far is basically a repeat of what has already been discussed in the four issues that have been linked to by OP. I'll note that I genuinely appreciate that OP spent their time and effort researching and reading the past discussions on this topic.

Defining a new unit type, called none, where the only valid value is none, would not break any existing guarantees.

I don't see how this is different from what has already been proposed, other than a different spelling.


My position is the same as it was back in https://github.com/toml-lang/toml/issues/803 -- I'm not sure this is a problem that needs solving in TOML.

There are alternatives available in basically all realistic usecases discussed so far (here and in the older threads). As mentioned, having a sane default on the reader of the configuration or setting other non-null values like = "no limit" or = false is almost always a better option, and things like round-tripping / representing arbitrary data aren't something TOML should try to be good at.

I'm gonna close this to reflect that it's unlikely that this will change; but please feel welcome to continue the discussion.

C-Ezra-M commented 1 year ago

I think that the billion-dollar mistake that ALGOL's null reference was should not affect this proposal because it's more about the null pointer, not null value.

The null pointer is not memory-safe, and that's why Rust has the Option<T> type, which is a memory-safe enum, and provides a None variant. Most other languages also have a memory-safe null counterpart: Java, JS, and PHP have null, Python has None, Ruby and Lua have nil, etc.

eksortso commented 1 year ago

@Keyacom I can appreciate your interest in all the different concepts that TOML has approached or avoided over the past decade. But over multiple issue threads, you keep saying "let's add this thing" without offering compelling use cases for adding those things. We really don't want to push any syntax, type, concept, or redundancy into all the parsers and other tools that implement the standard if we can't make it obvious why we're doing so.

So permit me to ask, what are you using TOML for? Why won't sentinels work in your particular situation? How could null address the problems that you're facing? Are they new or overlooked problems, or long-standing problems that we have considered before?

Please give us some details! What are we overlooking?

tgross35 commented 1 year ago

Just my 2¢ - I have come across the desire for this a handful of times when I want to have some sort of "configuration precedence". That is, something like Application Defaults -> RepoConfig.toml -> ProjectConfig.toml. If I set a value for RepoConfig.toml then wish to "unset" it in the project configuration, there isn't really a good way to do that.

It is possible to set it to the default value, but that doesn't always make sense when the default value is nothing - that is, the value being unspecified carries some natural meaning that is different from any value being specified. I think it's better to think of it as an extension to the existing type that has a single purpose, rather than being something completely different. That is, a value may be a string, a bool, a number, an object, or it can just be nothing at all.
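The precedence problem can be sketched in Python with plain dictionaries standing in for the parsed files. Everything here is an assumption made for illustration: the UNSET marker, the merge_layers helper, and the example keys are hypothetical, and TOML itself defines no such marker.

```python
UNSET = "__unset__"  # hypothetical marker; TOML itself defines no such value


def merge_layers(*layers: dict) -> dict:
    """Merge config layers left to right; later layers win.

    A key whose value is UNSET is removed from the result, which is the
    behaviour a none/null value would provide natively.
    """
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if value == UNSET:
                merged.pop(key, None)
            else:
                merged[key] = value
    return merged


app_defaults = {"log_level": "info"}
repo_config = {"proxy": "http://proxy.example:8080"}

# Omitting the key in the project layer cannot undo the repo-level setting.
print(merge_layers(app_defaults, repo_config, {}))
# An explicit marker can.
print(merge_layers(app_defaults, repo_config, {"proxy": UNSET}))
```

With omission alone, the project layer has no way to say "drop what the repo layer set"; some explicit value is needed, whether it is a string sentinel like this one or a dedicated type.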

Sentinels aren't really a good solution in modern programming. C doesn't have a way to represent optional values, so returning -1 is common to represent an error. But Python has None, SQL has NULL, Rust has Option<T> (and a wonderful set of associated functions to work with it), JSON and YAML have null, JS has both none and undefined, C# has Nullable<T>, Go has nil, and even C++ now has std::optional - I think this indicates that there are real use cases for "maybe exists" values, which in cases may carry over to configuration.

Additionally, using magic values isn't good for future proofing (what if -1 is a sentinel, but later on negative numbers become possible values?) and they suffer from being implicit rather than explicit (read the documentation if you want to understand what this means vs. a clear meaning to anyone who looks at the file - which is part of the point of TOML).

Then there's the inconsistency aspect, that multiple keys need to use different ways to represent nothing. From least to most absurd:

only-string = -1
string-or-int = inf
number-or-string = false
bool-or-array-or-number = {}
bool-or-number-or-string-or-array-or-object = 00:00:00

Obviously that's not realistic, but it isn't uncommon for larger schemas to need >1 way to represent the same nothing, and that's just confusing to understand.

(Random, but I personally think the word empty fits better into the feel of TOML than something like null or none)

name = "Orange"
physical.color = empty
physical.shape = "round"
site."google.com" = true

tgross35 commented 1 year ago

To add a more concrete example - per the docker-compose schema there are 7 instances where null is allowed. I believe all these cases are for overriding inherited configuration, which lines up with the use cases I've run into myself.

C-Ezra-M commented 1 year ago

Sentinels aren't really a good solution in modern programming. C doesn't have a way to represent optional values, so returning -1 is common to represent an error. But Python has None, SQL has NULL, Rust has Option<T> (and a wonderful set of associated functions to work with it), JSON and YAML have null, JS has both none and undefined, C# has Nullable<T>, Go has nil, and even C++ now has std::optional - I think this indicates that there are real use cases for "maybe exists" values, which in cases may carry over to configuration.

C has a NULL pointer, but derefing NULL of course causes memory issues. JS does not have none, but null.

SQL NULL is a bit different concept because it's a marker, not a pointer or a value, but I see potential use for null in TOML as SQL NULL in database generation files. I've made a SQL file generator in Python for a bot I'm planning to develop in the future, in case someone else wants to deploy even a local instance. I made one because it's easier to maintain the TOML file, rather than the SQL file (especially with schema modifications), which is hinted by having the SQL file generated, like MediaWiki does from tables.json (in MW, it's also because of need to support multiple RDBMS software, but my bot's DB specifically targets PostgreSQL). I used TOML because I think it's easier to maintain than JSON. The generator that I wrote is expected to either use the given value or NULL in the default key for each column spec. NULL could also be explicitly set as the default in the TOML file used to generate the SQL, but I'm only planning on that because, of course, TOML does not have null yet.

Regardless, we should probably reopen this issue (i.e. lift the Closed marking).

marzer commented 1 year ago

@Keyacom

JS does not have none, but null.

This is unhelpful pedantry and does not at all meaningfully contribute to the discussion.

Regardless, we should probably reopen this issue (i.e. lift the Closed marking).

I disagree. Your situation alone doesn't really offer a compelling argument. You're suggesting that because

  • SQL has nulls
  • TOML does not
  • You've chosen to pair those together

the language (and its many implementations) should change, and not that you've made a poor design choice?

If you genuinely think you have a good, novel case to make, then by all means open a new issue, but you should realize that the first one linked in this discussion, #30, is now ten years old. There's likely to be very little in this area that hasn't already been discussed to death.

eksortso commented 1 year ago

@Keyacom

JS does not have none, but null.

This is unhelpful pedantry and does not at all meaningfully contribute to the discussion.

Ease up a bit. Let's not split hairs. We're all talking about the same thing: a type distinct from all others, consisting of a singular value whose references are all identical, intended to express the deliberate absence of a more meaningful value. That's NULL in a nutshell. (Much hairier in SQL, but the point's made.)

I'm still against it, but I know what it is, and how it's used elsewhere.

Regardless, we should probably reopen this issue (i.e. lift the Closed marking).

I disagree. Your situation alone doesn't really offer a compelling argument. You're suggesting that because

  • SQL has nulls
  • TOML does not
  • You've chosen to pair those together

the language (and its many implementations) should change, and not that you've made a poor design choice?

@marzer I agree with you 100% here, specifically on the principles of good configuration design and practice that make settings orthogonal to each other and don't require any sort of hacky "unsetting" of key values. In a perfect world, this would be standard. But nobody's perfect.

@tgross35 You mentioned that Docker (I forget the name of the package) allowed the use of nulls in its JSON-based configurations to clear a value set at an intermediate level. I'm still asking for a detailed use case where a null would need to be used at the user's level. Since you've seen them a lot, you would be an ideal candidate to flesh out the details of such a use case and describe what we may have overlooked. Which may still be a bad design choice, but its legacy would be hard to get away from.

If you genuinely think you have a good, novel case to make, then by all means open a new issue, but you should realize that the first one linked in this discussion, #30, is now ten years old. There's likely to be very little in this area that hasn't already been discussed to death.

@pradyunsg Bringing you back into this, since a new type of document for the project is in order. We ought to put together something like Python PEPs to describe recurring issues and condense the dozens of conversations we've had over the years about them. And null-type proposals are a prime candidate for such a document, containing rationales, alternatives, primary candidate proposals, and ultimately decisions made. Having one would save everybody a lot of time and effort.

Wouldn't you love to point to a single all-encompassing document and say, "Here's why we don't do nulls, or why we haven't done them"? At this point, I sure would.

marzer commented 1 year ago

@eksortso

Ease up a bit. Let's not split hairs.

What? That's exactly my point. I know you know what a null is, and the general semantics we're discussing. I was rebuking the unnecessary hair-splitting in @Keyacom's point, which was to his argument's detriment as he was 'correcting' the one and only active person in the discussion who somewhat agreed with him. There's nothing for me to ease off about, IMO - pedantry is the enemy of discourse.

In any case, I agree with this sentiment:

Wouldn't you love to point to a single all-encompassing document and say, "Here's why we don't do nulls, or why we haven't done them"? At this point, I sure would.

We also need something for variables/macros/'references', IMO, since that seems to come up a few times a year - constantly reiterating "yeah, great, but TOML isn't a programming language" is somewhat tiresome 😅. No doubt there's a few other 'low-hanging fruit' ideas that would be deserving of this treatment, too.

arp242 commented 1 year ago

Go has nil

Go has nil only for pointers; it's the same as C in this regard (for better or worse).

This is my main concern: it's not supported by all mainstream languages or requires quite a bit of "special" code in the application, and the exact semantics of "null" differ. Numbers, strings, arrays, and tables behave essentially the same in almost all languages; there are differences of course but by and large it's the same.

I also rather like that you can rely on a value always being a certain type, even in e.g. Python you won't have to worry about None values and write checks for them.

That doesn't mean null values can't be added, but there is a cost to it. Is it worth that? I don't know; maybe?

Another advantage of adding null would be that all JSON documents can be converted to TOML, and almost all YAML documents (only the !! nodes would be missing I believe), which would be fairly useful.

darkuranium commented 3 months ago

I also rather like that you can rely on a value always being a certain type, even in e.g. Python you won't have to worry about None values and write checks for them.

I know this is a bit of a necro, but this is an argument I've seen brought up quite a lot of times, and it irks me: it's only valid in cases where all types suddenly get null as a valid value.

What makes more sense, in my opinion, would be for null to have to be explicitly permitted in a field, in which case it becomes no different to getting a string where you're expecting a table. And no different from having to use a differently-typed sentinel (e.g. "never" on an otherwise numeric field), except it's more future-proof. In other words, make null its own type, only valid in fields that are explicitly marked as being able to have the type (in a schema-validated config, of course).
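A per-field opt-in of this kind might look like the following Python sketch. The SCHEMA table, its keys, and the validate helper are all hypothetical, and the config is assumed to be an already-parsed dictionary in which a null would surface as Python's None.

```python
NoneType = type(None)

# Hypothetical schema: each key lists the types it accepts.
# Only max_upload_size opts in to "no value".
SCHEMA = {
    "max_upload_size": (int, NoneType),
    "server_name": (str,),
}


def validate(config: dict) -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    for key, value in config.items():
        allowed = SCHEMA.get(key)
        if allowed is None:
            errors.append(f"{key}: unknown key")
        elif not isinstance(value, allowed):
            names = " or ".join(t.__name__ for t in allowed)
            errors.append(f"{key}: expected {names}, got {type(value).__name__}")
    return errors


print(validate({"max_upload_size": None}))
print(validate({"server_name": None}))
print(validate({"max_upload_size": "never"}))
```

Under such a schema, a null in the wrong field is reported by the same check that catches a string where an integer was expected.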

Alternatively, some sort of a generic keyword syntax could be invented (not unlike Ruby's symbols [:foobar] or Erlang's atoms [foobar]). This would enable each project to come up with their own null — or even multiple symbols, depending on situation — to take above example, think max_upload_size = unlimited and max_upload_size = uploads_forbidden, though I agree that inf & 0 work well in that particular case. That said, this is probably an overkill for TOML. Food for thought, though.


A bit of a tangent, but on the topic of "simply omit the key": Issues like overriding/docker-compose aside, I feel like there are very good reasons to have explicit values in configuration sometimes. "Explicit is better than implicit" and all that; doubly so with security-related things, where you might want to enforce that the user provides a value, to make sure that they explicitly decided on it (them, or I suppose their OS's package maintainer), even if none / null is a valid choice. Sometimes, a value can sensibly be present or null, but without it sensibly defaulting to null to simply allow omission.

Whether this is true of $SOME_PROJECT's TOML files depends on said project, but I'd argue the project using TOML knows better on whether explicit or implicit is more appropriate in their own application of TOML than the format itself.

Artemis21 commented 3 months ago

What makes more sense, in my opinion, would be for null to have to be explicitly permitted in a field, in which case it becomes no different to getting a string where you're expecting a table.

Well said. This was the main point this issue was meant to add to the discussion, though perhaps I did not express it clearly enough.

arp242 commented 3 months ago

null to have to be explicitly permitted in a field

You can't really do that in Python because it all just maps to an object; there is no straight-forward way to say "nulls are allowed/disallowed here", like you do with, say, SQL, or a struct type e.g. Rust.

So what will happen in most Python applications is that you'll get a TypeError. Or you have to check/validate for None.

darkuranium commented 3 months ago

You can't really do that in Python because it all just maps to an object; there is no straight-forward way to say "nulls are allowed/disallowed here", like you do with, say, SQL, or a struct type e.g. Rust.

So what will happen in most Python applications is that you'll get a TypeError. Or you have to check/validate for None.

How is this different from getting a TypeError because you got a str or dict instead of int?

arp242 commented 3 months ago

It's not, but no one is going to write max_upload_size = {} or max_upload_size = "shitloads". Well, maybe someone would, but that would be so silly that I don't think it's our concern.

People would expect to be able to write max_upload_size = Null. This wouldn't be such a huge issue if you can give decent errors, but in most of the dynamic languages (Python, Ruby, Perl, PHP, etc) you can't really, unless you specifically program it all with some custom logic.

So what you end up with is a TypeError when the application tries to use the variable, and in many cases it's not even clear that this is because None is not supported for that value.
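The difference between the late failure described here and an explicit check at load time can be sketched as follows. The function names are hypothetical, and the dictionary stands in for what a parser would return if a null were permitted.

```python
def upload_allowed(size_bytes: int, config: dict) -> bool:
    """Naive consumer: assumes max_upload_size is always an int."""
    return size_bytes <= config["max_upload_size"]


def load_checked(config: dict) -> dict:
    """Fail at load time with a message that names the offending key."""
    value = config.get("max_upload_size")
    if not isinstance(value, int):
        raise ValueError(
            f"max_upload_size must be an integer, got {type(value).__name__}"
        )
    return config


config = {"max_upload_size": None}  # what a parsed null would look like

try:
    upload_allowed(1024, config)
except TypeError as exc:
    # Raised at the point of use, with no mention of the config key.
    print("late failure:", exc)

try:
    load_checked(config)
except ValueError as exc:
    # Raised at load time, naming the key.
    print("early failure:", exc)
```

The second path is the "custom logic" mentioned above: it is not hard to write, but each application has to write it.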

darkuranium commented 3 months ago

People would expect to be able to write max_upload_size = Null. This wouldn't be such a huge issue if you can give decent errors, but in most of the dynamic languages (Python, Ruby, Perl, PHP, etc) you can't really, unless you specifically program it all with some custom logic.

I could definitely see people writing max_upload_size = "unlimited" or such, mixing up strings vs arrays of strings, etc.

Also, you don't really see this problem of people randomly expecting to be able to write null in other configuration languages (YAML, JSON/JSONC, etc); unless you're arguing that the average TOML user is dumber than the average YAML or JSON user, which would be a rather amusing argument. Speaking from experience, it seems like a purely hypothetical problem in most cases. And the few where it's less-than-hypothetical, it's no different from writing "unlimited" instead of null.

arp242 commented 3 months ago

I've run into this quite a few times; both as "random TypeError in Python" and as "ugh, I need to make this a pointer" in Go.

That said, many (perhaps even most) of those were not so much "I wrote max_upload_size=None and this isn't supported", but rather parsing some file and not realizing that some value can be null, and running into errors because of that. Slightly different scenario.

npip99 commented 2 months ago

An argument keeps going back and forth on why null shouldn't be in TOML, but that same argument also applies to "Why have none/nil/null in X programming language?". You can always just use sentinel values of some primitive type in that language.

C++ used sentinel values for ~40 yrs before adding std::optional, but std::optional was a good move. Yes, we can use sentinels, but None is a sentinel value. The value proposition of None as a universal sentinel is,

Given that these were the main three reasons for adding explicit None for each programming language, I'm not sure on the flip side why it doesn't apply to TOML. In other words, I am unable to understand the argument

without offering compelling use cases for adding those things

"We should use None instead of -1" was, in and of itself, the compelling use case that caused std::optional to be implemented in C++17, and why most other languages started off with nil/null/none from inception. Clearly it's not strongly necessary, C++ did not have optional from 1979-2017, and yet C++ is a wildly popular language. But, if it was a compelling use case in virtually every programming language, the question is why are those three bullet points not applicable here? All three seem to apply in the exact same way.


The only thing I can think of that a NULL would be truly useful for is to emulate JSON's null, and even then, what's the point of serializing an explicitly acknowledged absence of data when TOML is not a serialization language?

Yes, TOML is a configuration format, not a serialization format. And my pydantic BaseModels for configuration involve 10x more optionals than my pydantic BaseModels for serializing data. Data rarely could be None. Configuration almost always has keys where it makes sense for them to be None.

In my opinion, toml is a configuration language, not a serialization language, and configuration needs an explicit set for None that is distinct from the default value, more often than serialization does.


In either case, yes sentinel values work, they will always work. We won't be able to provide an example where you can't just use "I AM NOT A VALUE" as a string to represent None, followed by code which finds that string and replaces it with the proper None type of that language. But, it's messier, and less intuitive to read, and more prone to mistake. And just makes it so much harder to read a toml file when a bunch of configuration options are set to a bunch of random magic values that happen to be the sentinel value for None, followed by code which converts all of those magic values to None in your codebase.

Magic sentinels create unlucky people

In either case, TOML is great, it's easier to use than YAML. But, null is good. I have code that handles conversion to null in my codebase that I'd like to remove if TOML ever supports null in the future. I imagine many users have code that converts random sentinels to null immediately after parsing, especially if you're in Swift/Rust where algebraic datatypes are central to the language.