Closed Artemis21 closed 1 year ago
First of all, you have done plenty of good research into the realm of null values and why we have rejected them for TOML in the past. You're coming into this from a good place, and I admire that.
That said, I am definitely biased against them. NULLs are not welcome here.
So I too have been tempted to suggest an explicit null value type in TOML consisting of a singleton NULL (in all caps, to highlight the terrible mistake that we'd be making), which would be distinct from all other values and equivalent to None in Python or the NULL pointer in C. But as you have read elsewhere, allowing for the instantiation of NULL brings more trouble than it's worth. (And null values bring even more trouble in SQL, because absolutely nothing can = NULL in a good SQL database, not even NULL.)
The only thing I can think of that a NULL would be truly useful for is to emulate JSON's null, and even then, what's the point of serializing an explicitly acknowledged absence of data when TOML is not a serialization language? Maybe an explicit return_blanks = true or override = true would be more sensible for a human-readable configuration. Or maybe you can make it simpler and more obvious with the use of NULL in well-documented places.
So to save time, I implore you to impress us further, to make this effort worth our while. Provide us with a detailed use case or two that cannot be handled more simply than with an explicit and distinct null value type. You are raising this topic anew, so if null values in TOML would really make the concepts that you deal with a whole lot simpler, then show us what you're dealing with, and how it can be done better. Maybe putting NULL in TOML would be worth all its troubles.
But expect some pushback. No bad idea gets adopted without a fight.
For example, say I am configuring a web server. There is an option called max_upload_size, with some sensible, safe default. But I know that my web server will only be accessible to trusted clients, or I have a reverse proxy in front of it and the upload limit is set there. So, I want to disable the upload limit in the web server. To me, max_upload_size = none is the most obvious way of writing this.
max_upload_size = 0
would be equally sensible.
# max_upload_size = disabled by default
max_upload_size = 0
would be equally sensible.
It is always possible to find a sentinel value, I just don't think it's an especially tidy solution. I appreciate that the work involved in updating all TOML parsers may outweigh "not especially tidy".
# max_upload_size = disabled by default
I don't really consider this an option, unlimited upload sizes cannot be the default, things should be secure by default.
max_upload_size = 0
would be equally sensible.
It is always possible to find a sentinel value, I just don't think it's an especially tidy solution.
It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.
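A minimal sketch of how a config consumer might apply this sentinel convention, assuming the parsed TOML arrives as a plain dict. The function name `effective_upload_limit`, the 10 MiB default, and the decision to treat both 0 and negative integers as "no limit" are illustrative assumptions, not any real server's behavior:

```python
from typing import Optional

# Hypothetical safe default, used when the key is absent or commented out.
DEFAULT_MAX_UPLOAD = 10 * 1024 * 1024  # 10 MiB


def effective_upload_limit(config: dict) -> Optional[int]:
    """Return the limit in bytes, or None when the limit is disabled."""
    if "max_upload_size" not in config:
        return DEFAULT_MAX_UPLOAD
    value = config["max_upload_size"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("max_upload_size must be an integer")
    # Assumed convention: zero or any negative integer disables the limit.
    if value <= 0:
        return None
    return value
```

Whether 0 should mean "no limit" or "no uploads" is exactly the kind of choice the consumer has to document either way.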
Skipping ahead...
# max_upload_size = disabled by default
I don't really consider this an option, unlimited upload sizes cannot be the default, things should be secure by default.
If an option is not set (or is only commented out), then the config's consumer must handle its absence, no matter what. So the server cannot allow unlimited upload sizes by default (and I should not have been so naive to accept unlimited as a default). Nevertheless, a default exists. A large default, but still significant.
But default max values can be exceeded. max_upload_size = 9_223_372_036_854_775_807 means effectively the same thing as disabling the max. Any ridiculously large value would mean the same thing.
So:
NULL (I refuse to call it "none") would serve a singular purpose: it would inform our human audience that what we are doing is unwise but we're doing it anyway.
That last one is the only viable excuse for introducing a null value. And like all non-numerics, it would require special handling, by both the parser and the consumer, should we allow it.
Which brings us back to this:
I appreciate that the work involved in updating all TOML parsers may outweigh "not especially tidy".
This is going to be a consideration that must be made, no matter what. It cannot be ignored. And so far, it still cannot be justified.
It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.
I disagree, the purpose of zero is not obvious - it could mean "disallow all uploads" for example. A ridiculously large value would be less ambiguous, but feels like a clear symptom that something is missing.
NULL (I refuse to call it "none")
I use "none" to distinguish the concept of "a single value of a separate unit type" from the concept of "a single value considered part of any type" ("null"). I am not attached to the naming.
It is tidy. Zero is a natural sentinel, because a literal zero-sized max makes no sense. A negative integer, such as a -1, would carry the same significance.
I disagree, the purpose of zero is not obvious - it could mean "disallow all uploads" for example. A ridiculously large value would be less ambiguous, but feels like a clear symptom that something is missing.
This is kind of the problem with None/Null values: it only signals one sentinel value and you can often have multiple "obvious" ones (here: "disallow uploads" and "no limit on upload size" both seem obvious). Plus it's not really obvious to me what "none" would mean here either; it's definitely something you'd have to document and the difference between documenting "none" or "-1" is essentially zero.
Since TOML keys aren't typed, it seems to me that using a string value would be best if you want it to be obvious without documentation.
Speaking purely from an implementation point of view: implementing None/Null in Go is kind of a pain since there is no value to represent them (so you need a custom type).
Fortunately, TOML has already a perfect solution for the use case in question:
max_upload_size = inf
No new, error-prone, type required.
inf is a float not an int so it would be another sentinel value of a separate type. But the matter of whether sentinel values are good enough to make null/none unnecessary is eventually just opinion, and there seem to be plenty of people who believe that it is good enough here.
So... This entire discussion so far is basically a repeat of what has already been discussed in the four issues that have been linked to by OP. I'll note that I genuinely appreciate that OP spent their time and effort researching and reading the past discussions on this topic.
Defining a new unit type, called none, where the only valid value is none, would not break any existing guarantees.
I don't see how this is different from what has already been proposed, other than a different spelling.
My position is the same as it was back in https://github.com/toml-lang/toml/issues/803 -- I'm not sure this is a problem that needs solving in TOML.
There are alternatives available in basically all realistic use cases discussed so far (here and in the older threads). As mentioned, having a sane default on the reader of the configuration or setting other non-null values like = "no limit" or = false is almost always a better option, and things like round-tripping / representing arbitrary data aren't something TOML should try to be good at.
I'm gonna close this to reflect that it's unlikely that this will change; but please feel welcome to continue the discussion.
I think that the billion-dollar mistake that ALGOL's null reference was should not affect this proposal because it's more about the null pointer, not null value.
The null pointer is not memory-safe, and that's why Rust has the Option<T> type, which is a memory-safe enum, and provides a None variant. Most other languages also have a memory-safe null counterpart: Java, JS, and PHP have null, Python has None, Ruby and Lua have nil, etc.
@Keyacom I can appreciate your interest in all the different concepts that TOML has approached or avoided over the past decade. But over multiple issue threads, you keep saying "let's add this thing" without offering compelling use cases for adding those things. We really don't want to push any syntax, type, concept, or redundancy into all the parsers and other tools that implement the standard if we can't make it obvious why we're doing so.
So permit me to ask, what are you using TOML for? Why won't sentinels work in your particular situation? How could null address the problems that you're facing? Are they new or overlooked problems, or long-standing problems that we have considered before?
Please give us some details! What are we overlooking?
Just my 2¢ - I have come across the desire for this a handful of times when I want to have some sort of "configuration precedence". That is, something like Application Defaults -> RepoConfig.toml -> ProjectConfig.toml. If I set a value for RepoConfig.toml then wish to "unset" it in the project configuration, there isn't really a good way to do that.
It is possible to set it to the default value, but that doesn't always make sense when the default value is nothing - that is, the value being unspecified carries some natural meaning that is different from any value being specified. I think it's better to think of it as an extension to the existing type that has a single purpose, rather than being something completely different. That is, a value may be a string, a bool, a number, an object, or it can just be nothing at all.
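A rough sketch of the layering problem, with plain dicts standing in for the parsed TOML files. The `proxy` key, the `__unset__` marker string, and both function names are made up for illustration; the marker is one possible workaround, not something TOML or any particular tool defines:

```python
def merge_layers(*layers: dict) -> dict:
    """Later layers override earlier ones, key by key."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


defaults = {"timeout": 30}                     # "proxy" is absent by default
repo = {"proxy": "http://proxy.example:8080"}  # stands in for RepoConfig.toml
project = {}                                   # stands in for ProjectConfig.toml

# Omitting the key in the project layer cannot undo the repo-level value.
merged = merge_layers(defaults, repo, project)

UNSET = "__unset__"  # hypothetical marker; every layer must agree on it


def merge_layers_with_unset(*layers: dict) -> dict:
    """Same merge, but the marker removes a key set by an earlier layer."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if value == UNSET:
                merged.pop(key, None)
            else:
                merged[key] = value
    return merged


cleared = merge_layers_with_unset(defaults, repo, {"proxy": UNSET})
```

Here `merged` still contains the repo-level `proxy`, while `cleared` does not, at the cost of reserving a magic string.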
Sentinels aren't really a good solution in modern programming. C doesn't have a way to represent optional values, so returning -1 is common to represent an error. But Python has None, SQL has NULL, Rust has Option<T> (and a wonderful set of associated functions to work with it), JSON and YAML have null, JS has both none and undefined, C# has Nullable<T>, Go has nil, and even C++ now has std::optional - I think this indicates that there are real use cases for "maybe exists" values, which in cases may carry over to configuration.
Additionally, using magic values isn't good for future proofing (what if -1 is a sentinel, but later on negative numbers become possible values?) and they suffer from being implicit rather than explicit (read the documentation if you want to understand what this means vs. a clear meaning to anyone who looks at the file - which is part of the point of TOML).
Then there's the inconsistency aspect, that multiple keys need to use different ways to represent nothing. From least to most absurd:
only-string = -1
string-or-int = inf
number-or-string = false
bool-or-array-or-number = {}
bool-or-number-or-string-or-array-or-object = 00:00:00
Obviously that's not realistic, but it isn't uncommon for larger schemas to need >1 way to represent the same nothing, and that's just confusing to understand.
(Random, but I personally think the word empty fits better into the feel of TOML than something like null or none)
name = "Orange"
physical.color = empty
physical.shape = "round"
site."google.com" = true
To add a more concrete example - per the docker-compose schema there are 7 instances where null is allowed. I believe all these cases are for overriding inherited configuration, which lines up with the use cases I've run into myself.
Sentinels aren't really a good solution in modern programming. C doesn't have a way to represent optional values, so returning -1 is common to represent an error. But Python has None, SQL has NULL, Rust has Option<T> (and a wonderful set of associated functions to work with it), JSON and YAML have null, JS has both none and undefined, C# has Nullable<T>, Go has nil, and even C++ now has std::optional - I think this indicates that there are real use cases for "maybe exists" values, which in cases may carry over to configuration.
C has a NULL pointer, but derefing NULL of course causes memory issues. JS does not have none, but null.
SQL NULL is a bit different concept because it's a marker, not a pointer or a value, but I see potential use for null in TOML as SQL NULL in database generation files. I've made a SQL file generator in Python for a bot I'm planning to develop in the future, in case someone else wants to deploy even a local instance. I made one because it's easier to maintain the TOML file, rather than the SQL file (especially with schema modifications), which is hinted by having the SQL file generated, like MediaWiki does from tables.json (in MW, it's also because of need to support multiple RDBMS software, but my bot's DB specifically targets PostgreSQL). I used TOML because I think it's easier to maintain than JSON. The generator that I wrote is expected to either use the given value or NULL in the default key for each column spec. NULL could also be explicitly set as the default in the TOML file used to generate the SQL, but I'm only planning on that because, of course, TOML does not have null yet.
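A hedged sketch of what such a generator might do with the default key, assuming each column spec is a dict with a `type` and an optional `default`. The function names and spec layout are guesses for illustration and are not taken from the generator described above:

```python
def sql_literal(value) -> str:
    """Render a parsed TOML scalar as a SQL literal (simplified)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded quote.
    return "'" + str(value).replace("'", "''") + "'"


def column_ddl(name: str, spec: dict) -> str:
    """Use the given default, or fall back to NULL when the key is absent."""
    default = sql_literal(spec["default"]) if "default" in spec else "NULL"
    return f"{name} {spec['type']} DEFAULT {default}"


print(column_ddl("nickname", {"type": "TEXT"}))
# nickname TEXT DEFAULT NULL
print(column_ddl("score", {"type": "INTEGER", "default": 0}))
# score INTEGER DEFAULT 0
```

In this shape an absent key already maps to NULL, so an explicit null in the TOML file would only make that default visible rather than enable something new.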
Regardless, we should probably reopen this issue (i.e. lift the Closed marking).
@Keyacom
JS does not have none, but null.
This is unhelpful pedantry and does not at all meaningfully contribute to the discussion.
Regardless, we should probably reopen this issue (i.e. lift the Closed marking).
I disagree. Your situation alone doesn't really offer a compelling argument. You're suggesting that because
- SQL has nulls
- TOML does not
- You've chosen to pair those together
the language (and its many implementations) should change, and not that you've made a poor design choice?
If you genuinely think you have a good, novel case to make, then by all means open a new issue, but you should realize that the first one linked in this discussion, #30, is now ten years old. There's likely to be very little in this area that hasn't already been discussed to death.
@Keyacom
JS does not have none, but null.
This is unhelpful pedantry and does not at all meaningfully contribute to the discussion.
Ease up a bit. Let's not split hairs. We're all talking about the same thing: a type distinct from all others, consisting of a singular value whose references are all identical, intended to express the deliberate absence of a more meaningful value. That's NULL in a nutshell. (Much hairier in SQL, but the point's made.)
I'm still against it, but I know what it is, and how it's used elsewhere.
Regardless, we should probably reopen this issue (i.e. lift the Closed marking).
I disagree. Your situation alone doesn't really offer a compelling argument. You're suggesting that because
- SQL has nulls
- TOML does not
- You've chosen to pair those together
the language (and its many implementations) should change, and not that you've made a poor design choice?
@marzer I agree with you 100% here, specifically on the principles of good configuration design and practice that make settings orthogonal to each other and don't require any sort of hacky "unsetting" of key values. In a perfect world, this would be standard. But nobody's perfect.
@tgross35 You mentioned that Docker (I forget the name of the package) allowed the use of nulls in its JSON-based configurations to clear a value set at an intermediate level. I'm still asking for a detailed use case where a null would need to be used at the user's level. Since you've seen them a lot, you would be an ideal candidate to flesh out the details of such a use case and describe what we may have overlooked. Which may still be a bad design choice, but its legacy would be hard to get away from.
If you genuinely think you have a good, novel case to make, then by all means open a new issue, but you should realize that the first one linked in this discussion, #30, is now ten years old. There's likely to be very little in this area that hasn't already been discussed to death.
@pradyunsg Bringing you back into this, since a new type of document for the project is in order. We ought to put together something like Python PEPs to describe recurring issues and condense the dozens of conversations we've had over the years about them. And null-type proposals are a prime candidate for such a document, containing rationales, alternatives, primary candidate proposals, and ultimately decisions made. Having one would save everybody a lot of time and effort.
Wouldn't you love to point to a single all-encompassing document and say, "Here's why we don't do nulls, or why we haven't done them"? At this point, I sure would.
@eksortso
Ease up a bit. Let's not split hairs.
What? That's exactly my point. I know you know what a null is, and the general semantics we're discussing. I was rebuking the unnecessary hair-splitting in @Keyacom's point, which was to his argument's detriment as he was 'correcting' the one and only active person in the discussion who somewhat agreed with him. There's nothing for me to ease off about, IMO - pedantry is the enemy of discourse.
In any case, I agree with this sentiment:
Wouldn't you love to point to a single all-encompassing document and say, "Here's why we don't do nulls, or why we haven't done them"? At this point, I sure would.
We also need something for variables/macros/'references', IMO, since that seems to come up a few times a year - constantly reiterating "yeah, great, but TOML isn't a programming language" is somewhat tiresome 😅. No doubt there's a few other 'low-hanging fruit' ideas that would be deserving of this treatment, too.
Go has nil
Go has nil only for pointers; it's the same as C in this regard (for better or worse).
This is my main concern: it's not supported by all mainstream languages or requires quite a bit of "special" code in the application, and the exact semantics of "null" differ. Numbers, strings, arrays, and tables behave essentially the same in almost all languages; there are differences of course but by and large it's the same.
I also rather like that you can rely on a value always being a certain type, even in e.g. Python you won't have to worry about None values and write checks for them.
That doesn't mean null values can't be added, but there is a cost to it. Is it worth that? I don't know; maybe?
Another advantage of adding null would be that all JSON documents can be converted to TOML, and almost all YAML documents (only the !! nodes would be missing I believe), which would be fairly useful.
I also rather like that you can rely on a value always being a certain type, even in e.g. Python you won't have to worry about None values and write checks for them.
I know this is a bit of a necro, but this is an argument I've seen brought up quite a lot of times, and it irks me: it's only valid in cases where all types suddenly get null as a valid value.
What makes more sense, in my opinion, would be for null to have to be explicitly permitted in a field, in which case it becomes no different to getting a string where you're expecting a table. And no different from having to use a differently-typed sentinel (e.g. "never" on an otherwise numeric field), except it's more future-proof.
In other words, make null its own type, only valid in fields that are explicitly marked as being able to have the type (in a schema-validated config, of course).
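One way to picture "explicitly permitted" is a schema that lists the allowed types per key. The sketch below assumes a hypothetical null-aware parser that maps a TOML null to Python's None (today's TOML parsers cannot produce it); `SCHEMA`, `validate`, and the key names are all illustrative:

```python
NoneType = type(None)

# Hypothetical schema: null is only valid where NoneType is listed.
SCHEMA = {
    "max_upload_size": (int, NoneType),
    "server_name": (str,),
}


def validate(config: dict, schema: dict) -> list:
    """Return one error message per key whose value has a disallowed type."""
    errors = []
    for key, allowed in schema.items():
        if key not in config:
            continue
        value = config[key]
        # bool is a subclass of int, so only accept it when listed itself.
        is_stray_bool = isinstance(value, bool) and bool not in allowed
        if is_stray_bool or not isinstance(value, allowed):
            names = ", ".join(t.__name__ for t in allowed)
            errors.append(
                f"{key}: expected one of ({names}), got {type(value).__name__}"
            )
    return errors
```

Under this scheme a null in `server_name` is reported the same way a table or integer would be, which is the equivalence argued for above.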
Alternatively, some sort of a generic keyword syntax could be invented (not unlike Ruby's symbols [:foobar] or Erlang's atoms [foobar]).
This would enable each project to come up with its own null (or even multiple symbols, depending on the situation). To take the above example, think max_upload_size = unlimited and max_upload_size = uploads_forbidden, though I agree that inf & 0 work well in that particular case.
That said, this is probably an overkill for TOML. Food for thought, though.
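The keyword idea can be approximated today with quoted strings on the TOML side and an enum on the consumer side. This is only a sketch: `UploadPolicy` and `parse_max_upload_size` are invented names, and the config would have to spell the values as strings, such as max_upload_size = "unlimited":

```python
from enum import Enum
from typing import Union


class UploadPolicy(Enum):
    UNLIMITED = "unlimited"
    UPLOADS_FORBIDDEN = "uploads_forbidden"


def parse_max_upload_size(value) -> Union[int, UploadPolicy]:
    """Accept a positive integer or one of the named policies."""
    if isinstance(value, str):
        # Raises ValueError for any string that is not a known policy.
        return UploadPolicy(value)
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    raise ValueError(f"invalid max_upload_size: {value!r}")
```

Unknown strings fail at load time with a ValueError rather than surfacing later as a confusing type error.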
A bit of a tangent, but on the topic of "simply omit the key": Issues like overriding/docker-compose aside, I feel like there are very good reasons to have explicit values in configuration sometimes. "Explicit is better than implicit" and all that; doubly so with security-related things, where you might want to enforce that the user provides a value, to make sure that they explicitly decided on it (them, or I suppose their OS's package maintainer), even if none / null is a valid choice.
Sometimes, a value can sensibly be present or null, but without it sensibly defaulting to null to simply allow omission.
Where this is true of $SOME_PROJECT's TOML files depends on said project, but I'd argue the project using TOML knows better on whether explicit or implicit is more appropriate in their own application of TOML than the format itself.
What makes more sense, in my opinion, would be for null to have to be explicitly permitted in a field, in which case it becomes no different to getting a string where you're expecting a table.
Well said. This was the main point this issue was meant to add to the discussion, though perhaps I did not express it clearly enough.
null to have to be explicitly permitted in a field
You can't really do that in Python because it all just maps to an object; there is no straight-forward way to say "nulls are allowed/disallowed here", like you do with, say, SQL, or a struct type e.g. Rust.
So what will happen in most Python applications is that you'll get a TypeError. Or you have to check/validate for None.
You can't really do that in Python because it all just maps to an object; there is no straight-forward way to say "nulls are allowed/disallowed here", like you do with, say, SQL, or a struct type e.g. Rust.
So what will happen in most Python applications is that you'll get a TypeError. Or you have to check/validate for None.
How is this different from getting a TypeError because you got a str or dict instead of int?
It's not, but no one is going to write max_upload_size = {} or max_upload_size = "shitloads". Well, maybe someone would, but that would be so silly that I don't think it's our concern.
People would expect to be able to write max_upload_size = Null. This wouldn't be such a huge issue if you can give decent errors, but in most of the dynamic languages (Python, Ruby, Perl, PHP, etc) you can't really, unless you specifically program it all with some custom logic.
So what you end up with is a TypeError when the application tries to use the variable, and in many cases it's not even clear that this is because None is not supported for that value.
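A small illustration of the failure mode, assuming a hypothetical null-aware parser that hands the application a None. The names `check_upload` and `load_limit` are made up; the second function shows the kind of custom up-front check that would be needed to get a readable error:

```python
def check_upload(size_bytes: int, config: dict) -> bool:
    """Naive use of the setting, far away from where the config was loaded."""
    return size_bytes <= config["max_upload_size"]


# What a hypothetical null-aware TOML parser might return for a null value.
config = {"max_upload_size": None}

try:
    check_upload(1024, config)
except TypeError as exc:
    # The message names int and NoneType but not the offending config key.
    late_error = str(exc)


def load_limit(config: dict) -> int:
    """Validate at load time so the error names the key."""
    value = config.get("max_upload_size")
    if value is None:
        raise ValueError(
            "max_upload_size must be an integer; null is not supported here"
        )
    return value
```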
People would expect to be able to write max_upload_size = Null. This wouldn't be such a huge issue if you can give decent errors, but in most of the dynamic languages (Python, Ruby, Perl, PHP, etc) you can't really, unless you specifically program it all with some custom logic.
I could definitely see people writing max_upload_size = "unlimited" or such, mixing up strings vs arrays of strings, etc.
Also, you don't really see this problem of people randomly expecting to be able to write null in other configuration languages (YAML, JSON/JSONC, etc); unless you're arguing that the average TOML user is dumber than the average YAML or JSON user, which would be a rather amusing argument.
Speaking from experience, it seems like a purely hypothetical problem in most cases. And the few where it's less-than-hypothetical, it's no different from writing "unlimited" instead of null.
I've run into this quite a few times; both as "random TypeError in Python" and as "ugh, I need to make this a pointer" in Go.
That said, many (perhaps even most) of those were not so much "I wrote max_upload_size=None and this isn't supported", but rather parsing some file and not realizing that some value can be null, and running into errors because of that. Slightly different scenario.
The argument keeps going back and forth over why null shouldn't be in TOML, but that same argument also applies to "Why have none/nil/null in X programming language?". You can always just use sentinel values of some primitive type in that language.
C++ used sentinel values for ~40 yrs before adding std::optional, but std::optional was a good move. Yes, we can use sentinels, but None is a sentinel value. The value proposition of None as a universal sentinel is,
Given that these were the main three reasons for adding explicit None for each programming language, I'm not sure on the flip side why it doesn't apply to TOML. In other words, I am unable to understand the argument
"without offering compelling use cases for adding those things"
"We should use None instead of -1" was, in and of itself, the compelling use case that caused std::optional to be implemented in C++17, and why most other languages started off with nil/null/none from inception. Clearly it's not strongly necessary, C++ did not have optional from 1979-2017, and yet C++ is a wildly popular language. But, if it was a compelling use case in virtually every programming language, the question is why are those three bullet points not applicable here? All three seem to apply in the exact same way.
The only thing I can think of that a NULL would be truly useful for is to emulate JSON's null, and even then, what's the point of serializing an explicitly acknowledged absence of data when TOML is not a serialization language?
Yes, TOML is a configuration format, not a serialization format. And my pydantic BaseModels for configuration involve 10x more optionals than my pydantic BaseModels for serializing data. Data rarely could be None. Configuration almost always has keys where it makes sense for them to be None.
In my opinion, toml is a configuration language, not a serialization language, and configuration needs an explicit set for None that is distinct from the default value, more often than serialization does.
In either case, yes sentinel values work, they will always work. We won't be able to provide an example where you can't just use "I AM NOT A VALUE" as a string to represent None, followed by code which finds that string and replaces it with the proper None type of that language. But, it's messier, and less intuitive to read, and more prone to mistake. And just makes it so much harder to read a toml file when a bunch of configuration options are set to a bunch of random magic values that happen to be the sentinel value for None, followed by code which converts all of those magic values to None in your codebase.
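For reference, the post-parse conversion being described might look roughly like this. The sentinel string is the one quoted above; the function name `replace_sentinels` and the sample keys are hypothetical, and the parsed document is assumed to be nested dicts and lists:

```python
SENTINEL = "I AM NOT A VALUE"


def replace_sentinels(value):
    """Recursively swap the sentinel string for None in parsed config data."""
    if isinstance(value, dict):
        return {key: replace_sentinels(item) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_sentinels(item) for item in value]
    return None if value == SENTINEL else value


parsed = {
    "name": "Orange",
    "physical": {"color": SENTINEL, "shape": "round"},
    "tags": ["fruit", SENTINEL],
}
cleaned = replace_sentinels(parsed)
```

Every consumer of the file has to know the sentinel and run this pass, which is the extra code a native null would make unnecessary.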
Magic sentinels create unlucky people
In either case, TOML is great, it's easier to use than YAML. But, null is good. I have code that handles conversion to null in my codebase that I'd like to remove if TOML ever supports null in the future. I imagine many users have code that converts random sentinels to null immediately after parsing, especially if you're in Swift/Rust where algebraic datatypes are central to the language.
This is another issue asking for a null/nil/none/empty/nothing type. Summary of prior discussion:
#30
Arguments for:
[...] "select where x = null or x = 1" (via x = [null, 1]) as opposed to "select where x = 1" (via x = [1]).
Arguments against:
[...] ({}) or boolean false instead.
#146
Arguments for:
Arguments against:
Other proposals:
#802
Little new ground covered.
#803
Arguments for:
Arguments against:
I agree that a null value, which would be valid for any type, is a bad idea. It is a bad idea because it would create a new option for every existing type, break existing guarantees and complicate implementations.
However, a new type would not do this. Defining a new unit type, called none, where the only valid value is none, would not break any existing guarantees. Implementations would just have to handle one more very simple type.
In order to express nullable options, a union/sum type of none and another type would be used. Union types are already possible in TOML since it is not statically typed - in fact, the type of any TOML value is the sum of all types by default.
This solution comes from functional languages, and would be familiar to users of those languages, as well as Rust (functional inspired) and Python (although not statically typed, Python's None is a type of its own, and not a value of any other type). The TOML syntax would not be visually much different to null as used in other languages.
For me the most compelling use cases are inheritance/layered environments (where a child might set to none what a parent set to something else), and the ability to provide sensible non-null defaults while still allowing the value to be set null in the config file.
Points this does not address:
Using sentinel values
While this works, it is not consistent - a sentinel value needs to be found that is of a different type to whatever the actual type/s of the field are. It also requires additional validation beyond type validation - eg. if false is used as a sentinel, true has to also be handled.
Explicitly not setting values
Setting a value to none would be different to not setting it at all, so this solution would not help people who want to explicitly show an option in a config file without setting it. The suggested workaround here is to simply show the option commented out. Another solution is the explicit unset notation mentioned above, but that would be a separate proposal.
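To illustrate the point about sentinel values needing validation beyond type checks, here is a minimal sketch assuming false is the chosen sentinel for an otherwise integer field. The function name `parse_limit` and the error wording are hypothetical:

```python
from typing import Optional


def parse_limit(value) -> Optional[int]:
    """Treat false as 'no limit'; anything else must be a positive integer."""
    if value is False:
        return None
    if value is True:
        # A type check alone would accept this, since it is a valid boolean.
        raise ValueError("true is not meaningful here; use an integer or false")
    if isinstance(value, int) and value > 0:
        return value
    raise ValueError(f"invalid limit: {value!r}")
```

The explicit branch for true is the extra validation step: the field's type is effectively "integer or boolean", but only one of the two boolean values makes sense.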