toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

range type #689

Closed alan-isaac closed 2 years ago

alan-isaac commented 4 years ago

I'm new to TOML and really liking it. The one thing I'd really find helpful is a range type, which implementations could interpret either as a range object (e.g., Python) or as an explicit array, depending on the language. I anticipate "just use a 3-array" or "just provide start, stop, and step attributes" as responses, but if you search you'll find that YAML and JSON users also request ranges from time to time. So I think there is a desirable feature here. I'm not going to suggest syntax but array syntax without commas [0 10 1] or doubled periods 0..10..1 or even Mathematica span style 0;;10;;1 pop into mind.

eksortso commented 4 years ago

Can you give an example where putting a hypothetical range value into a TOML document would make more sense than just defining parameters of a range with the value types already defined in the spec?

alan-isaac commented 4 years ago

The proper point of reference is the question, why do so many languages (from Python to Ruby to Mathematica to Matlab) provide simple syntax for range construction? The answer is that it is convenient and expressive.

As an example of usefulness, consider a simulation model where a TOML file is used to represent a collection of simulation experiments. Each experiment is a table, and often a key-value pair in the table will specify a parameter and a range of values. This will be far easier to read in the TOML file if there is a simple syntax for ranges. Additionally, it provides direct guidance (e.g., to a Python parser) to construct the range rather than to construct some object that merely represents the parameters of a range.

abelbraaksma commented 4 years ago

I think ranges can also be great as a shortcut for typical common arrays: they save typing, add clarity to the intention, and remove typos for cases where you create the range by hand.

In scenarios where TOML is used for configuring unit tests, or performance tests, I certainly see the benefit.

Also, they might open the door for infinite sequences, if we were to consider syntax that allows unbounded ranges. However, this would pose a potentially heavy burden on implementers, as such a thing is only possible with lazy evaluation of said range.

ChristianSi commented 4 years ago

Why not use an inline table? E.g. for the simulation model sample:

parameters = [
  { name="alpha", first=2, last=10, step=2 },
  { name="beta", first=1, last=100 },  # default step: 1
  { name="gamma", first=50, last=-50, step=-1 }
]

Unbounded ranges are not a problem either:

range = { first=15, step=3 }

Or, if desired, you might specify the number of repetitions (different values) instead of an upper/last value:

range = { first=4, repetitions=16, step=4 }  # run tests for 4, 8, ..., 60, 64

This use case is too specialized and rare to deserve new syntax (remember what the "M" stands for?), but TOML can easily accomplish it already.

alan-isaac commented 4 years ago

Hi Christian. Your proposed solution was anticipated in my original post (above). It is not parsed to produce a range of values, which is desirable. Instead, it is parsed to produce an object that can be converted to a range of values. A key feature of the TOML spec is its insistence on useful type inference (despite the "M"). And please remember the "O".

The need for ranges is neither specialized nor rare, even if you do not need them often. That is why they have been requested in other settings (e.g., YAML, JSON), and that is why they are implemented in MANY programming languages. (I listed some examples above.)

abelbraaksma commented 4 years ago

Basically every for.. in... loop, many while loops, for i=x to y loops etc are inherently ranges. So I wouldn't call it 'rare'. In addition, languages like Java, C#, F#, PHP, Perl, Python, and even XPath all have specific syntax for ranges for arrays, linked lists and/or sequences, splicing and steps in ranges. Just to say that these things wouldn't be so abundant if it was 'rare'. ;).

I don't think the point is that it is currently impossible. The point is to have a simple, clear, unambiguous way of expressing ranges that is portable. Ad hoc syntax never is. I personally prefer the .. syntax, as it is clear to the casual reader, even without a programmer's background.

pradyunsg commented 4 years ago

I don't think we need this -- the provided functionality is not compelling enough, to justify the complexity this brings in the syntax + mental model. "YAML has it" is very much not a good reason to add syntax to TOML.

Can someone please point out a real world use case where this is a problem? The premise of this issue seems very hypothetical.

alan-isaac commented 4 years ago

  1. I think that a burden falls on those who say things like "the provided functionality is not compelling enough" not to rely on personal habits but to consider why so many languages have found it compelling to provide a special syntax for ranges. (Related: see Abel's comments.)
  2. I've looked around a bit and think that Haskell's notation is simple and obvious (i.e., easy to understand). As Abel emphasizes, obviousness (the "O" in TOML) is a compelling consideration here. In Haskell notation, the user provides the first two terms of the sequence and an upper limit. So a sequence from 1 to 9 by 2s becomes [1,3..9]. I think this looks good for TOML because it resembles array syntax but nevertheless parses without ambiguity.
  3. As for real-world use cases, these arise whenever value ranges are needed. Abel mentioned unit testing. My example of simulation modeling is not at all hypothetical: TOML is now in use for the specification of simulation models. And again, range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian.

lmna commented 4 years ago

many languages have found it compelling to provide a special syntax for ranges

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

range notation is much more obvious to a human reader than an actual list of sequence terms or the kinds of indirect workarounds described by Christian

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because the average human is not used to this exact notation.

abelbraaksma commented 4 years ago

range = { first=4, repetitions=16, step=4 }

Yes, it's directly understandable for a reader of the configuration. Much less obvious how to type it, or what values are valid:

And herewith lies the problem: each and every application that supports TOML and needs a range, has to fully specify how it deals with all of these situations.

Just like with other features that are not necessarily used by everyone (nested arrays, I can't get the support engineers to understand them, but that's also true for the json-like syntax: TOML is certainly not for the average user), it is better to specify once and be clear about it, than let each and every configuration define it for themselves.

Even if only 10% is going to use it, or even if it's only useful in a subset of situations, this is true for most features of TOML: rarely will you see config files that use everything. Imo, that shouldn't be the leading argument.

Likewise, I can understand the hesitancy, in that you don't just want to extend the syntax on everyone's whim. Personally, I don't think this is a whim: ranges have had widespread usage in both present and past languages and configuration files. Let's do it right, and help users and designers with a clear addition to the syntax, ready if they need it, ignorable if they don't.

PS: for implementors, I think this is a very trivial thing to add.

alan-isaac commented 4 years ago

Programming, transformation and query languages are mostly irrelevant to TOML's primary objective: to be a minimal configuration file format.

This observation is orthogonal to the point. The point is simplicity and expressiveness.

range = { first=4, repetitions=16, step=4 } <-- This one is instantly understandable because it is explicit (kinda self-documented).

[1,3..9] <-- This one is cryptic because the average human is not used to this exact notation.

This claim is incorrect. Only a programmer would say such a thing, and even then only a programmer who assumes additional context (i.e., this conversation). Arithmetic sequences using dots are introduced in grade school. The notation is not exactly the same, but it is close. This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

I won't say more because Abel has said it much better than I could.

lmna commented 4 years ago

Arithmetic sequences using dots are introduced in grade school.

Numeric sequences are introduced in school, the notation is like (a1, a2, ..., aN, ...), and the semantics does not by any means imply arithmetic progression. For instance, (1, 3, 9) can describe first terms of geometric progression, or just some arbitrary sequence. Semantics of well-known school notation is pretty far from what you suggest.

This comment also misses a key point: the range syntax should be parsed to produce a range object or an explicit array. That is not what happens with the alternative.

This is not a point at all. Configuration files should be handy for those who read and write them by hand. Shiny parser API cannot be an excuse for increase of amount of syntax features that user must learn.

ChristianSi commented 4 years ago

Like @lmna said earlier: programming languages have tons of stuff which TOML neither has nor needs, since it's not a programming language. More relevant to the issue at hand would be whether other commonly used data serialization or configuration file formats have a built-in syntax for range types. As far as I can tell, that's not the case. Not even YAML (whose M could well mean "Maximal") seems to support it.

I don't doubt that this feature has been "requested" from time to time, but the fact that these requests have apparently all been rejected should tell us something.

As for obviousness: In Ruby, 1..10 creates an inclusive range (from 1 to 10), while 1...10 creates an exclusive range (actually from 1 to 9). That's obvious? Really?

abelbraaksma commented 4 years ago

cannot be an excuse for increase of amount of syntax features that user must learn.

I agree, so instead of requiring users to learn the individual specifications of each and every usage of TOML, let's give both readers and writers something they can work with and that's easy to understand and easy to write. Learn once, apply everywhere.

That's obvious? Really?

Not at all, it's good to learn from other's mistakes, and precisely the reason why we should keep it simple and explicit. One syntax, with an obvious meaning.

alan-isaac commented 4 years ago

Configuration files should be handy for those who read and write them by hand.

Yes. That is precisely the point.

I am very confident that if the syntax is [first,next..max], nobody will ever complain that it is hard to read or write. I am also very confident that not a single person will ever complain about writing or reading [1,2..20] instead of [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] -- during the typing of which I had to correct two errors and then double check that there were no others.
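
To make the [first,next..max] idea concrete, here is a small sketch of how such a value could be expanded. The grammar (integers only, inclusive limit) and the function name `expand_haskell_style` are assumptions made for illustration; no TOML parser implements this.

```python
import re

# Hypothetical grammar: "[first,next..max]" with integer terms only.
_PATTERN = re.compile(r"\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\.\.\s*(-?\d+)\s*\]")


def expand_haskell_style(text):
    """Expand "[first,next..max]" into an explicit list of integers."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a [first,next..max] range: {text!r}")
    first, second, limit = map(int, match.groups())
    step = second - first  # the step is implied by the first two terms
    if step == 0:
        raise ValueError("first and next must differ")
    # The limit is inclusive, so go one unit past it before calling range().
    stop = limit + (1 if step > 0 else -1)
    return list(range(first, stop, step))


print(expand_haskell_style("[1,3..9]"))                         # [1, 3, 5, 7, 9]
print(expand_haskell_style("[1,2..20]") == list(range(1, 21)))  # True
```

A spec would still need to say what happens when the limit is not hit exactly, for example [1,3..10]; this sketch simply stops at the last term that does not pass the limit.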

lmna commented 4 years ago

One syntax, with an obvious meaning.

Meaning of [1,3..9] syntax (well, if you manage to guess that the whole construct is about arithmetic progression) is not really obvious because of the following questions:

Is it worth it to describe it all in the TOML spec? Will users be truly happy and enthusiastic about reading and remembering all that stuff? Will it be obvious for those who don't bother to even read the spec?

alan-isaac commented 4 years ago

Will it be obvious for those who don't bother to even read the spec?

Certainly more obvious than local time, arrays of tables, or dot notation for supertable generation.

lmna commented 4 years ago

Learn once, apply everywhere.

I do disagree with "learn" part.

In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of configuration file should require no learning at all. I see this as an ultimate goal for evolution of TOML.

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best to not screw things even further.

alan-isaac commented 4 years ago

In the real world, TOML has some obscure syntax features (arrays of tables, first of all). Despite that, we should do our best to not screw things even further.

TOML is not yet at 1.0. Will you propose to remove arrays of tables before the 1.0 release? Why or why not? How about local time notation? Keep or discard? And why?

alan-isaac commented 4 years ago

how do i use this feature to configure a print job for pages 3, 7, 12-15, 21-24

This is a great question. Here is a possible notation for that: [3, 7, 12-15, 21-24]. What if you want only every other page in the first range? Then [3, 7, 12-15 by 2, 21-24]. What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.
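
A rough sketch of how this page-list notation might be expanded. The item grammar ("N", "A-B", or "A-B by S") and the name `expand_page_list` are assumptions for illustration only. Negative numbers are deliberately left out, because `-` already serves as the range separator.

```python
import re

# Hypothetical item grammar: "N", "A-B", or "A-B by S" (non-negative integers).
_ITEM = re.compile(r"(\d+)(?:\s*-\s*(\d+)(?:\s+by\s+\+?(\d+))?)?")


def expand_page_list(text):
    """Expand e.g. "[3, 7, 12-15 by 2, 21-24]" into a flat list of integers."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a bracketed list")
    values = []
    for item in inner[1:-1].split(","):
        match = _ITEM.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"bad item: {item!r}")
        start = int(match.group(1))
        # A bare number is treated as a one-element range.
        end = int(match.group(2)) if match.group(2) is not None else start
        step = int(match.group(3)) if match.group(3) is not None else 1
        if step == 0:
            raise ValueError("step must be non-zero")
        values.extend(range(start, end + 1, step))  # both ends inclusive
    return values


print(expand_page_list("[3, 7, 12-15, 21-24]"))
# [3, 7, 12, 13, 14, 15, 21, 22, 23, 24]
print(expand_page_list("[3, 7, 12-15 by 2, 21-24]"))
# [3, 7, 12, 14, 21, 22, 23, 24]
print(expand_page_list("[1-9 by 2]"))
# [1, 3, 5, 7, 9]
```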

lmna commented 4 years ago

TOML is not yet at 1.0. Will you propose to remove

Official goal for version 1.0.0 is to be backwards compatible (as much as humanly possible) with version 0.5.0. So removal of existing syntax is not an option any more.

Will you propose to remove arrays of tables before the 1.0 release?

This could be done for 2.0, if someone comes up with an excellent alternative to current arrays-of-tables.

How about local time notation?

The whole date&time thing, not only the "local" aspect, was a very controversial feature. I believe that first-class date&time is not worth its complexity.

alan-isaac commented 4 years ago

if someone comes up with an excellent alternative to current arrays-of-tables

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use. Yes, this is always the correct criterion. (Just fyi, I am pleased to have date-time functionality, although I wish times required a clarifying T prefix.)

lmna commented 4 years ago

If I may paraphrase, in the absence of such an alternative, in your judgment the cost in readability is repaid by the ease of use.

Complexity of arrays-of-tables is justified by expressive power. An alternative should reduce the complexity (make things more obvious & trivial), but not at cost of readability and expressiveness.

An important thing to note is that first-class date&time and first-class ranges do not add anything to readability and expressiveness. You can encode them as TOML strings and then interpret those strings at application level (just like you interpret any other configuration parameter). No sacrifices here.

alan-isaac commented 4 years ago

first-class ranges do not add anything to readability and expressiveness

This claim is obviously incorrect. Prove it to yourself by typing out any long range without ever checking to see if you made an error. A good syntax for ranges adds readability, expressiveness, and ease of use. (Which is exactly why this exists in so many programming languages.)

Of course if I just want to parse everything myself, I could use an INI parser and handle the string values. A key piece of the value added by TOML is elimination of this need in config files.

lmna commented 4 years ago

Prove it to yourself by typing out any long range without ever checking to see if you made an error.

Okay, let's do it once again. range = { first=4, repetitions=16, step=4 } Hope, it is long enough?

alan-isaac commented 4 years ago

let's do it once again

  1. You can only easily interpret the meaning of this because you are in this conversation. So, it lacks clear meaning to a reader. (This is a really important point that you are skipping over repeatedly.) This is especially true when readers need not be programmers.
  2. It is not standardized. You simply made up the keys to help you know what on earth you are talking about, which even so would not be evident if you were not in this conversation.
  3. It is parsed to an object that must be converted to an array by a knowledgeable user. So it has reduced functionality.

So in fact the meaning is not obvious at all to a reader who is not in this conversation. You are simply making the point that there are available workarounds, although without any supporting standard. Yes, we all know that. That's what we're doing now. The request is for something less tiresome and more communicative.

ChristianSi commented 4 years ago

Just a quick reminder: it is NOT the case that the party with the highest number of comments wins :wink:

alan-isaac commented 4 years ago

What about the example you are discussing? It becomes [1-9 by 2]. I would have no problem with such proposals.

Even more explicit would be [1-9 by +2].

abelbraaksma commented 4 years ago

In the ideal world, you should learn a lot about a program that you are writing configuration for, but the syntax of configuration file should require no learning at all. I see this as an ultimate goal for evolution of TOML.

@lmna It's a great goal. But I've so far spent many hours on learning TOML and understanding the peculiarities of the syntax. I'm still not there, it's a rather complex spec with many caveats. And I have 25+ years experience in various computing and programming fields and have been a co-editor of W3C specifications. I know how to read specs (at least I like to think so ;) ), but TOML, in all its conciseness, is not so KISS anymore.

TOML is also way too complex for support-engineers at my partnering hosting company to write correctly. I just send them an updated file instead of saying: "please update field so and so in the TOML config", as they always make mistakes. But this is also true for any other config language. I think that the target audience is programmers and software engineers, even though we'd like it to be different.

That is not a critique, other config syntaxes are often harder to learn and compared to them I really like TOML and the way it tries to find a balance. But without JSON background and an understanding of arrays, tables etc, you are up for a rather steep learning curve.

Should you then stop adding new features? Stop evolving the syntax to prevent it getting more complex? I'm not sure of the right answer here, but generally I think evolution is good. To a certain degree, obviously.


I agree that writing range = { first=4, repetitions=16, step=4 } is clear, but it doesn't remove the fact that its meaning is implementation-dependent. Unless you suggest that the above syntax is to be translated into [4, 8, 12...etc] by implementers, and not into an object with three fields.

We should, however, first try to answer the question: do we want this in? If the answer is yes, we can come up with an understandable and sufficiently-easy syntax. If not, we don't need to attempt that anyway.

marzer commented 4 years ago

I like the idea, but only if the chosen syntax is obvious for non-programmers. There's plenty of good examples of range constructs in programming languages but they're obviously only succinct and clear to people who know those languages. Requiring some familiarity with how language X does feature Y in order to make a config change defies the spirit of what TOML is intended to be, methinks.

If you really wanted it to be simple, obvious, and unambiguous you could introduce some keywords, e.g. my_range = from 5 to 10 inclusive. Pretty hard to misunderstand what that means but obviously complicates parsing a bit.
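
As a sense of how much this "complicates parsing", here is a hedged sketch of a recognizer for the keyword form. The `exclusive` keyword, the inclusive default, and the name `parse_keyword_range` are assumptions added for illustration; only `from 5 to 10 inclusive` comes from the comment above.

```python
import re

# Hypothetical grammar: "from A to B [inclusive|exclusive]", integers only.
_KEYWORD_RANGE = re.compile(
    r"from\s+(-?\d+)\s+to\s+(-?\d+)(?:\s+(inclusive|exclusive))?"
)


def parse_keyword_range(text):
    """Parse the keyword form into a Python range object."""
    match = _KEYWORD_RANGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a keyword range: {text!r}")
    first, last = int(match.group(1)), int(match.group(2))
    # Assumption: ranges are inclusive unless marked "exclusive".
    inclusive = match.group(3) != "exclusive"
    step = 1 if last >= first else -1
    stop = last + step if inclusive else last
    return range(first, stop, step)


print(list(parse_keyword_range("from 5 to 10 inclusive")))  # [5, 6, 7, 8, 9, 10]
print(list(parse_keyword_range("from 5 to 10 exclusive")))  # [5, 6, 7, 8, 9]
```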

abelbraaksma commented 4 years ago

@marzer, I like your idea, it's clear, simple and concise. The (slight) extra burden on parsing shouldn't be too hard to tackle.

alan-isaac commented 4 years ago

my_range = from 5 to 10 inclusive

First of all, I have no strong preference on syntax and any such approach would fully meet my needs. But I still have a few comments.

  1. Only programmers are going to be worrying about whether a range is inclusive or exclusive. People ordinarily use inclusive ranges.

  2. As Abel has pointed out, it is very easy to overstate the syntax burden of any of the proposals. If someone is writing TOML, they'll learn an easy syntax after using it once. If someone is reading a TOML file, other things will be much harder for them to guess than the meaning of say, [5,10..100] or [5,10,15,...,100] or [5-100 by +5]. They'll look them up once and be done. Or, the writer can add a comment. All of these syntaxes allow truly trivial mastery.

  3. My request is only (!) for a range, but Alexey's query about printer configuration raises the possibility of an encompassing syntax. Consider the meaning of [3, 7, 12-15 by +2, 21-24]. Will anyone argue this will not be obvious to a non-programmer? It is currently my favorite among the proposals: simple and obvious, and apparently easy to parse.

abelbraaksma commented 4 years ago

Will anyone argue this will not be obvious to a non-programmer?

Since that matches the syntax used by Microsoft for decades in their "Print" dialog box to select pages to print from a document (apart from the brackets and by, the latter I can live without), I reckon that proves the point that 'ordinary people will understand it': anyone can print a document, or a selection from it.

alan-isaac commented 4 years ago

I reckon that proves the point that 'ordinary people will understand it'

OK, then there is at least one "obvious" syntax.

In addition, two prominent use cases have been defined: printer configuration, and simulation configuration.

I will only add, because a few participants seem not to understand this, that the need to share simulation configurations across platforms and languages is widespread. Having a language agnostic way to do this is highly desirable. Absence of a range syntax in TOML is a barrier, since it requires sharing not just the TOML file but in addition communicating a convention for representing ranges, which means that a transformation will have to be implemented by the recipient.

ChristianSi commented 4 years ago

@alan-isaac:

I will only add, because a few participants seem not to understand this, that the need to share simulation configurations across platforms and languages is widespread.

Is this only an imaginary use case, or are you really using TOML for this purpose? If the latter, it would be useful if you could give a short sample excerpt, showing (a) how you are currently listing this data (without a built-in range syntax) and (b) how you would wish it looked if your preferred range syntax were adopted.

If the difference between the two syntaxes is indeed significant, this might considerably strengthen your case. If not, I have serious doubts that your proposal will make it into the TOML spec.

alan-isaac commented 4 years ago

@ChristianSi Yes, I am using TOML for the configuration of simulations and for the exchange of these configurations. But even if I were not, it is obvious that a configuration language is needed for this purpose, and it should be obvious that simulations that need configuration are all over the place. This is a role that TOML could fill much more nicely than it does.

The workarounds I've tried are all ugly, so there is no real need to discuss them. I have typed longish ranges by hand or produced them at a console and pasted them in. A language-dependent workaround is to provide the range as a code string that is evaluated to get the range. (Insecure!) I've done this. If the relationship at the other end supports it, the range can be described by a table that is processed by the recipient to produce a range. So a variety of workarounds are possible, but they are all awful. Parameter configurations should be sharable without requiring post-processing to extract the actual parameters.

I think that last observation is the one you are repeatedly skipping over, although both Abel and I have emphasized it. After all, if the question becomes whether there isn't some kind of post-processing that would make the work possible, we can just go back to INI plus clever hacks.

As I said before, I really don't care which range syntax TOML adopts. However, there seems to be agreement that the syntax that resembles printer configuration is "obvious", so that may be the way to go, especially since it could indeed be used to configure print jobs. It would meet my needs.

I have trouble understanding what your objection is once a useful and obvious syntax (that would not be hard to parse) has been discovered. You seem to suspect that it will not actually find much use; is that it? If so, I strongly disagree.

ChristianSi commented 4 years ago

@alan-isaac I don't think you're strengthening your case by refusing to even show a reasonable example. Well, your choice.

alan-isaac commented 4 years ago

@ChristianSi I'm confused. Aside from machine generated files, there are only workarounds. What about my description of the workarounds is unclear? One cannot illustrate a range syntax in TOML when TOML does not support it. There are only workarounds, none of them universal, and that is exactly the problem. If a good workaround existed, I would not be making a feature request.

I suspect I don't understand what you are after. Perhaps you will be interested in the GUI interface on page 3 of this document, showing the ParameterSweep window in Repast Simphony. This example of course is specific to one particular popular simulation toolkit, but illustrates the kinds of simulation configurations that need to be shared in a language-agnostic, cross platform fashion. Similar interfaces are common in many simulation toolkits; I can share more such examples if you need.

If I understand, you are not contesting that an obvious syntax has been found. You are rather dubious that, if introduced, it would find much use. Is that correct? If so, I urge you to do a Google search on "parameter sweep". You will get millions and millions of hits.

marzer commented 4 years ago

He just asked you to give an example snippet of how you currently express ranges and how you would prefer to do so.

Example:

"Currently we write sim = { begin = 1, end = 500, step = 10 }, but I'd like if we could write sim = 1-500;10."

...but with the snippets pasted from your actual use cases instead of being invented by me for the sake of an example.

alan-isaac commented 4 years ago

@marzer So, I really was not kidding in my description, sad as that may be.

A TOML file contains a collection of experiments; call the parsed result xpmts. A single experiment is a TOML table, where parameter names are the keys. Say an experiment is in the table [xpmt01]. When it is just a matter of sharing within a local project and all users are Python users, we can eval values that are strings. (Shudder. But we do it.) Thus in the [xpmt01] table the entry param01="range(0,1001,10)" is post-processed by casting when necessary: if isinstance(xpmt01['param01'], str): xpmt01['param01'] = eval(xpmt01['param01']). If no experiment parameters are strings (not always the case), we can just walk through the experiment dict, replacing each string value in this fashion.
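
For what it is worth, the eval step can be made less dangerous while keeping the same string convention. The sketch below swaps eval for the standard `ast` module and accepts only a plain `range(...)` call with integer literals. The helper names `parse_range_string` and `resolve_ranges` are hypothetical, and the approach is still Python-specific.

```python
import ast


def parse_range_string(text):
    """Parse a string such as "range(0,1001,10)" without calling eval()."""
    call = ast.parse(text.strip(), mode="eval").body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id == "range"
        and not call.keywords
        and 1 <= len(call.args) <= 3
    ):
        raise ValueError(f"not a plain range(...) call: {text!r}")
    # literal_eval accepts only literals, so names and nested calls are rejected.
    args = [ast.literal_eval(arg) for arg in call.args]
    if not all(isinstance(a, int) and not isinstance(a, bool) for a in args):
        raise ValueError("range arguments must be integers")
    return range(*args)


def resolve_ranges(experiment):
    """Return a copy of an experiment dict with range-like strings converted."""
    resolved = {}
    for key, value in experiment.items():
        if isinstance(value, str):
            try:
                value = parse_range_string(value)
            except (ValueError, SyntaxError):
                pass  # an ordinary string parameter: keep it unchanged
        resolved[key] = value
    return resolved


xpmt01 = {"param01": "range(0,1001,10)", "label": "baseline", "seed": 42}
print(resolve_ranges(xpmt01))
```

This removes the code-execution risk but not the underlying complaint: the recipient still has to know the convention and post-process the parsed file.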

This approach has too many drawbacks to list, but prominent among them is that the TOML file does not actually specify the configuration but rather provides enough information that an informed enough user can produce the actual configuration by post processing. It would be much better in the [xpmt01] table to be able to write say param01=[0-1000 by +10]. (The actual syntax is not what is important here, but rather the ability to produce the actual configuration rather than a proxy for the configuration.) The safer and more language agnostic approach param01={start=0, stop=1000, step=10} does not fix this. It still means the TOML file cannot simply be shared as a way to share the configuration of the experiment: conditional casting of values by the recipient of the configuration file is still required to produce the actual configuration.

Am I responding to the question now?

marzer commented 4 years ago

Am I responding to the question now?

Frankly? No. Just suggest a syntax that would work for you, instead of pontificating and complaining about what doesn't/can't.

I was trying to help you - I think a range syntax would be useful - but... ugh. Good luck, I guess.

alan-isaac commented 4 years ago

@marzer I'm again confused; you asked for an actual example of current usage, which I provided. I also included a syntax that would work for me. It is the same one discussed multiple times above. In response to your question, I mentioned the syntax [0-1000 by +10] because that (or some variant) appeared to have some support, particularly since it is tied to printer configuration syntax. My own preference is [0,10..1000], taken straight from Haskell, which I also mentioned above, but there were some objections to that (i.e., claims it was not "obvious" enough). Nobody has claimed the printer configuration syntax is not "obvious", and nobody has claimed it would be hard to parse.

Just to be clear, the syntax you suggested (0-1000;10) would also work just fine for me. But I anticipate objections that it is not obvious enough. Also, if I understood correctly, Abel proposed [0-1000 +10]. This would also be just fine. So would Scala syntax: (0 to 1000 by 10).

Whatever the team decides is most suitable will be perfectly fine with me. I care about the functionality much more than the syntax.

alan-isaac commented 4 years ago

@marzer So which of the syntaxes that I've just mentioned would you choose?

ChristianSi commented 4 years ago

@alan-isaac My impression is that you're not just hoping for a range type in TOML – which would conceptually, regardless of the syntax chosen, encode a triple of the form: range(start at x, stop at y, proceed in steps of z) – but you're also expecting TOML to evaluate the range for you. So instead of, say, range(start at 1, stop at 10, proceed in steps of 3) you're hoping to get the array [1, 4, 7, 10]. Is that correct?

abelbraaksma commented 4 years ago

@ChristianSi, I've always thought that was the main aim of this thread. Otherwise, it's essentially the same as using a JSON-style object (apart from the advantage of an unambiguous syntax).

There have been questions of 'how do you do it now' and how it would change. The answers in this same thread coming down to: you can't do it now, so there's no example.

Well, here's how I do it currently.

Obviously, there are other ways of achieving the same effect, but at the time, this seemed simplest. I looked at some existing parsers to amend them for this purpose, until I stumbled upon this thread.

So I waited, in case an agreement could be reached.

I guess implementations could choose to statically expand into an array, or could choose to give an enumerator, or both, depending on their interface. But that's true already for the existing syntax of arrays, though an enumerator may be more applicable in some scenarios. But that's of course an implementation detail, irrelevant for TOML itself.
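
A minimal sketch of what "both" could look like on the parser side. It assumes a hypothetical `RangeValue` type with an inclusive `last`; no existing TOML library exposes such a class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeValue:
    """Hypothetical parsed range: inclusive `last`, non-zero `step`."""

    first: int
    last: int
    step: int = 1

    def __iter__(self):
        # Lazy view: values are produced on demand (the "enumerator" option).
        stop = self.last + (1 if self.step > 0 else -1)
        return iter(range(self.first, stop, self.step))

    def to_list(self):
        # Eager view: the statically expanded array.
        return list(self)


r = RangeValue(first=1, last=10, step=3)
print(r.to_list())         # [1, 4, 7, 10]
print([v * v for v in r])  # [1, 16, 49, 100]
```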

alan-isaac commented 4 years ago

@ChristianSi

tl;dr: Yes.

The answer by @abelbraaksma nicely captures the core issue. The job of the TOML spec is just to provide an unambiguous meaning to the syntax, not to determine the parser implementation details. (Although, recommendations could be made, of course.) For example, for TOML tables, the popular C parser for TOML naturally produces a struct rather than a hash table. The important thing is that I can send a file to a C user or a Python user and just say "use a TOML parser to extract the configuration of this experiment".

The goal is simply to have an obvious syntax that unambiguously indicates that a range of values is produced by a TOML parser, not to constrain how a particular parser might produce that (e.g., as a list, a tuple, an array, or a range object). In fact, Abel's examples have persuaded me (against my original thought) that the type of syntax he describes would be most useful to others (even though I just (!) need ranges). The printer configuration example is what really persuaded me. To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.

ChristianSi commented 4 years ago

@alan-isaac:

To meet that need, something like one of the following syntaxes seems most obvious: the printer influenced [1,3, 10-20, 50-100 +2] or the Scala influenced [1, 3, 10 to 20, 50 to 100 by 2]. In each case a list (or other sequence datatype) would be expected to result from parsing.

I see, but let's be honest: that will never happen, since, as pointed out much earlier in this thread, TOML is not a programming language. A TOML parser will parse date strings into date objects and number strings into numbers, but it will never evaluate stuff like "10 days after 2019-12-23" (regardless of the syntax used). I even doubt that stuff like num = 3.2*10^20 + 17 will ever be evaluated by a TOML parser. TOML will never have readable and writeable variables, for loops, or conditionals -- and what you're asking for is essentially of the same scope. It's a programming language construct, and those are outside of TOML's feature set.

On the plus side, you might be able to solve your problem by sending the file through a template engine before parsing it as TOML.

alan-isaac commented 4 years ago

@ChristianSi Are you offering a false dichotomy? I doubt that you can come up with any coherent way of distinguishing parsing 1979-05-27T07:32:00-08:00 to a date-time object and parsing [0-100] to a range. Please suggest how to understand the distinction as you are trying to draw it. Thanks.

ChristianSi commented 4 years ago

@alan-isaac I'll easily parse [0-100] (or, as I would certainly prefer to avoid confusion with the subtraction operation, [0..100]) into "list containing one value: range(from=0, to=100)" for you. But that was decidedly not what you wanted in your preceding comment.

alan-isaac commented 4 years ago

@ChristianSi As I also said in my previous comment, I personally just need ranges. If you are saying that you would be happy to have a range syntax 0..100 that, say, the Python or Ruby parser would parse to a range object, then yes please! That would be extremely helpful!

The rest of the discussion appears separate to me. In that separate discussion, I still believe you are drawing an untenable line between parsing and transforming, as illustrated most nicely by the date-time type. TOML has lots of syntax that provides convenient ways to say what values should be produced by a TOML parser. Indeed, this is a key feature of TOML over INI (where standard parsers produce only strings as keys and values). So I still invite you to try to make concrete the reasoning for rejecting say the printer-configuration syntax, with date-time parsing being the point of reference for the purposes of the discussion.

But again, I would be delighted by the addition of a simple range syntax, and the Haskell influenced double-dot notation would be great.