toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

solution suggest for array with both simple value and complex value #576

Closed LongTengDao closed 5 years ago

LongTengDao commented 5 years ago
  * Clarify that you cannot use array-of-table to append to a static array.

--https://github.com/toml-lang/toml/blame/master/CHANGELOG.md#L24

Currently, we can't write an array that holds both simple values and complex values. If we want one, we must write a JSON-like inline array, and then the child data can no longer be written in TOML's table syntax.

Maybe we can add this feature:

[[array=0]]
[[array]]
x="1.x"
y="1.y"
[[array=2]]

Then we can get final object:

{
    "array": [
        0,
        { "x": "1.x", "y": "1.y" },
        2
    ]
}

eksortso commented 5 years ago

As far back as I've been involved, arrays have only ever allowed a single data type. Indeed, according to the spec, the "data types [of an array] may not be mixed". You can have an array of integers, or an array of tables, or an array of any single given data type. But as it stands now, mixed-type arrays are not allowed.
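A minimal Python sketch of the single-type rule, for illustration only. The helper name `is_homogeneous` is hypothetical, and the values stand in for what a parser would produce; this is not how any particular TOML parser implements the check.

```python
def is_homogeneous(values):
    """Return True if every element has the same Python type (empty lists pass)."""
    # type() keeps int, float, bool, str, and dict apart from one another.
    kinds = {type(v) for v in values}
    return len(kinds) <= 1


print(is_homogeneous([1, 2, 3]))             # array of integers
print(is_homogeneous([{"x": 1}, {"y": 2}]))  # array of tables
print(is_homogeneous([0, {"x": "1.x"}, 1]))  # integers mixed with a table
```

Under the rule quoted above, only the first two arrays would be accepted.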

Since it's been requested so many times before (partly because JSON allows mixed types in its arrays), perhaps it is time to add "mixed-type arrays" (call them collections or something) to the specification. It would need to be considered a distinct type from arrays, in order to avoid breaking backwards compatibility.

Arrays use square brackets []. Maybe parentheses () could be used for this new type. Its syntax would be similar to what is used for arrays currently.

# PROPOSAL: a new mixed-type collection, distinct from arrays

collection = (0, {x="1.x", y="1.y"}, 1)

or_something_like_this = (
  0,
  {x="1.x", y="1.y"},
  1,
)

Or, could we switch all arrays to be mixed-type?

LongTengDao commented 5 years ago

@eksortso Simply switching all arrays to be mixed-type will not break backwards compatibility, because mixed types were not allowed before. Any existing .toml file with a mixed-type array was already an error, so there is no ambiguity. (If that counts as breaking backwards compatibility, then nothing in the world is non-breaking.)

Why was this rule designed? (I didn't notice in the spec that arrays can't be mixed, thanks for the reminder.) Is it because arrays cannot be mixed in some programming languages? But TOML is not a data exchange format, it's a config format. If the host that reads the config does not support mixed-type arrays, then a config file for that host that uses them is simply being used incorrectly. The spec should not worry too much about differences between hosts.

And null is not supported either. What's the reason? Which language does not support null? I don't understand, so thanks in advance for explaining.

Maybe () should be reserved for a more important purpose in the future. It's a great symbol. Maybe for DIY components? An XML-like <> style would also be OK, and even better for custom type extensions.

eksortso commented 5 years ago

null is not supported because, like you said, TOML is a config format, and nulls don't have a place in well-designed configurations.

I don't know why there are no mixed-type arrays, but it could be argued by the same reasoning that mixed-type arrays aren't required for well-designed configurations. Maybe that's a harsh assessment, but you have to admit that it's more sensible to put mixed types into a table, because each element of a table has a name (its key), and the orderings don't really matter. TOML does hold strong opinions about certain things, buried in its syntax.

But I think we're seeing worlds collide here. TOML seems to be gaining some popularity as a human-readable data exchange format, despite not having certain features expected of such a format. Since JSON is essentially frozen and TOML is still not yet v1.0, I expect we'll see more such feature requests on the horizon.

Personally, I would rather see TOML hit 1.0 as a config format first, before reaching for new horizons. Configuration is its currently acknowledged strength.

alinnert commented 5 years ago

@LongTengDao @eksortso Just in case this comes up again at a later time, I want to add a few things to consider. By the way, I agree that TOML should support mixed-type arrays. VuePress supports TOML config files, but I found a construct that can't be expressed in TOML because of this. So, I have to switch to JS for now.

About null and language support: There are languages that don't support null, e.g. Rust. It has an Option type instead. After all, null is also called the "billion dollar mistake". So far I personally don't see the necessity to add null. You can just leave out an option or comment it out instead.

About the syntax: (1, "hello", true) actually makes sense. Mixed-type arrays are often called "tuples". C# 7.0 introduced this syntax for tuples:

var unnamed = (42, "The meaning of life");

Also tuples in Python look like this:

tup1 = ('physics', 'chemistry', 1997, 2000)

mmakaay commented 5 years ago

While there are languages that support tuple-style data types, not all programming languages do. If a goal of TOML is to support the format in a wide range of languages in a straightforward, portable way (from what I've seen up to now, that seems to be one of the goals), a tuple type would cause some headaches in strongly typed languages that don't support tuples natively. Strongly typed languages (e.g. Go) will always be able to read in values like this, but it might force the code in ugly directions: for example, storing everything as a string with the detected type accompanying it, or using variables whose types are only checked at runtime, which means doing the type checking yourself with sluggish reflection code. I wouldn't be a fan of this.

Thinking of how I would encode a tuple-ish data structure in TOML right now, I'd go for a table structure. The Python example could for example be expressed as this:

tup1 = { 0='physics', 1='chemistry', 2=1997, 3=2000 }

Here I assumed that the individual tuple fields really have no semantic meaning that could be described using a pretty key/value pair table. When building my own configuration, I'd start with thinking about a better way to represent the data, since a tuple is not necessarily a user friendly way of conveying information to / from a human being.
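As a rough Python sketch of how an application might turn such a numerically keyed table back into an ordered tuple: the helper `table_to_tuple` is hypothetical, and the dict stands in for what a TOML parser would return (TOML keys always arrive as strings).

```python
def table_to_tuple(table):
    """Order the values of a table by its integer-like keys and return a tuple."""
    # Sorting with key=int keeps "10" after "2", which plain string sorting would not.
    return tuple(table[key] for key in sorted(table, key=int))


tup1 = {"0": "physics", "1": "chemistry", "2": 1997, "3": 2000}
print(table_to_tuple(tup1))
```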

There are of course some typical uses that might not lead to confusion to the reader, e.g.:

listen = ("127.0.0.1", 8080) # ip address + port number
file.owner = ("root", 1000) # username + gid

but I would still go for a dictionary format in these cases, not only to make clear what each field means, but also to be able to leave out fields and let my application fall back to default values, e.g.:

listen1 = { ip="127.0.0.1", port=8080 }
listen2 = { port=8888 } # e.g. using default ip 0.0.0.0
file.owner1 = { user="root", gid=1000 }
file.owner2 = { uid=1000, group="users" }

This is where the configuration is designed to be used by a human. The computer might have to do a translation to a useful internal format, but that's what computers are good at.

As for the example that can't be expressed in TOML: are you referencing this one?

module.exports = {
  head: [
    ['link', { rel: 'icon', href: '/logo.png' }]
  ]
}

I agree that you cannot translate that code one-to-one into TOML, but TOML is not JSON, so that's not a big surprise, I'd say. What's wrong with the following translation? (I haven't looked at exactly how it is used; I simply checked whether I could express the same data in friendly TOML.)

[module.exports]
head = [
    { tag = 'link', attrib = { rel='icon', href='/logo.png' } }
]

This is working TOML code, which describes the code that you referenced. If I had to create a configuration file for this, however, I would not use it. Instead I would use a simpler configuration file in TOML, which I would translate / compile into a JSON file to be used by VuePress. Here's what my configuration file for the above would probably look like:

site.icon = '/logo.png'

This config contains all information to generate the required module.exports config in JSON. I think this is where me and @eksortso would agree very much on having a good design of the configuration, making it a document that is easy to read and manage, and which mirrors the mental image of the system (a typical human will be looking for "How can we make /logo.png the site icon?" and not "Is there a config structure where I can inject an icon link rel into the site header?").

Another good thing about this kind of configuration file design is that the configuration directive can be cleanly commented out in an example config, accompanied by some comments to explain the feature to the user of the config.
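A minimal Python sketch of that translation step, assuming the TOML has already been parsed into a dict. The function name `build_head` is hypothetical, and the output shape simply mirrors the `head` example quoted earlier in this comment; it is not taken from VuePress documentation.

```python
import json


def build_head(site):
    """Expand the friendly site settings into the nested head structure."""
    head = []
    icon = site.get("icon")
    if icon is not None:
        # One [tag, attributes] pair per header element, as in the quoted example.
        head.append(["link", {"rel": "icon", "href": icon}])
    return head


# What a TOML parser would return for: site.icon = '/logo.png'
parsed = {"site": {"icon": "/logo.png"}}
vuepress_config = {"head": build_head(parsed["site"])}
print(json.dumps(vuepress_config, indent=2))
```

Commenting out `site.icon` in the TOML file would simply leave `head` empty.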

alinnert commented 5 years ago

@mmakaay About your first point: I totally get what you mean. I just have one question. How does JSON solve it? I mean, from a technical standpoint it's also just a big string containing some typed values. And since it's commonly used (I'd assume) there need to be solutions for this issue out there. How do languages without tuples read JSON arrays with different types? Maybe we can get inspiration there. (Do you happen to know any of those languages?)

About VuePress: I'm not the one who reads the config file. VuePress does. And it provides first-class TOML config support. The only thing I can do here is to open an issue at the VuePress repo about changing the config file format. And this would mean a breaking change in VuePress. But on the other hand, that's currently the smartest thing to do, I guess.

mmakaay commented 5 years ago

There are different ways in which JSON decoders tackle this. In the most generic way possible, the code that reads the JSON data creates its own generic container type for JSON values, using as much of the language's type system as possible. It's basically the creation of a syntax tree that describes the JSON document in data structures as supported by the language.

Let's assume a programming language that has [ lists ], { dictionaries } and "strings" as the only types available. Then the following JSON code:

{
    "year": 2019,
    "things": [ 1, true, "whack!" ]
}

could be read into a syntax tree that could look like:

{ jsonType: "object", value: {
    "year": { jsonType: "int", value: "2019" },
    "things": [ 
        { jsonType: "int", value: "1" },
        { jsonType: "bool", value: "true" },
        { jsonType: "string", value: "whack!" }
    ]    
}}

So all original types are preserved, and you could therefore use this tree to recreate the original JSON document. The code that uses this decoded version of the JSON data would have to be aware of the original types. When some operation needs to add the year to the first thing, the code should be aware that these were originally integers and that the output must become "2020" and not "20191" (since the latter is what adding two strings might produce). Here you see that extra work has to be done to let the programming language work with the data.
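The "2020" versus "20191" point can be sketched in Python using the same tagged-value shape as the tree above. The helpers `tagged` and `add_ints` are hypothetical names used only for this illustration.

```python
def tagged(json_type, text):
    """Build one node of the tree: the original JSON type plus its string value."""
    return {"jsonType": json_type, "value": text}


def add_ints(a, b):
    """Add two tagged integers, checking the recorded types first."""
    if a["jsonType"] != "int" or b["jsonType"] != "int":
        raise TypeError("both operands must be tagged as int")
    return tagged("int", str(int(a["value"]) + int(b["value"])))


year = tagged("int", "2019")
first_thing = tagged("int", "1")

print(add_ints(year, first_thing)["value"])  # 2020
print(year["value"] + first_thing["value"])  # 20191 (naive string concatenation)
```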

Had the language been Go, then the syntax tree might have looked like this in some weird pseudo-Go-code representation:

{
    year = 2019 (interface{} int),
    things = [
        1 (interface{} int),
        true (interface{} bool),
        "whack!" (interface{} string),        
    ]
}

In Go we have the empty interface{} which can represent all types. However when this structure is in memory, with somewhere buried within the interface the intended values in an appropriate data type, the Go code will have to use reflection in order to investigate at run time what data types are wrapped by all these interfaces.

When accessing the data, the Go code will have to use reflection techniques at run time to find out things like "what is the type of the value contained in things[1]?" There is no other way to use these data, however, since Go does not have a data type that can hold a mix of data types. The best alternative is this list of interface{} items, which can never be used directly in the code.

So this is what makes tuple-style values hard to work with from within languages in which a tuple is not a supported data type. Yes, data can always be read, but it's not easy to use those data and performance will suffer because of all the extra data inspection you need to do.
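The per-element inspection can be illustrated in Python, although only as an analogy: Python is dynamically typed, so it does not pay the same price as Go, but the shape of the code (check the type of every element before using it) is the same. The function name `describe` is hypothetical.

```python
import json


def describe(value):
    """Report which JSON type a decoded value came from."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


doc = json.loads('{"year": 2019, "things": [1, true, "whack!"]}')
print([describe(v) for v in doc["things"]])  # ['int', 'bool', 'string']
```

With a single-type array, one check on the first element (or none at all, in a statically typed language) would be enough.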

I hope this cleared up some things. Didn't have time to proof read, since my train is rolling into my station now ;-)

alinnert commented 5 years ago

Oh, I remember seeing JSONArray and JSONObject classes some years ago. Where did I see them? Java? I guess you mean those? If so, I see what you mean.

mmakaay commented 5 years ago

Yes, those sound like part of an AST (abstract syntax tree) for keeping the structure of a JSON file in Java type structures.