Open stephenreay opened 5 years ago
I definitely like the idea of further simplifying the spec. However, I'm not sure if one value per line is the best way, as it might increase the verbosity of files. Is this the kind of syntax you had in mind?
project {
    authors [
        Person <email@email.email>
    ]
    description A cool project.
    license MIT
    keywords [
        project
        jeml
        cool
    ]
    url https://project.io/project
    build {
        name project
        path src/project.ext
    }
    dependencies {
        jeml {
            version 1.0.0
            flags [
                do-thing
                do-other-thing
            ]
        }
    }
}
As for the problem with escaping characters, I was considering (and still am) combining both string types into one, so any and all strings would be 1:1 representations of what's inside.
So rather than writing:
# [...]
option1 "This\nis a multi-line\nstring"
option2 """This
is a multi-line
string"""
# [...]
You would write:
# [...]
option1 "This\nis a multi-line\nstring"
option2 "This
is a multi-line
string"
# [...]
Meaning, manually escaping characters is fine, and writing strings literally is also fine. I'm not sure of the possible cross-platform implications of this, however.
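As a rough illustration of the unified string idea, here is a minimal Python sketch. It assumes that a written `\n` is interpreted as a newline (the thread leaves this open) and that line endings are normalised to LF, which is one possible answer to the cross-platform question. The function name `read_string` and its `interpret_escapes` flag are hypothetical and not part of any JEML implementation.

```python
def read_string(raw: str, interpret_escapes: bool = True) -> str:
    """Turn the text between the quotes into the final string value.

    Hypothetical helper: the escape handling is deliberately naive and
    only knows about the two-character sequence backslash + n.
    """
    # Normalise Windows (CRLF) and bare CR line endings to LF so the
    # same file gives the same value on every platform (an assumption).
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    if interpret_escapes:
        # Treat a written backslash followed by n as a newline.
        text = text.replace("\\n", "\n")
    return text


escaped = r"This\nis a multi-line\nstring"   # option1 in the example
literal = "This\nis a multi-line\nstring"    # option2 in the example

# With escapes interpreted, both spellings give the same value.
assert read_string(escaped) == read_string(literal)
# With escapes left alone, the backslash and the n stay in the value.
assert read_string(escaped, interpret_escapes=False) == escaped
```

A real implementation would also need a rule for an escaped backslash followed by n, which this sketch does not handle.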
I agree that if the goal is to be human-friendly, then the constraint "only one key per line" is a lot easier to understand than "all strings must have quotes around them." I do like the idea of allowing quotes around strings for when it's potentially ambiguous, though, like yaml does.
I'd also like to bring up an issue that I think gets forgotten too often -- should the "hanging indent" at the start of multi-line strings be removed? I don't think I've ever seen a situation where I want to keep that extra whitespace, and I've been in lots of situations where I have to hunt around for a tool to remove it, or just give up and de-indent the whole block, breaking the flow of the data file.
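One way to strip that indent, sketched in Python under the assumption that the first line of the string starts right after the opening quote and only the continuation lines carry the extra whitespace. `strip_hanging_indent` is a hypothetical name; `textwrap.dedent` is from the Python standard library.

```python
import textwrap


def strip_hanging_indent(value: str) -> str:
    """Remove the whitespace shared by all continuation lines.

    The first line is left alone because it begins directly after the
    opening quote; indentation relative to the shared margin is kept.
    """
    first, newline, rest = value.partition("\n")
    return first + newline + textwrap.dedent(rest)


indented = "This\n    is a multi-line\n    string"
assert strip_hanging_indent(indented) == "This\nis a multi-line\nstring"
```

Lines that are indented further than the others keep their extra indentation, so nested text inside the string survives.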
Hi @kyoto-shift, yes, that's basically what I had in mind. Whether it then parses `\n` as a newline character or as a literal backslash followed by `n` is not something I'd considered much, I guess. I think that scenario is quite a lot less confusing than the quoting and escaping (of quotes) issue that we get in things like YAML now.
In contrast to what @nupanick suggested though, I'd say that quotes should not be considered syntax. I'm not sure exactly what would be considered 'ambiguous' and require quotes?
But I agree that leading white space makes little sense to keep.
I agree that leading white space should be removed as well, and that any escape characters should be explicit, meaning they're only there if the author intended. However, I don't think a parser generator like ANTLR is able to enforce notions like this directly without giving up some portability, so that would mostly be up to the implementation, enforced by the spec and maybe an official JEML test suite.
I have given some thought to the removal of quotes for strings and I think it might be a little too minimalistic. For example, something like this could be confusing for the parser and the reader:
test {
    a key or value 0?
    another _ key or value [123]
}
Is `a` the key and `key or value 0?` a string value? Is `0` parsed as an integer? Is `another` the key and `_ key or value [123]` a string value? Is `[123]` parsed as a list, or a string alongside `_ key or value`?
Of course the answer would be that they're both parsed as strings, but the lack of quotes definitely adds some unwanted ambiguity, especially if we also consider multi-line strings. For example, would `a` be the key and `key or value 0? another _ key or value [123]` be a big multi-line string?
And then if a case is added where `"` is optional and can be used for "ambiguous" situations, what's the point of me not using them, as it would just be safer and lead to less confusion overall? To me, optional (or even no) quoting seems to add a slight bit of simplicity at the cost of comprehension. The optional rule would also allow dogmas like "only ever use quotes" and "never use quotes unless you need them" in communities that use JEML.
However, if you could provide some cases where not having quotes is easier to read, parse, or understand, I'm onboard and all for it!
So personally I’d take a very restrictive approach to those examples.
Keys shouldn’t contain spaces anyway, and the rest of the line should be a string.
I’d only support “list” syntax as a multi line construct (same for hashes) so the only “value” part on the same line as a key for those would be a square bracket or curly brace, and the following line would be treated as the first value (or key and value if a hash).
I’ll try to get an example of what I mean later today with examples where it may seem ambiguous (on phone, half covered in concrete right now!)
To clarify (not sure if this came across earlier), my primary goal is a data markup format I can use for client projects, where the person using this data may not be overly technical: technical enough to write a flat data file like this with instruction, but non-technical enough that they get caught out by things like YAML’s quoting scenarios or indenting rules.
Sorry for the delay on this. Anyway, in the example given above:
test {
    a key or value 0?
    another _ key or value [123]
}
A JSON representation of that (just as a way to show exactly how it's parsed, because JSON isn't really ambiguous about anything) would be:
{
    "test":
    {
        "a": "key or value 0?",
        "another": "_ key or value [123]"
    }
}
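The restrictive rules described above (the key is the first word, the rest of the line is the value, and `{` or `[` alone as the value opens a multi-line block) can be sketched as a small Python parser. This is only an illustration under stated assumptions, not a reference implementation: the names `parse`, `_parse_map`, `_parse_list` and `_parse_value` are hypothetical, every scalar is kept as a string, leading whitespace is discarded, and lines starting with `#` are treated as comments.

```python
from typing import Iterator


def parse(text: str) -> dict:
    """Parse the one-value-per-line syntax proposed in this thread."""
    stripped = (line.strip() for line in text.splitlines())
    # Assumption: blank lines and lines starting with '#' are skipped.
    lines = (ln for ln in stripped if ln and not ln.startswith("#"))
    return _parse_map(lines, top_level=True)


def _parse_map(lines: Iterator[str], top_level: bool = False) -> dict:
    result = {}
    for line in lines:
        if line == "}":
            if top_level:
                raise ValueError("unexpected '}'")
            return result
        # The key is everything up to the first run of whitespace.
        parts = line.split(None, 1)
        key = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        result[key] = _parse_value(rest, lines)
    if not top_level:
        raise ValueError("missing '}'")
    return result


def _parse_list(lines: Iterator[str]) -> list:
    items = []
    for line in lines:
        if line == "]":
            return items
        # Each line of a list is one whole value.
        items.append(_parse_value(line, lines))
    raise ValueError("missing ']'")


def _parse_value(rest: str, lines: Iterator[str]):
    # Only a lone brace or bracket opens a block; anything else,
    # including text such as "[123]", is a plain string.
    if rest == "{":
        return _parse_map(lines)
    if rest == "[":
        return _parse_list(lines)
    return rest


example = """
test {
    a key or value 0?
    another _ key or value [123]
}
"""
assert parse(example) == {
    "test": {"a": "key or value 0?", "another": "_ key or value [123]"}
}
```

Because all nested calls share one line iterator, a block simply consumes lines until it sees its own closing `}` or `]`, which keeps the sketch short at the cost of terse error messages.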
To give a more complex example, in my mind the 'example' JEML block from the readme, which uses several inlined values, quotes, etc., would under this proposal (if not updated to match it, obviously) be parsed as (again, in JSON, for clarity):
{
    "creator":
    {
        "name": "\"Judah Caruso-Rodriguez\"",
        "description": "\"Creator of the JEML specification :)\"",
        "website": "\"https://0px.moe\""
    },
    "project":
    {
        "name": "\"Example Project\"",
        "version": "\"0.0.1\"",
        "authors": "[ \"Cool Person <email@email.email>\" ]",
        "description": "\"A cool project for cool people\"",
        "license": "\"MIT\"",
        "keywords": "[ \"project\" \"jeml\" \"cool\" ]",
        "url": "\"https://project.io/project\"",
        "build":
        {
            "name": "\"project\"",
            "path": "\"src/project.ext\""
        },
        "dependencies":
        {
            "jeml": "{ version \"1.0.0\" flags [\"do-thing\"] }",
            "toml": "{ version \"0.5.0\" windows_only? true }",
            "yaml": "{ version \"1.3.0\" optional? true }"
        },
        "dev-dependencies":{"reflect":"\"1.4.0\""}
    }
}
And conversely, to give the intended meaning, the original JEML would actually end up being written as:
# This is a JEML file!
creator {
    name Judah Caruso-Rodriguez
    description Creator of the JEML specification :)
    website https://0px.moe
}
project {
    name Example Project
    version 0.0.1
    authors [
        Cool Person <email@email.email>
    ]
    description A cool project for cool people
    license MIT
    keywords [
        project
        jeml
        cool
    ]
    url https://project.io/project
    build {
        name project
        path src/project.ext
    }
    dependencies {
        jeml {
            version 1.0.0
            flags [
                do-thing
            ]
        }
        toml {
            version 0.5.0
            windows_only? true
        }
        yaml {
            version 1.3.0
            optional? true
        }
    }
    dev-dependencies {
        reflect 1.4.0
    }
}
In general: unless the value part of a line is `{` or `[`, it's treated as a "single" value.
One ambiguity I've thought of here is numbers vs strings, but I think a simple solution for that is: if the 'value' part consists purely of digits 0-9, it's a number, otherwise it's a string. Possibly decimal numbers (e.g. `1.343`) could be treated as a `float` type (although even that may not be practical for simplistic languages like shell), but because of potential rounding issues I would maybe leave that as implementation-specific.
Having just re-read the readme section on integer/float handling, I think the current allowed formats could still all be handled unambiguously in a no-quotes proposal.
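The digits-only rule can be sketched in a few lines of Python. This covers only the rule as stated in this comment, not the full set of integer and float formats in the readme; the function name `coerce` and its `floats` switch (standing in for the "implementation-specific" decimal handling) are hypothetical.

```python
import re
from typing import Union

_DIGITS = re.compile(r"[0-9]+")
_DECIMAL = re.compile(r"[0-9]+\.[0-9]+")


def coerce(value: str, floats: bool = False) -> Union[int, float, str]:
    """Apply the 'purely digits means number' rule to a raw value."""
    if _DIGITS.fullmatch(value):
        return int(value)
    # Optional: one decimal point with digits on both sides is a float.
    if floats and _DECIMAL.fullmatch(value):
        return float(value)
    # Everything else, including versions like 1.0.0, stays a string.
    return value


assert coerce("123") == 123
assert coerce("1.0.0") == "1.0.0"
assert coerce("1.343") == "1.343"
assert coerce("1.343", floats=True) == 1.343
```

Note that with float handling switched on, a two-part version such as `0.5` would also become a float, which may be another reason to leave decimals to the implementation.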
The syntax of this strikes me as focused on human (and not necessarily overly technical) reading/writing.
I think the one issue that could be improved in that regard is the use of quotes.
If the spec dropped support for multiple values per line, quotes would not be necessary for string values, which means escaping quotes would no longer be necessary.
This suggestion/idea comes directly from a problem I’ve had on a client project where YAML is currently used, and the use of quoting and the inevitable escaping is a major source of frustration for non-technical users.