jeml-lang / jeml

Just Enough Markup Language
MIT License

Consider removing quotes? #2

Open stephenreay opened 5 years ago

stephenreay commented 5 years ago

The syntax of this strikes me as focused on human (and not necessarily overly technical) reading/writing.

I think the one thing that could be improved in that regard is the use of quotes.

If the spec dropped support for multiple values per line, quotes would not be necessary for string values, which means escaping quotes is no longer necessary.

This suggestion comes directly from a problem I’ve had on a client project where YAML is currently used: quoting, and the inevitable escaping that comes with it, is a major source of frustration for non-technical users.

judah-caruso commented 5 years ago

I definitely like the idea of further simplifying the spec. However, I'm not sure if one value per line is the best way, as it might increase the verbosity of files. Is this the kind of syntax you had in mind?

project {
  authors [
    Person <email@email.email>
  ]
  description A cool project.
  license MIT
  keywords [
    project
    jeml
    cool
  ]
  url https://project.io/project
  build {
    name project
    path src/project.ext
  }
  dependencies {
    jeml {
      version 1.0.0
      flags [
        do-thing
        do-other-thing
      ]
    }
  }
}

As for the problem with escaping characters, I was considering (and still am) combining both string types into one, so any and all strings would be 1:1 representations of what's inside.

So rather than writing:

# [...]
  option1 "This\nis a multi-line\nstring"
  option2 """This
is a multi-line
string"""
# [...]

You would write:

# [...]
  option1 "This\nis a multi-line\nstring"
  option2 "This
is a multi-line
string"
# [...]

Meaning, manually escaping characters is fine, and writing strings literally is also fine. I'm not sure of the possible cross-platform implications of this, however.
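
As a rough illustration of the unified string idea, here is a minimal Python sketch in which the escaped form and the literal multi-line form decode to the same value. The function name `decode_string`, the set of recognised escapes, and the CRLF normalisation (one possible answer to the cross-platform question) are all assumptions for illustration; nothing in the thread or the spec fixes them.

```python
import re

# Hypothetical escape table; the thread does not decide which escapes exist.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def decode_string(body: str) -> str:
    """Decode the text found between the quotes of a unified string."""
    # Assumption: normalise CRLF so files written on different platforms agree.
    body = body.replace("\r\n", "\n")

    def replace(match: re.Match) -> str:
        char = match.group(1)
        # Unknown escapes are kept verbatim, backslash included.
        return _ESCAPES.get(char, "\\" + char)

    return re.sub(r"\\(.)", replace, body, flags=re.DOTALL)

# The escaped form and the literal form yield the same string.
escaped = decode_string(r"This\nis a multi-line\nstring")
literal = decode_string("This\nis a multi-line\nstring")
assert escaped == literal
```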

nycki93 commented 5 years ago

I agree that if the goal is to be human-friendly, then the constraint "only one key per line" is a lot easier to understand than "all strings must have quotes around them." I do like the idea of allowing quotes around strings for when it's potentially ambiguous, though, like yaml does.

I'd also like to bring up an issue that I think gets forgotten too often -- should the "hanging indent" at the start of multi-line strings be removed? I don't think I've ever seen a situation where I want to keep that extra whitespace, and I've been in lots of situations where I have to hunt around for a tool to remove it, or just give up and de-indent the whole block, breaking the flow of the data file.
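
One way to picture removing the hanging indent is the sketch below. It assumes the first line of the string sits on the key line and that only the whitespace common to the continuation lines is dropped, so relative indentation inside the string survives. The helper name `strip_hanging_indent` is hypothetical; `textwrap.dedent` is the Python standard-library function.

```python
import textwrap

def strip_hanging_indent(raw: str) -> str:
    """Remove the common leading whitespace from continuation lines."""
    first, sep, rest = raw.partition("\n")
    if not sep:
        return raw  # single-line string, nothing to strip
    # textwrap.dedent removes whitespace common to all non-blank lines.
    return first + "\n" + textwrap.dedent(rest)

raw = "This\n    is a multi-line\n    string"
assert strip_hanging_indent(raw) == "This\nis a multi-line\nstring"
```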

stephenreay commented 5 years ago

Hi @kyoto-shift, yes, that's basically what I had in mind. Whether it then parses \n as a newline character or as a literal backslash followed by n is not something I'd considered much. I think that scenario is a lot less confusing than the quoting and escaping (of quotes) issue that we get in things like YAML now.

In contrast to what @nupanick suggested, though, I'd say that quotes should not be considered syntax. I'm not sure exactly what would be considered 'ambiguous' and require quotes?

But I agree that leading white space makes little sense to keep.

judah-caruso commented 5 years ago

I agree that leading white space should be removed as well, and that any escape characters should be explicit, meaning they're only there if the author intended. However, I don't think a parser generator like ANTLR is able to enforce notions like this directly without giving up some portability, so that would mostly be up to the implementation, enforced by the spec and, maybe, an official JEML test suite.

I have given some thought to the removal of quotes for strings and I think it might be a little too minimalistic. For example, something like this could be confusing for the parser and the reader:

test {
  a key or value 0?
  another _ key or value [123]
}

Is `a` the key and `key or value 0?` a string value? Is `0` parsed as an integer? Is `another` the key and `_ key or value [123]` a string value? Is `[123]` parsed as a list, or a string alongside `_ key or value`?

Of course the answer would be that they're both parsed as strings, but the lack of quotes definitely adds some unwanted ambiguity, especially if we also consider multi-line strings. For example, would `a` be the key and `key or value 0? another _ key or value [123]` be a big multi-line string?

And then if a case is added where " is optional and can be used for "ambiguous" situations, what's the point of me not using them, as it would just be safer and lead to less confusion overall? To me, optional (or even no) quoting seems to add a slight bit of simplicity at the cost of comprehension. The optional rule would also allow dogmas like "only ever use quotes" and "never use quotes unless you need them" in communities that use JEML.

However, if you could provide some cases where not having quotes is easier to read, parse, or understand, I'm on board and all for it!

stephenreay commented 5 years ago

So personally I’d take a very restrictive approach to those examples.

Keys shouldn’t contain spaces anyway, and the rest of the line should be a string.

I’d only support “list” syntax as a multi line construct (same for hashes) so the only “value” part on the same line as a key for those would be a square bracket or curly brace, and the following line would be treated as the first value (or key and value if a hash).

I’ll try to get an example of what I mean later today with examples where it may seem ambiguous (on phone, half covered in concrete right now!)

To clarify (not sure if this came across earlier), my primary goal is a data markup format I can use for client projects, where the person using this data may not be overly technical. That is, technical enough to write a flat data file like this with instruction, but non-technical enough that they get caught out by things like YAML’s quoting scenarios or indenting rules.

stephenreay commented 5 years ago

Sorry for the delay on this. Anyway, in the example given above:

test {
  a key or value 0?
  another _ key or value [123]
}

A JSON representation of that (just as a way to show exactly how it's parsed, because JSON isn't really ambiguous about anything) would be:

{
    "test":
    {
        "a": "key or value 0?",
        "another": "_ key or value [123]"
    }
}
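
The restrictive rules described earlier in the thread (keys contain no spaces, the rest of the line is the value, and `[` or `{` on the key line opens a multi-line list or hash) can be sketched as a small line-oriented parser. This is a minimal Python sketch under those assumptions, not part of the JEML spec or any existing implementation: the function name `parse` is hypothetical, every scalar is kept as a string, nested containers inside lists are not handled, and unbalanced brackets are not reported.

```python
def parse(text: str) -> dict:
    """Parse quote-free JEML under the one-entry-per-line rules (sketch)."""
    root: dict = {}
    stack: list = [root]  # innermost open container is last
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        current = stack[-1]
        if line == "]" and isinstance(current, list):
            stack.pop()  # close the innermost list
        elif line == "}" and isinstance(current, dict) and len(stack) > 1:
            stack.pop()  # close the innermost hash
        elif isinstance(current, list):
            current.append(line)  # inside a list, the whole line is one value
        else:
            parts = line.split(None, 1)  # key, then the rest of the line
            key = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            if value in ("{", "["):
                child = {} if value == "{" else []
                current[key] = child
                stack.append(child)
            else:
                current[key] = value  # kept verbatim, quotes and all

    return root

source = """
test {
  a key or value 0?
  another _ key or value [123]
}
"""
assert parse(source) == {
    "test": {"a": "key or value 0?", "another": "_ key or value [123]"}
}
```

Because the value is simply "everything after the key", there is no tokenising of the value at all, which is where the quoting and escaping rules would otherwise live.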

To give a more complex example: in my mind, the 'example' JEML block from the readme, which uses several inlined values, quotes, etc., would under this proposal (if not updated to match it, obviously) be parsed as (again, in JSON, for clarity):

{
    "creator":
    {
        "name": "\"Judah Caruso-Rodriguez\"",
        "description": "\"Creator of the JEML specification :)\"",
        "website": "\"https://0px.moe\""
    },
    "project":
    {
        "name": "\"Example Project\"",
        "version": "\"0.0.1\"",
        "authors": "[ \"Cool Person <email@email.email>\" ]",
        "description": "\"A cool project for cool people\"",
        "license": "\"MIT\"",
        "keywords": "[ \"project\" \"jeml\" \"cool\" ]",
        "url": "\"https://project.io/project\"",
        "build":
        {
            "name": "\"project\"",
            "path": "\"src/project.ext\""
        },
        "dependencies":
        {
            "jeml": "{ version \"1.0.0\" flags [\"do-thing\"] }",
            "toml": "{ version \"0.5.0\" windows_only? true }",
            "yaml": "{ version \"1.3.0\" optional? true }"
        },
        "dev-dependencies":
        {
            "reflect": "\"1.4.0\""
        }
    }
}

And conversely, to give the intended meaning, the original JEML would actually end up being written as:

# This is a JEML file!
creator {
    name Judah Caruso-Rodriguez
    description Creator of the JEML specification :)
    website https://0px.moe
}

project {
    name Example Project
    version 0.0.1
    authors [
        Cool Person <email@email.email>
    ]
    description A cool project for cool people
    license MIT
    keywords [
        project
        jeml
        cool
    ]
    url https://project.io/project
    build {
        name project
        path src/project.ext
    }
    dependencies {
        jeml {
            version 1.0.0
            flags [
                do-thing
            ]
        }
        toml {
            version 0.5.0
            windows_only? true
        }
        yaml {
            version 1.3.0
            optional? true
        }
    }
    dev-dependencies {
        reflect 1.4.0
    }
}

In general:

One ambiguity I've thought of here is numbers vs strings, but I think a simple solution for that is: if the 'value' part consists purely of digits 0-9, it's a number; otherwise it's a string. Possibly decimal numbers (e.g. 1.343) could be treated as a float type (although even that may not be practical for simplistic languages like shell), but because of potential rounding issues I would maybe leave that as implementation-specific.
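
The digits-only rule is small enough to sketch directly. The Python below is illustrative only: the name `classify` is hypothetical, and decimals are deliberately left as strings, reflecting the suggestion that float handling stay implementation-specific.

```python
from typing import Union

def classify(value: str) -> Union[int, str]:
    """Return an int when the value is purely the digits 0-9, else the string."""
    # str.isdigit() also accepts non-ASCII digits, so test the characters directly.
    if value and all(ch in "0123456789" for ch in value):
        return int(value)
    return value

assert classify("123") == 123
assert classify("1.0.0") == "1.0.0"   # version strings stay strings
assert classify("1.343") == "1.343"   # decimals left to the implementation
```

One consequence of the rule as stated is that a value such as 007 becomes the number 7, so values where leading zeros matter would need separate treatment.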

stephenreay commented 5 years ago

Having just re-read the readme section on integer/float handling, I think the current allowed formats could still all be handled unambiguously in a no-quotes proposal.