tokay-lang / tokay

Tokay is a programming language designed for ad-hoc parsing, inspired by awk.
https://tokay.dev
MIT License

feat: New list syntax with enforced `,`-operator #100

Closed: phorward closed this 7 months ago

phorward commented 1 year ago

This pull request initially set out to introduce a new syntax for lists using [...] brackets, as in Python. Given Tokay's sequence/parsing behavior, however, it turned out that this syntax would not be useful, because it would become just another type of sequence. Instead, the previously optional comma , was redefined and now defines a list within this pull request.

Therefore, the following examples and rules apply:

d = ()  # this is now a dict
l = ,  # this is now a list
l = (,)  # is equal to above

l = 1,  # list with one item
l = (1,)  # is equal to above
l = 1, 2, 3  # list with 3 items
l = 1, (2, 3, 4)  # list with 2 items, second a list with 3 items

# mixed syntax not allowed
d = (1, b => 2, 3)  # previously accepted; now a syntax error.
d = (1 b => 2 3)  # clarified: it is a dict equal to (0 => 1 b => 2 2 => 3)
d = (1 b => (2, 3))  # clarified: it is a dict equal to (0 => 1 b => (2, 3))
l = (1, (b => 2), 3)  # clarified: it is a list, with a dict in l[1]
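
The dict rule above, where unnamed items appear to receive their position in the sequence as key, can be modeled in a few lines. This is a Python sketch and not Tokay's implementation: `build_dict` and the `(key, value)` item representation are hypothetical names chosen for illustration, and the keying rule is inferred only from the two "clarified" dict examples above.

```python
from typing import Any, Dict, Hashable, List, Optional, Tuple

# Hypothetical model of one sequence item: (key, value),
# where key is None for an item written without `key =>`.
Item = Tuple[Optional[Hashable], Any]


def build_dict(items: List[Item]) -> Dict[Hashable, Any]:
    """Build a dict in which every unnamed item is keyed by its position.

    Assumption: the position counts all items, named or not, which is
    what `(1 b => 2 3)` being equal to `(0 => 1 b => 2 2 => 3)` suggests.
    """
    result: Dict[Hashable, Any] = {}
    for position, (key, value) in enumerate(items):
        result[position if key is None else key] = value
    return result


# Models (1 b => 2 3)
mixed = build_dict([(None, 1), ("b", 2), (None, 3)])

# Models (1 b => (2, 3)); the inner list is represented as a Python list
nested = build_dict([(None, 1), ("b", [2, 3])])
```

Under this model the third item of `(1 b => 2 3)` gets key 2 rather than 1, because the named item `b` still occupies position 1.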

Initial comment:

This pull request drafts and implements an explicit list syntax for Tokay. Similar to Python and Rust, it uses [...] to specify explicit lists.

This eliminates the following problems Tokay has had so far:

  • (...) either produces dicts or lists, () is the explicit dict
  • [...] produces lists, [] is the explicit list
  • With #79 resolved, dicts can be used like lists
  • With #98, the former character-class syntax was replaced to make the brackets available

This PR is a draft and under development.


* `(...)` either produces dicts or lists, `()` is the explicit dict

This isn't useful. Currently, (1 2 3) generates repr [1, 2, 3].

Short roadmap on this topic:

  1. The (...) syntax is called a dict-bound sequence and should always produce dicts. (1 2 3) should produce the repr (1, 2, 3).
  2. The [...] syntax is called a list-bound sequence and always produces lists. Like the dict-bound syntax, it should also allow the | operator to specify inline sequences.
  3. The AST traversal, or rather the object traits, should allow iteration over either a list or a dict's values in order.
  4. The pos/kle repeat constructs currently produce either a single dict or a list of dicts. This should be changed to always produce a list of 1..n items or, in the case of kle, void if no match is found.
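
Roadmap items 3 and 4 can be sketched as a small Python model. It is not Tokay code and does not reflect Tokay's internals: the `Parser` signature, the `VOID` sentinel, the `NoMatch` exception and all function names are assumptions made for illustration. The roadmap only states that repeats should always yield a list of 1..n items, and that kle yields void when nothing matches.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Hypothetical parser type: (text, offset) -> (value, new_offset), or None on no match.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

VOID = object()  # stand-in for Tokay's void (assumption)


class NoMatch(Exception):
    """Raised when a positive repeat matches nothing (assumption)."""


def values_in_order(obj: Any) -> Iterator[Any]:
    """Item 3: iterate a list, or a dict's values, in order."""
    if isinstance(obj, dict):
        yield from obj.values()  # Python dicts keep insertion order
    else:
        yield from obj


def _collect(parser: Parser, text: str, pos: int) -> List[Any]:
    """Apply the parser repeatedly and collect every matched value."""
    items: List[Any] = []
    while True:
        result = parser(text, pos)
        if result is None or result[1] == pos:  # stop on failure or empty match
            return items
        items.append(result[0])
        pos = result[1]


def positive(parser: Parser, text: str, pos: int = 0) -> List[Any]:
    """Item 4, pos: always a list of 1..n items, even for a single match."""
    items = _collect(parser, text, pos)
    if not items:
        raise NoMatch()
    return items


def kleene(parser: Parser, text: str, pos: int = 0) -> Any:
    """Item 4, kle: a list of 1..n items, or VOID if nothing matched."""
    items = _collect(parser, text, pos)
    return items if items else VOID


def digit(text: str, pos: int) -> Optional[Tuple[int, int]]:
    """Example parser: match one decimal digit."""
    if pos < len(text) and text[pos].isdigit():
        return int(text[pos]), pos + 1
    return None
```

In this model a single match still comes back wrapped in a list, so callers never have to distinguish between "one item" and "a list of items".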

It should also be considered to continue this PR only once #10 (#105) is generally implemented, so that several constructs become easier to define.

phorward commented 1 year ago

(Moved roadmap to top)