ozra / onyx-lang

The Onyx Programming Language

Tuple syntax revisited #57

Open ozra opened 8 years ago

ozra commented 8 years ago

Tuple-syntax has been disturbing me for a while, but since switching to parentheses would make code very cloudy at best, I haven't gotten anywhere. Fortunately, the issue was raised in the crystal repo, and there was one idea put forward that caught my attention: using angle brackets.

Current syntax: Was removed quite a while ago!

my-tuple = {47, "foo", #bar}

Proposed syntax: Has been tried, and dismissed!

my-tuple = <47, "foo", #bar>

For the latest implementation status and suggestions - follow the commentary below instead of the contents of this OP [ed: 2016-09-20]

Pros

Any opinions?

stugol commented 8 years ago

How would this make named tuples possible?

ozra commented 8 years ago

It doesn't "magically". It has to be implemented in its own right. But according to asterite and the Crystal community there's interest in it. It makes it possible in that it doesn't clash with hash-literals if changed: my-hash = {foo: 47, bar: "ok"}, whereas with the proposed delimiters the same could be done for tuples, and it would then be more consistent too.

The important part of the endeavour for me is that braces are "unnatural" for Tuples. They should form Sets imo.

stugol commented 8 years ago

Ah, I see:

my-named-tuple = <name: "Fred", age: 6>

Nice.

My opinion on "unnatural", however, is "meh" ;) While I do have a small amount of experience in set theory, my experience is mostly of programming, where braces can have pretty much any meaning.

ozra commented 8 years ago

I spent the day coding angular syntax, and it has quite an impact on parsing. Since </> are asymmetric as gt/lt and symmetric for tuples, the parsing ends up doing nested parse-trials to come up with a suitable solution at every point where any of these tokens occur, recursively re-parsing shit-loads of code again and again. Simply, it taxes (and complexifies...) the parser too much. We want it to be fast - that's important.

The first idea for tuples (the most common notation) is (elements, here), however, as I've mentioned, I think it doesn't identify them clearly enough.

Seeing that one-tuples would need, for instance, a trailing comma (like in Python) to identify them as tuples instead of expression-groupings, I thought, hell, why not a trailing comma for tuples always? It makes for a recurring beacon. It looks a bit goofy at first, but when you read through code, you start appreciating it - it makes the tuples pop out a little more from groupings, lambdas, wavy-lambdas, call-arg-lists, etc. parentheticals. But then suddenly brackets became viable.

a-tup = (some, vals, here,)
another-tup = (1,)
zero-tup = (,)

foo(...x) -> x

foo((1, 2,), foo (a, b, c,))

Alt. 2, same thing but with brackets:

a-tup = [some, vals, here,]
another-tup = [1,]
zero-tup = [,]

foo(...x) -> x

foo([1, 2,], foo [a, b, c,])

The pro of brackets is that a tuple and a list share many traits, the same way sets and maps share many traits. Symbol associativity is somewhat clearer.

Other ideas are welcome that:

Worst case, tuple syntax has to revert to braces (but it's "wrong", and also Set wants them)

stugol commented 8 years ago

I don't like this idea. Trailing commas are ugly, and bugs would creep in when we forget to use them. Also, trailing commas should just be elided in arrays and such:

values = [
  1,
  2,
  3,
]

Angle brackets would be tricky in practice anyway, thinking about it:

fn(a,b) ->
   < a<b >       -- ???

It could be done, but it sounds like the compiler can't cope with it. I reckon stick to braces. Do we even need set literals? Sets aren't often used in my experience. Why not use {{ ... }} or something like that?

Alternatively, tuples could be denoted using <[ ... ]>. Surely that wouldn't break the compiler? <[ is not the same as <, after all.

Thinking about it, I would prefer the following:

fn(n) ->
   yield n

fn 1, ~> say "block {@1}"
fn(2) { say "block {@2}" }

I know you don't like braces for blocks, but the fact is that many people - including me - like them. The wavy arrow is fine in many cases, but sometimes blocks are better:

[1,2,3].map { ... }.select { ... }.sort { ... }.map { ... }.reduce(0) { ... }
[1,2,3].map( ~> ... ).select( ~> ... ).sort( ~> ... ).map( ~> ... ).reduce(0, ~> ... )

The braces notation is simply clearer in some cases, especially where arguments are being passed as well.

ozra commented 8 years ago

1. The trailing comma is strictly required to be back to back with the final wing.

values = [
  1
  2
  3
,] -- needs to be before final closing wing _unspaced_

2. Yes, I got it working - it was just too slow. Your example: the compiler sees <, pushes a state backup, tries tuple. Sees < again, pushes a state backup, tries tuple. Sees >, closes tuple #2. Program fails to compile. Compiler backs up to the saved state, tries it as operator, sees >, closes tuple #1. Program works. And now imagine that procedure for 100 lt/gt ops: even if there wasn't a single tuple in the program, it tries tuples for every variation until it has exhausted every possibility, then finally ends up at the first lt and tries them all as operators. Gaaaah. This was of course a naïve implementation; with a bit of heuristics and statistics both CPU & RAM use could be improved - but it's definitely not worth it.

fn(a,b) ->
   < a<b >       -- ???
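
[ed: The blow-up described above can be illustrated with a toy model in Python. This is only a sketch, not the Onyx parser: the function name count_visits is made up, and it assumes an input that contains lt operators but no closing >, so every "try it as a tuple" trial fails and the rest of the input is parsed a second time.]

```python
def count_visits(tokens, i=0):
    """Count how often a '<' token is examined by a try-tuple-first parser."""
    # Skip ahead to the next '<'.
    while i < len(tokens) and tokens[i] != "<":
        i += 1
    if i == len(tokens):
        return 0
    # Trial 1: assume '<' opens a tuple and speculatively parse the rest.
    # Toy assumption: the input has no '>', so this trial always fails.
    as_tuple = count_visits(tokens, i + 1)
    # Trial 2: restore the saved state and re-parse the rest with '<' as an operator.
    as_operator = count_visits(tokens, i + 1)
    return 1 + as_tuple + as_operator


print(count_visits("a < b < c < d".split()))  # 3 operators -> 7 visits
print(count_visits(["x"] + ["<", "x"] * 10))  # 10 operators -> 1023 visits
```

In this model n operators cost 2^n - 1 visits, which is why a program with 100 comparisons and no tuples at all would already be hopeless without heuristics.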

Thinking about it, I would prefer the following: ...

Using two-char combos is an alternative; it definitely makes the literals "pop out", the question is whether that becomes even uglier. I'll have to do some renderings of code with different styles to get a clearer bird's-eye view.

I dislike braces for block syntax - it's not out of the question, so we'll keep it on the table. The reason is that grouping code, as I've mentioned before, is far cleaner with parentheses imo, which also separates it from Map/Set/Tup literals (depending on choice). And that's ofc. why parens for tups mess up literal vs code. The "slant"-notation for soft-lambdas is still a viable idea in my mind. They look "less nice" than the curly-arrow at first sight, but in actual code they turn out pretty sleak. The braces-notation leans visually to the right in an unbalanced way imo.

[1, 2, 3].map { ... }.select { ... }.sort { ... }.map { ... }.reduce(0) { ... }
[1, 2, 3].map(\ ... ).select(\ ... ).sort(\ ... ).map(\ ... ).reduce(0, \ ... )

And especially for cases where the s-lambda is used mainly for (multi line) expression block passing:

loop \
   do-shit
   here

It's like it "points to" an indented expression list.

The fact that it is a half λ doesn't hurt either ;-)

Going back to the <...> notation: there is a simple way to make it much more parseable, at the cost of some vigilance, and we've resorted to the take before: requiring spaces around lt/gt ops, requiring non-spaced tuple angulars. More and more it seems like the simplest way for the coder in combination with syntax choices - for maximum terseness while maintaining clarity - is to set a hard rule that all binary operators should always be spaced - then there's no confusing variations to consider.

Bad form:

a = 5+3  -- illegal - instill the fact that binary ops should always be spaced
b = 4-2  -- illegal - also ensures my-idfr-5 errors are less prone to occur
c = a<b  -- illegal
t = < a, b, c >  -- illegal, proposed Tup-literal

Good form:

a = 5 + 3
b = 4 - 2
c = a < b
t = <a, b, c>

I made some quick statistics by searching some (large) code-bases, and I found non-spaced binary ops to almost never occur in well coded programs (despite being allowed). The only places I found them were in loop-constructs running to b-1, which is moot in Onyx because you'd preferably do a...b instead.

So such a rule is not far fetched at all.
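
[ed: A minimal sketch in Python of how the spacing rule lets each angle be classified by looking only at its two neighbouring characters, with no backtracking. This is not the PoC implementation; classify_angles, OPENERS and CLOSERS are hypothetical names, and multi-character tokens such as ->, ~> or <= are assumed to have been consumed by the lexer already.]

```python
OPENERS = set("([{,<")   # characters that may directly precede a tuple opener
CLOSERS = set(")]},.>")  # characters that may directly follow a tuple closer


def classify_angles(line):
    """Label every '<' or '>' in a line as 'op', 'open', 'close' or 'illegal'."""
    kinds = []
    for i, ch in enumerate(line):
        if ch not in "<>":
            continue
        before = line[i - 1] if i > 0 else " "
        after = line[i + 1] if i + 1 < len(line) else " "
        if before == " " and after == " ":
            kinds.append("op")       # spaced on both sides: binary operator
        elif ch == "<" and after != " " and (before == " " or before in OPENERS):
            kinds.append("open")     # snug against the first element
        elif ch == ">" and before != " " and (after == " " or after in CLOSERS):
            kinds.append("close")    # snug against the last element
        else:
            kinds.append("illegal")  # e.g. the unspaced operator in a<b
    return kinds


print(classify_angles("c = a < b"))      # ['op']
print(classify_angles("t = <a, b, c>"))  # ['open', 'close']
print(classify_angles("c = a<b"))        # ['illegal']
```

Note that the spaced tuple attempt t = < a, b, c > is labelled as two operators by this sketch; rejecting it is then left to the parser, which finds an operator without a left operand.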

I'll re-code my PoC implementation branch trying out the rule for lt/gt/angular tup just for kicks, I have some spare time to spend atm. No trailing comma will be required at any point with this solution.

A further rule in the PoC-implementation will be that < followed by newline+indent is considered multi line tuple, this will limit the ability to use lt/gt at a line-crossing - something I find perfectly reasonable.

unity-foo(x) -> x
a = 1
b = 2

-- illegal LT syntax:
x = a <
   b

-- legal Tup literal syntax
t = unity-foo <
   a
   b
>

stugol commented 8 years ago

The braces-notation leans visually to the right in an unbalanced way

Only because I write them that way:

[1, 2, 3].map{ ... }.select{ ... }.sort{ ... }.map{ ... }.reduce(0){ ... }

Happy now? ;)

sleak

Sleek.

[1, 2, 3].map(\ ... ).select(\ ... ).sort(\ ... ).map(\ ... ).reduce(0, \ ... )

Slant-notation looks okay in that context, yes. But godawful in others. And it's ugly in the reduce call, sharing with a normal parameter.

The fact that it is a half λ doesn't hurt

You could always support λ, ≠, ≤ and ≥ for optional use. We don't have to stick to the ASCII table, provided we still offer ASCII alternatives.

requiring non-spaced tuple angulars

But I often like to leave space around my delimiters:

a = [ 1, 2, 3 ]
b = { name: "Fred", age: 7 }

I really think <[ ]> is the way to go.

stugol commented 8 years ago

While personally I'm not against requiring spaced operators (b - 1 etc), I don't think it's necessary here. Better to avoid restricting it without good reason (e.g. to enable a ?? b null-propagation)

ozra commented 8 years ago

I implemented the proposed PoC, and tried it on my "throw-up-spec", which has insane mis-use of the language for the sole purpose of pushing parsing to the limit. And it worked perfectly except for two (purposely insane) compares that crossed a new-line (which the new rule forbids). After changing them to "normal coder style" it all worked out great. No trailing commas. The only caveat: angular-enclosers must be snug-fitted.

Parentheses and angular tuple notation can co-exist, so I'll implement that too in the PoC so it's easier to compare in actual use.

ozra commented 8 years ago

brace "leaning"

[1, 2, 3].map{ ... }.select{ ... }.sort{ ... }.map{ ... }.reduce(0, { ... })  -- <-- block is part of args

Ah, well, yes that should be possible to support if brace-soft-lambdas were adopted, since the only other use of back-to-back idfr-brace is with type-named indexable literals - and they are of course initial capital: x = MyListType{1, 2, 3}.

Please give me a few lines of full examples of mixed uses of the brace-s-lambdas, I know you've produced some before, but just to get a few (with acting code also, not only ...) so I can look at it a bit more. Preferably practical real-world usable code.

You could always support λ, ≠, ≤ and ≥ for optional use. We don't have to stick to the ASCII table

100% :+1:

But I often like to leave space around my delimiters:

Dang! Yes, in many cases it looks nice. Hmmm. Atm, in the PoC, it's only <tup, elements> that requires snug angulars.

In the "must allow spacing" case I even think parens are better, despite missing clear visual separation / identifiability. Trailing commas would only be necessary for 1-tuple. 0-tuple/unit should practically never happen but should require a comma too in that case - for clarity ("it's not an empty grouping I just forgot - it is a 0-tup").
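
[ed: For comparison, this is how Python, the language referred to earlier in the thread, resolves the same grouping-versus-tuple question. The variable names are only illustrative. Unlike the proposal above, Python writes the 0-tuple as () and rejects (,) as a syntax error.]

```python
grouping = (1)     # plain expression grouping: this is just the int 1
one_tuple = (1,)   # the trailing comma is what makes it a tuple
zero_tuple = ()    # empty parentheses are the 0-tuple in Python

print(type(grouping).__name__)   # int
print(type(one_tuple).__name__)  # tuple
print(len(zero_tuple))           # 0
```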

But I really like the look of the overall code with the new angular-syntax. Hmm. Hmm, lots of hmmm.

ozra commented 8 years ago

Whether or not () gets tuple meaning, I think indexable-destructuring, aka "multiple assign", perhaps should use that notation in receiving position. Even though parens as a literal, depending on modifier, mean s-lambda, lambda or tuple (if decided), parens obviously have a weaker type-bound meaning than brackets, which as a literal have a strong association with List. Thus:

(a, b, c) = some-list-or-hash-or-similar
-- instead of current syntax:
[a, b, c] = some-list-or-hash-or-similar

This gives even more parse cases for parentheses, but the effect on compile time in this case is minimal (compared to when intermingling with asymmetric ops)
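
[ed: As a point of reference outside Onyx, Python accepts both notations in receiving position, and neither implies anything about the type on the right-hand side. A small sketch with made-up variable names:]

```python
some_list = [1, 2, 3]

(a, b, c) = some_list  # parenthesised target list
[x, y, z] = some_list  # bracketed target list, same meaning

print(a, b, c)  # 1 2 3
print(x, y, z)  # 1 2 3
```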

ozra commented 8 years ago

Both (a, tup, here) and <a, tup, here> (the angular variant with the above given restrictions) are now implemented. {this, is, now, a, set}. This way the variations can be tested live before deciding finally.

stugol commented 8 years ago

I think this is a bad call :(

ozra commented 8 years ago

Re-opens. You would still prefer <[ / ]>?

stugol commented 8 years ago

Definitely. It stands out better, and its relationship to arrays is reflected. Besides, being prohibited from free-spacing is an annoying restriction.

And as I said, {{ }} for sets. They'll stand out better that way. Hardly anyone ever uses sets, so you want them to stick out; else they won't be noticed, and people will assume they're hashes or something.

And { } as a block option. You don't have to use it! ;)

stugol commented 8 years ago

Please give me a few lines of full examples of mixed uses of the brace-s-lambdas

Some of my actual code:

existing_files = dirs.map { |k,dir|
    dir = Pathname(dir) + name.to_s
    Find.find(dir).reject { |file| Dir.exist? file }.reject { |f| f.end_with? '.missing' }.map { |f| Pathname(f).sub_ext('').basename } if dir.exist?
}.flatten.to_a.compact

(doc/"img").reject { |img| !img[:src] || img[:src].start_with?("http://") }.map { |img| [img, comicdef[:domain]+$1] if img[:src] =~ %r{^(?:/|)(.*)$} }.compact.each { |v|
    v[0][:src] = v[1]           # Make all image urls be absolute
}

mounted = Partitions.GetMounts.select { |row|
    row[:mountpoint].to_s.downcase == arg.chomp('/').downcase || row[:device].downcase == arg.downcase || (Partitions.GetRealDeviceNameOf(row[:device]) && Partitions.GetRealDeviceNameOf(row[:device]) == Partitions.GetRealDeviceNameOf(arg))
}.map { |line| FSTab::FSTabEntry.new(device: line[:device], mountpoint: line[:mountpoint], filesystem: line[:filesystem], mntopts: opts[:mount_options]) }

ozra commented 8 years ago

Besides, being prohibited from free-spacing is an annoying restriction.

I kept the <el, em, ents> (which has that annoying requirement) in because I spent so much time on coding it and more importantly: to have one more syntax to compare with while writing actual programs. I agree the limitation is annoying, and the angular notation will be removed soon for that reason. I'll keep it in until Unicode-alternative is written in, then there's a more solid variation to use, and spaces are back to running free in all literals (it's only the angulars that have the requirement atm, so parentheses tuples can be spaced inside).

And as I said, {{ }} for sets. They'll stand out better that way. Hardly anyone ever uses sets, so you want

Afaic Sets are very common (both in maths and programs [I've coded in, not generally I agree]), and the distinction between Set and Map is very clear from the lack of keys imo. The soft-lambda-curlies (you manage to get that in everywhere, don't you! Haha) would of course be a conflict to that clarity, which is why I'd prefer another solution (as is now, or with the back-slash, or some new idea we cook up - think out of the box and suggest :-) ).

Note: below is highly off topic for this issue - it should continue in the appropriate issue!

Here's your first sample rewritten. Please re-consider your coding style! ;-) Intermediate variables cost nothing (LLVM optimizes them away).

existing-files = dirs.map (k,dir) ~>
    dir = Pathname(dir) + name.to-s
    if dir.exist? => Find.find(dir).reject(~> Dir.exist? _1).reject(~.end-with? ".missing").map (f) ~> Pathname(f).sub-ext("").basename
.flatten.to-a.compact

And here's an alternative with the proposed slant-style (which I still think is a strong candidate - backslash will have no other uses [except in strings], so it will quickly gain mind-share), and using your %n proposition:

existing-files = dirs.map (k,dir) \
    dir = Pathname(dir) + name.to-s
    if dir.exist? => Find.find(dir).reject(\ Dir.exist? %1).reject(\.end-with? ".missing").map (f)\ Pathname(f).sub-ext("").basename
.flatten.to-a.compact

I honestly see no pros to the braces in comparison.

stugol commented 8 years ago

Whenever I see a backslash in code, I immediately think it's escaping something. I don't think that's going to change. \. to me means "escaped dot". I don't really see the advantage of \ over ~>, but if you're dead set on this kind of syntax, might I suggest choosing some different character?

ozra commented 8 years ago

1. The things following the backslash are never anything you'd escape in a string (did you ever escape a dot?).
2. Normally we view code highlighted, and then it's even clearer (the slant-lambda and escapes inside a string are totally different looking). That said, it should be clear even in BW as currently on github, but then I refer to 1) ;-)
3. Mind share is what is needed. Before you've ever seen a ship in your life, it looks like a dragon on the horizon; once used a few times, the brain quickly picks them out in an entirely different way (latent inhibition leads to the brain creating most of our perception from internal constructs rather than actual visual input; if we don't know it, we don't see it). Side note: With paren-tuples it's a different thing, since there is no clear separating feature early on, like with the backslash (non esc-style character following it).

And still, it's just an idea on the drawing table, I just think it's good to evaluate options hard, to not get stuck in a dusty groove - and please suggest more options for syntax! Definitely! :-)

stugol commented 8 years ago

did you ever escape a dot?

All the time. I write regexes often.

latent inhibition leads to the brain creating most of our perception from internal constructs rather than actual visual input

I'm not familiar with the term, but yes, it does.

googles

"Latent inhibition is a technical term used in classical conditioning to refer to the observation that a familiar stimulus takes longer to acquire meaning (as a signal or conditioned stimulus) than a new stimulus."

Doesn't sound like you're using the right term.

ozra commented 8 years ago

All the time. I write regexes often.

Damn it, I missed that one. Haha. I must have done it 100 times only today :-P

Doesn't sound like you're using the right term.

Since it was some time ago, maybe there are related terms I should've used. In essence, as I understood it: there are many effects following from the level of latent inhibition. The quoted description is correct in a sense, but extremely badly worded (imo), and does not touch upon the implications of the requirements for "familiar" etc. It's about sensory filtering according to an internal map. A scenario that is easily perceived as completely different acutely lowers the inhibition, and you take in more intel, turning to the source data. When it seems familiar - through the filtered perception - you're presented with your internal idea (a processing optimization), which is why \.x would be seen as a "familiar stimulus" mapping to the wrong idea, until the separating details are discerned and incorporated. For the quote to apply, the filtering must first be breached to convey that it actually is something familiar or new.

The brain uses the cached approximation unless it sees that there really is a need to re-evaluate and refine the cache-map algo and its stored approximations, more generally speaking.

As for the paren-tuple, the separating details are such nuances that the process will always be heavy, and the process of identifying will be loaded at all times with the two concepts' visual similarity (they won't differ much in highlighting either, so representation variation changes little), while I reckon both the context, and especially when presented highlighted, for slant-lambda would be richer in contrast thereby making the mapping-process much more efficient for the brain.

Hmm, I shouldn't have said "badly worded", because I'm not that good at wording things quickly myself.. hehe. But then again, two wrongs don't make a right. I hope the reasoning was somewhat understandable.

stugol commented 8 years ago

I guess. But \. will always conflict with regex syntax in the brain.

ozra commented 8 years ago

Bold words ;-) I think I might try it out in a local branch to see how it feels, since it will be just a few lines of code to implement (I do after all mash up maaany regexps a day too...).

Continued discussions on that subject should of course go into #14 from now on.

ozra commented 8 years ago

I began implementing <[...]> but quickly abandoned it, since it unfortunately has the exact same limitations as <...>: No good: <[ true, false, x[0]>5 ]>.
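
[ed: The "no good" case can be reproduced with a tiny longest-match tokenizer in Python. This is a sketch under the assumption that ]> is lexed greedily as one closing token; it is not the Onyx lexer, and TOKEN and tokens are hypothetical names.]

```python
import re

# Two-character tuple delimiters are tried before single-character punctuation.
TOKEN = re.compile(r"<\[|\]>|[\[\]<>,]|\w+")


def tokens(src):
    """Split source text into tokens, silently skipping whitespace."""
    return TOKEN.findall(src)


print(tokens("<[ true, false, x[0]>5 ]>"))
# ['<[', 'true', ',', 'false', ',', 'x', '[', '0', ']>', '5', ']>']
print(tokens("<[ true, false, x[0] > 5 ]>"))
# ['<[', 'true', ',', 'false', ',', 'x', '[', '0', ']', '>', '5', ']>']
```

In the unspaced version the index-close and the gt operator fuse into a premature tuple closer; once the comparison is spaced, the two tokens separate again.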

stugol commented 8 years ago

We're being excessively limited here. I thought we were going to force spacing of binary operators? Which would solve this problem, of course.

ozra commented 8 years ago

The forced spacing is not decided on, but the reasons for it keep piling up. Since [T] has been removed from generic syntax this was now simple to implement. The alternative can now be tried along with the parenthesized tuple-syntax (which is "clean" but very unclear), and angular-syntax, which will be dropped.

stugol commented 8 years ago

I don't follow. Are we forcing spacing and implementing <[ ... ]>, or are we not?

ozra commented 8 years ago

Spacing will most likely be forced later on, right now it just prioritizes syntax so that non-spaced would give an error in specific cases, like ]>. It's just a matter of putting down the time, but since it will work without problems, I just push the enforcement down the line in favour of more pressing issues.

<[ ... ]> notation is implemented now.

ozra commented 8 years ago

Now that named-tuples have landed in the AST thanks to asterite's efforts on Crystal, it suddenly becomes rather clear. As we've already concluded, tuples have traits in common with arrays, and naturally named tuples have traits in common with Maps/Dicts/Hashes. So the syntax suggested by @stugol gets a nice semantic plus in contrast to the named tuple, simply:

a-tuple = <[ some, elements, here ]>
a-named-tuple = <{ some: named, elements: here }>
send-me-to-a-rest-api = <{
   token: "foo"
   things: <[
      "monkey"
      "wrench"
      <{ config: "special" }>
   ]>
}>.to-json

stugol commented 8 years ago

Neat.

Remind me....are tuples immutable? If so, what options do we have for anonymous structs? Records, essentially.

ozra commented 8 years ago

Yes tuples are highly immutable. Records is the closest thing, yes. I'm still thinking about whether some sugar for it should be let into the language itself, or just stick with the macro. It's pretty ok though.

stugol commented 8 years ago

We already have the infrastructure to support anonymous classes within methods (the visitor pattern), so records can actually be useful now.

I really think there ought to be some kind of syntax for this. It's essentially a mutable labelled tuple, right?

So... <# #> perhaps?

ozra commented 8 years ago

Do you have any practical examples of your use case?

stugol commented 8 years ago

Of course not. Don't be silly ;)

ozra commented 8 years ago

Hmm, I was sure that I had updated the status on Tuples - but apparently not: can't find anything in search here.

In short: as of the last few days, these syntaxes are currently available for evaluation in Onyx:

tup = (1, 2, 3)
ntup = (a: 1, b: 2, c: 3)
tup = ‹1, 2, 3›
ntup = ‹a: 1, b: 2, c: 3›
tup = <[1, 2, 3]>
ntup = <{a: 1, b: 2, c: 3}>

I would feel content with keeping just the classic tuple syntax t = (paren, tuples), perhaps also with the addition of the optional mid-dot variation: t = ·(paren, tuples), and finally of course also the Unicode variant ‹unicode, tuples›.

And their corresponding named tuple variants, naturally.

Sod-Almighty commented 8 years ago

Sadly, UK (and US) keyboards have no mid-dot key.

I like the unicode symbols for generics and tuples. The ASCII ones are the difficult decision. I like the parens. I think the nested parens or leading space are fine - if the coder puts a space in by mistake, then he'll simply get undesired behaviour and will need to fix the typo. Adding mandatory punctuation that isn't even present on most keyboards is overkill, I feel. Just allow the mistake and let the coder figure it out. Besides, it probably wouldn't even compile.

I like the <[ ]> syntaxes, personally. If you leave the option in, then people who like them can use them to avoid the ambiguity of parens; and you don't have to use them if you don't want to. That's what the formatter is for.

The mid-dot isn't out of the question for any particular use; but you should treat it like unicode, not ASCII.

ozra commented 8 years ago

Windows OS is the only common OS where middot is not in a regular keymap: https://en.wikipedia.org/wiki/Interpunct#Keyboard_input

But, once again, that is a moot point, since it suffices to enter (1, 2). The intention of the idea is to allow the traditional parentheses tuple syntax while at the same time making it more explicit when reading the code. (Once again, the rendering is something easily tasked to the stylizer).

The thing with the multi-character enclosers is that keeping them may prevent further development of the language, or increase the risk of ambiguities and thus make error analysis harder. I will leave them in for several months more for continued trial (and obviously then, perhaps permanently when alpha stage is passed if deemed so).

So, once again, the middot idea is (obviously!) not an ASCII-alternative, it's simply an alternative to make the classic tuple notation more readable in certain situations.