ozra / onyx-lang

The Onyx Programming Language

"Route '66 Style"* Syntax For Code Blocks (Braces) #97

Open ozra opened 8 years ago

ozra commented 8 years ago

Cross Crawling Time

I've been of the simple minded opinion that braces are a thing of the past.

At an Erlang×Haskell convention I talked to a guy who loved braces and semis and wouldn't give them up for the world. Also @Sod-Almighty has expressed (to the spam level ;-) ) a wish for braces, for fragments specifically. Further, when talking to @neotech, the (pretty much) requirement of braces for block structuring in a language was strongly expressed too.

In respect of these opinions, I've come to the realization that my assumption was flawed. A lot of us coders are wetware-configured to like braces structuring, just the same as a lot of us are wetware-configured to find indentation-based to be clearer.

Henceforth I've been spending time trying to figure out whether it would be possible to implement brace blocks in Onyx without conflicts, while also avoiding parsing directives.

And, as usual, a nice cigar delivered the simple solution. Here's what I figure:

The potential conflicts

Which were the reasons brace-blocks were seen as problematic to begin with

my-set = {1, 2, "foo"}
my-hash = {foo: 47, bar: "yo"}
some-func {1, 2, 3}

These conflicts stem from the existing uses of the brace for literals, in combination with the fact that Onyx allows Haskellish/Livescriptish/Nimish/Rubyish juxtaposition calls (parentheses-less).

The Simple Solution

-- "standard" Onyx func-def
foo(x, y, z) -> x + y + z

bar(x Str, y Int) -> Int
   x + y.to-s

-- braces func-def
foo(x, y, z) { x + y + z }

bar(x Str, y Int) Int {
  x + y.to-s
}

-- Fragments - several alternatives - all doing the same thing:
list = [1, 2, 3]

list.map (x) ~> x + 1
list.map ~> _1 + 1
list.map ~.+ 1

list.map \x\ x + 1
list.map \\ _1 + 1
list.map \.+ 1

list.map (x) { x + 1 }
list.map () { _1 + 1 }

list.map \x\ { x + 1 }
list.map \{ _1 + 1 }

-- `if` control constructs
if x and bar
   do-stuff

if x and bar: do-stuff
if x and bar then do-stuff
if x and bar => do-stuff

-- parentheses _must_ be used around condition - to disambiguate from set-/map-literals
if (x and bar) {
   do-stuff
}

if (x and bar) { do-stuff }

-- Other control structures - same thing
switch (foo) {
   case (1) { do-shit }
   case 2: do-shit
}

-- keyword-less cases
switch (foo) {
   (1) { do-shit }
   2: do-shit
}

You catch the drift.

Possible hassles

Here's an example of the hells raised if forgetting to disambiguate:

if foo && bar {
   do-stuff
}
-- This means `if (foo && bar({do-stuff}))` 
-- (call bar, with arg being a set containing the result of do-stuff)

And an example of a set-literal and '66-block in vicinity for the visual aspects:

if foo and my-char in? {some, chars}: do-something
if (foo and my-char in? {some, chars}) { do-something }

Slight Gotchas

The only price one must pay if '66-style is wanted, is that it will look a lot like C/JS in practise (parentheses needed around conditions and the like) - chances are some "bracers" even see that as a pro.

One slightly naughty part of the proposal is fragments expressed as (x) { x.code } - because of: if (x) { x.code }. However - fragments are always defined as argument directly. Therefore the parser can now that we need foo ..., foo(...), foo 1, ... or foo(1, ...) - that is: an identifier or a comma before BALANCED-PARENS BRACE for it to be a fragment and not a block.
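
As a rough illustration of the rule just described, here is a minimal Python sketch (not Onyx's actual parser) that looks for the first BALANCED-PARENS BRACE sequence and classifies it by the token in front of it. The tokenizer, the keyword list and the function names are assumptions made up for this example; it also ignores func-defs such as `foo(x, y, z) { ... }`, which the rule as worded does not cover.

```python
import re

# Hypothetical keyword list; the thread does not give Onyx's real keyword set.
KEYWORDS = {"if", "else", "switch", "case", "while", "for"}

# Identifiers may contain hyphens and end in '?', as in the thread's examples.
IDENT = r"[A-Za-z_][A-Za-z0-9_-]*\??"
TOKEN_RE = re.compile(IDENT + r"|\d+|\S")


def tokenize(src):
    """Very naive tokenizer: identifiers, integers, single punctuation chars."""
    return TOKEN_RE.findall(src)


def classify_paren_brace(tokens):
    """Return 'fragment', 'block', or None for the first `( ... ) {` found."""
    for i, tok in enumerate(tokens):
        if tok != "(":
            continue
        # Find the parenthesis that balances the one at position i.
        depth = 0
        j = i
        while j < len(tokens):
            if tokens[j] == "(":
                depth += 1
            elif tokens[j] == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        # Skip unbalanced groups and groups not followed by a brace.
        if depth != 0 or j + 1 >= len(tokens) or tokens[j + 1] != "{":
            continue
        prev = tokens[i - 1] if i > 0 else None
        is_ident = (
            prev is not None
            and re.fullmatch(IDENT, prev) is not None
            and prev not in KEYWORDS
        )
        # Rule from the text: an identifier or a comma in front means fragment.
        return "fragment" if is_ident or prev == "," else "block"
    return None


for line in ["list.map (x) { x + 1 }", "if (x) { x.code }", "foo 1, (x) { x }"]:
    print(line, "->", classify_paren_brace(tokenize(line)))
```

The point of the sketch is only that the decision needs one token of look-behind, so no parsing directive is required for this particular case.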

It should be noted that fragment syntax without a prefix is not possible - it's too ambiguous - so the minimal variant one gets away with is foo \{ stuff }. A variation of ~> could possibly be added as foo ~{ stuff }.

Also, since not using braces is not a syntax error (like it would be in C, Javascript, etc.) a mistake will instead result in an error along the lines of: No overload for method 'bar' matches signature 'bar(Set‹Int›)'. Alternatives found are: .... However! This could be improved later on with the help of heuristic analysis (if x amount of brace-blocks has been parsed successfully up until this point, chances are good this was the intention also where the error happened), figuring out what the programmer most likely meant and producing a message according to that. If it guesses blatantly wrong one simply uses onyx --verbose-errors or something along those lines. We don't want it to turn into That Fucking Paperclip(TM) from Excel.
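
The heuristic mentioned above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function name, the threshold of three brace blocks, the wording of the hint, and the `verbose` flag standing in for something like the `--verbose-errors` option floated above.

```python
def overload_error(method, arg_type, brace_blocks_parsed, threshold=3, verbose=False):
    """Build an error message, adding a hint when a brace block was likely meant.

    `brace_blocks_parsed` is how many brace blocks were parsed successfully
    before the error; `threshold` is an arbitrary, made-up cut-off.
    """
    raw = f"No overload for method '{method}' matches signature '{method}({arg_type})'."
    # Only guess when the unexpected argument is a set and the file already
    # uses brace blocks; in verbose mode, never guess.
    if verbose or not arg_type.startswith("Set") or brace_blocks_parsed < threshold:
        return raw
    return (
        f"{raw} Hint: the braces after '{method}' were parsed as a set literal; "
        "if a brace block was intended, put parentheses around the condition."
    )


print(overload_error("bar", "Set‹Int›", brace_blocks_parsed=5))
print(overload_error("bar", "Set‹Int›", brace_blocks_parsed=5, verbose=True))
```

A real implementation would presumably weigh more signals than a simple count, but the shape is the same: the raw error stays available and the guess is layered on top.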

Side notes

Another discussion led me to reconsider reintroducing the fn keyword, so it again can be optionally used for function-defs. It worked well before I removed it (though I never use it), so it will be allowed again for those who prefer a beacon in front of the func-defs. Using such a style in combination with braces undoubtedly looks a lot like Rust syntactically. But the similarities end there. While Rust is like dragging a wooden plow through lead by hand, Onyx is like cutting whipped cream with a high-power laser. (End of highly biased opinionated statement)

Motivation

What IDE or editor you use, or how you prefer the code presented in detail when you work with it, should be your choice alone. That's a strong motive in Onyx! It's the semantics of the language that is important to share in a project for effortless collaboration. Well, inevitably we'll be exposed to another style through the "public repo style", when viewing code online in github/whatever - but that's a small price to pay for individual expression where it counts: in your editor - in your hands.

For me as developer of Onyx, it's important to work hard to maintain an open mind in order to not shut out useful styles based on my personal preference (except that it actually is available in Onyx of course). I have the local-configuration for that just as everyone else!

Undecided

I'm not sure whether indentation should still be significant and enforced in brace blocks. The requirement could likely be dropped within a brace-declared block, but in that case I must examine the ramifications further.

if (foo == true) {
should-it-be-legal-to-do-shit-unindented-here?
}

Thoughts?

(*) "Route '66 Style" : I refer to the language that introduced the structuring that became brace style in C in 1972, namely BCPL in 1966. Plus it sounded cool.

Sod-Almighty commented 8 years ago

I think ~>{ ... } makes more sense than ~{ ... }. There is plenty of prior art for this - Ruby and Livescript, f.ex. Except....that's a proc, isn't it? Hm. Yeah, maybe ~>{ ... } for a proc and ~{ ... } for a fragment.

You have a typo in there: 'now' instead of 'know'.

Obviously nobody should be coding without indentation. Ever. However, sometimes I encounter an odd edge case that requires nonstandard indentation; which can be a massive problem in languages that are strictly indented. Therefore it makes sense to ignore indentation completely inside structural braces (but not fragment braces!)

NeoTech commented 8 years ago

I concur that within a structural brace the indentation should not be necessary. The separation would be the braces, anyway. And even if it's good practice to indent code.. when I'm in a hurry I still code gigantic one-liners and just expect the interpreter to understand them when using curly braces. And sure, I know it's bad practice. But when you need to kung-fu fix something in a live environment that several thousand users are dependent on.. and you don't have the extra 10 minutes to make it pretty.. I just prefer to whack out a one-line fix and restart the services.. and make it pretty in the next push/release.

Guessing it might be just me tho. Cuz i work in the area i do.

ozra commented 8 years ago

Yes, it's kind of obvious when I think about it. There are sort of "three" different flows currently. The first is indentation, obviously. The second is the asymmetric block markers: optional starters (required for one-liners) and optional end-keywords, which are mandatory in some other languages and completely absent in others (Python, LiveScript, etc.). They give great freedom in marking up structure when that's clearer, and in not having to when the flow is obvious. The third "flow" is parentheses. Parentheses can group any number of expressions, even expression statements, and as such have the same properties as the braces will have. So the "flow" already exists in the parsing structures. In parentheses, since they are symmetrical (duh), indent is unimportant. Thus of course it shall, and will, be the same for braces too - it stays with the syntactical patterns that already exist. This is good news. It means there is still only a handful of simple rules that dictate Onyx. (I'm very happy with the constitution of it so far. I did not expect it to develop like it has. It's already much better than my initial map and ideas.)

@Sod-Almighty - ~> { } "is" already the "correct" wavy version of that syntax, so :+1:. Note though: there will not be a differentiation in syntactic rules depending on what the braces are used to specify an expression-list for: whether control-structure, func-def, fragment-def or other.

@NeoTech. I hear you - that "whip it up, and polish later" requirement of reality is also one of the reasons I want Onyx to feel like coding script or pseudo code when you do throw-ups (as I call them). And even though it feels like that thanks to the "almost global" type inference, the type system is much stronger than those of Go, Java and C++. All thanks to the Crystal project - goes without saying - hard work! I know from earlier experience of coding type systems. Then afterwards one adds more explicit type notations where needed, creating "gates" where errors can be caught at a more decipherable shallow depth.

So, from both your remarks, and what now seems obvious to me, indentation shall not matter within braces. If one opens an indented block of code within the braces, then indentation rules again - in that scope. And vice versa braces.
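
One common way to get this behaviour (Python's own tokenizer treats brackets this way) is to track bracket depth and only treat indentation as significant at depth zero. The Python sketch below illustrates just that first half of the decision; it is not Onyx's lexer, the function name is made up, and it deliberately leaves out the "indentation rules again in an indented block opened inside braces" part as well as braces inside strings or comments.

```python
def significant_indents(source):
    """Return (line_number, indent_width) for lines whose indentation matters.

    Lines that start while a '(' or '{' is still open are skipped, mirroring
    the idea that indent is unimportant inside symmetrical delimiters.
    """
    depth = 0
    result = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped and depth == 0:
            result.append((lineno, len(line) - len(line.lstrip(" "))))
        # Naive scan: does not recognise string literals or comments.
        for ch in stripped:
            if ch in "({":
                depth += 1
            elif ch in ")}":
                depth = max(0, depth - 1)
    return result


example = (
    "if (foo == true) {\n"
    "unindented-is-fine-here\n"
    "      so-is-this\n"
    "}\n"
    "if bar\n"
    "   do-stuff\n"
)
print(significant_indents(example))  # [(1, 0), (5, 0), (6, 3)]
```

Lines 2 to 4 of the example never reach the indentation check, which is the whole effect being asked for.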

For once, something all programmers that work in the real world agree on ;-). Must - make - now — must work!

I think I'll start implementation on this one rather soon. I've got a bit more work to do on the compile-time/run-time type reasoning / introspection, and am fixing some edge-case bugs in the macro system that I stumbled upon along the way. This will be very interesting to see marry in one syntax. I think this is the ultimate freedom of working on things together, as one self wants.

Thanks for the suggestions.

ozra commented 8 years ago

I added #98 which expands around the new changes - in an effort to tighten up some loose ends at the same time.

ozra commented 7 years ago

Reviewing this after some time now, I once again lean over more towards "parentheses are good enough for grouping code whitespace-ignorantly".

Anyone who feels inclined to take on a challenge is welcome to come up with as many problematic (ambiguous) syntactic situations as possible. Because there are such. Only by finding all the clashes one can possibly think of can we see whether all those exceptions to an unambiguously clean syntax are, together, an acceptable cost compared to the value it would add to the language.

Otherwise this will more likely fall off the map for inclusion.

ozra commented 7 years ago

Pondering this on and off, the few times I've had some moments to set aside lately, I've been swaying towards not doing it after all. But then I changed opinion towards going with it, yet again. But something's gotta give.

One thought lately is that literals must be prefixed with type. This is not a new syntax. Just the implicit type-instantiations of current literals would disappear.

That is: x = {foo, bar} would not be a set anymore, instead x = Set{foo, bar} would be required. Likewise: x = {foo: 1, bar: 2} must be written x = Map{foo: 1, bar: 2}, etc.

Alternatively, not requiring the explicit type (it can still be deduced from the literal as now), but requiring a prefix character saying "this is a literal", if no explicit type is used.

It's not as sweet and pretty, but the conflict between introducing braces as code groupings and the current literals is a bit too hot. It would result in a lot of unnecessary, not very self-evident, errors imo.

Better ideas are highly welcome — bar ditching it. Because there really seems to be a whole bunch of people favouring this old-school structuring style.

Sod-Almighty commented 7 years ago

I still don't see the problem with the suggestions I made before:

{ ... } -- block (pod)
<|...|> -- tuple
{| ... |} -- set

-- In addition, how about:
(| ... |) -- brittle tuple

None of these should conflict with anything else, surely? (Except |> is useful for pipelining, but in that case it wouldn't follow <|, so it's easy to disambiguate.)

Maybe allow explicit Set{ ... } syntax as well.

Sod-Almighty commented 7 years ago

(Incidentally, FYI, there is prior art for brittle tuples. Lisp uses them in the form of "multiple return values".)

ozra commented 7 years ago

The multiple chars still feel rather clumsy, however it's definitely worth revisiting every pondered option again because of the new mouth to feed (brace-blocks). Hmm, on a side note, if I recall correctly it's Haskell that uses . for pipelining. If the step was made definitely regarding significant space for operators, that would mirror "method-dot" (which would then be identifier-adjacent, sans space on at least one side).

Does the lisp variant of multiple return values return only the head unless otherwise specified? (forced destructuring, and discard of tail)?

Sod-Almighty commented 7 years ago

> . for pipelining

Actually, that might make sense for method-pipelining, but not free-pipelining. Ruby-style example:

def add1 n; n+1; end

puts [1,2,3] . to_f    # method-pipelining, calls `map`, prints "[1.0, 2.0, 3.0]"
puts [1,2,3] |> add1   # free-function-pipelining, calls named function, prints "[2,3,4]"

puts [1,2,3] |> add1 . to_f |> add1     # combination, prints "[3.0, 4.0, 5.0]"

Incidentally, |> has prior art in a bunch of languages, for example F#.

> Does the lisp variant of multiple return values return only the head unless otherwise specified? (forced destructuring, and discard of tail)?

Yup!

(defun my-function ()
  (values 1 2 3))

(format t "~a~%" (my-function))    ;; outputs "1"

Lisp is actually a really interesting language. It's godawful to learn to begin with, but fun to use once you understand it.

ozra commented 7 years ago

I was thinking along the lines of UFCS revisited; it feels bad ditching that feature after all. Then one op is enough, because methods are simply functions that have been defined in a context with elevated access privileges (member vars), but are functions just like all others use-wise, and all funcs can be used as methods. Yeah, no need to repeat the obvious...

How does map come into the picture for the iterable? What's the rule of the operator in Ruby?

I'm aware of |>, I've coded with it myself, still think it looks very clumsy - but that often comes down to the fonts of choice :-/. The dot feels a lot cleaner. And with ufcs no specials are needed at all. I think that would be the better choice after all. Must re-read the issue, to see what we thought through.

Was looong ago I coded Lisp, I realize; Yes: values instead of list — good catch! :)

I'll still need to see a bunch of practical real-world examples of where it shines, along with examples for where it would be a real con to get a realistic picture of what the effect would be, It's a very intrusive construct; introducing multiple "self destructuring" return values (as said, if seen net beneficial, it will be defined as a "return type specific qualification", not as a type "having a specific trait" of "falling apart destructured when touched". Hmm, well, unless that can be argued for as actually having benefits ofc. — this is my intuitive stand in lack of better use-cases to study)

Sod-Almighty commented 7 years ago

> How does map come into the picture for the iterable? What's the rule of the operator in Ruby?

Ruby doesn't implement this functionality. I simply meant that my example was written in Ruby style, rather than Onyx style, because I'm more familiar with it.

In Onyx, . would ideally translate into a map call.

> And with ufcs no specials are needed at all.

Depends what you mean by "special". UFCS doesn't imply pipelining - it would simply unify method-pipelining with free-pipelining.

If UFCS is implemented, I suggest both . and |> are supported.

Brittle return values are very useful for allowing the caller to choose what level of granularity it wants:

def download_file file
  success = ...    # true/false
  error = ...
  (| success, error |)
end

# We only care about success / failure here
puts "Download success" if download_file '...'

# We want to know what went wrong here
success, error = download_file '...'
puts "Download failed because: %s" % error unless success

It allows a function to be used in a conditional expression whilst also returning detailed information.
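
For comparison, the same caller-chooses-granularity idea can be approximated in Python with a small wrapper that is truthy according to its first value and can also be unpacked. This is only an analogy, not a proposal for Onyx semantics: the `Brittle` class, the placeholder `download_file` logic and the example URLs are all invented for the sketch, and unlike Lisp's `values` the wrapper only "collapses to its head" in boolean context.

```python
class Brittle:
    """Hypothetical stand-in for a brittle tuple.

    Truthiness follows the first value; iteration yields all values, so the
    caller may either test the result or destructure it.
    """

    def __init__(self, *values):
        if not values:
            raise ValueError("a brittle tuple needs at least one value")
        self._values = values

    def __bool__(self):
        return bool(self._values[0])

    def __iter__(self):
        return iter(self._values)


def download_file(url):
    # Placeholder logic standing in for a real download.
    if url.startswith("https://"):
        return Brittle(True, None)
    return Brittle(False, "unsupported scheme")


# We only care about success / failure here.
if download_file("https://example.invalid/file"):
    print("Download success")

# We want to know what went wrong here.
success, error = download_file("ftp://example.invalid/file")
if not success:
    print("Download failed because: %s" % error)
```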

Real-world example:

(defun execute (command &optional (args nil))
  ;; ...various code to execute the command...
  (values success stdout-lines stderr-lines return-code))

; we only care about success
(if (execute "some command")
  (format t "Success~%"))

; we care what the result was
(multiple-value-bind (success output) (execute "ifconfig")
  (if success
    (format t "ifconfig returned ~s~%" output)
    (format t "ifconfig failed~%")))

; we want all information
(multiple-value-bind (success output errors return-code) (execute "ifconfig")
  (if success
    (format t "ifconfig returned ~s~%" output)
    (format t "ifconfig failed with error ~d, and output as follows:~%~{  ~s~%~}~%~%stderr was:~%~{  ~s~%~}~%" return-code output errors)))

Imagine the horrible contortions required to implement this in Ruby or Crystal as things stand. You'd need a class, and the execute method would set various instance variables. It would return either true or false, so you could use it in a conditional, and then you'd have to interrogate its variables afterward. Meh.

ozra commented 7 years ago

Back to the issue:

Aside from the "use explicit type names in literals construction" (or wacky delimiter combos <[ ... ]> etc. [they have already been implemented in Onyx for about a year]) there is another way of reducing ambiguity.

Unfortunately it means killing a darling (but you should always do that, so...): the elegance of only needing indent to signify a block of expressions.

Instead one of the nest-start-markers must be used; not only for one-liners — but for indented blocks too:

-- current de-facto style:
if foo and bar
   do-stuff
else
   something-else

-- die, die, die my darling - now one of the markers _must_ be used:
if foo and bar:
   do-stuff
else:
   something-else

Any one of the usual suspects :, =>, then, do can be used, along with now { ... }. Let's see why that sacrifice would be useful:

-- it's not `bar(Set{})` - because `{` fulfils explicit nest marker requirement
if foo and bar {
   do-stuff
}

-- it is a `Set{}` literal, because we've already nest marked
if foo and bar:
   {a, set}

-- brace-nest-block and literal set
if foo and bar {
   {a, set}
}

-- If you like C-style braces, and find set-literals confusing in combo?
x = if maybe-true { {a, set} } else { {another, set} }

-- Then write them type-speced or whatever:
x = if maybe-true { Set{a, set} } else { Set{another, set} }

-- or stick to good ole asymmetric style
x = if maybe-true ? {a, set} : {another, set}

The reduction in elegance makes:

I haven't exhausted the suggestion against all constructs yet - hopefully no gotchas arise. If it holds: this is probably the best proposal yet (no matter how bad it makes me feel to lose the smooth original indent-only style).
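
To make the mandatory-nest-marker rule concrete, here is a small Python sketch that labels each opening brace in a pre-split token list as a block or a literal. It is a simplification under stated assumptions, not the real parser: tokens are assumed to be whitespace-separated, only `if`/`else` are handled, only the markers `:`, `=>`, `then`, `do` and `{` are recognised, and a set literal inside the condition itself (before any marker) would be mislabelled as a block, which may be one of the gotchas still to be checked.

```python
# Markers that can open a nest, per the proposal; '{' is handled separately.
MARKERS = {":", "=>", "then", "do"}


def classify_braces(tokens):
    """Label every '{' as 'block' or 'literal'.

    After `if` or `else` a nest marker is still owed. The first '{' seen
    while a marker is owed opens a block; any other '{' is a literal.
    """
    labels = []
    marked = True  # outside of `if`/`else`, a brace can only be a literal
    for tok in tokens:
        if tok in ("if", "else"):
            marked = False
        elif tok == "{":
            labels.append("literal" if marked else "block")
            marked = True
        elif tok in MARKERS:
            marked = True
    return labels


samples = [
    "if foo and bar { do-stuff }",
    "if foo and bar : { a , set }",
    "x = if maybe-true { { a , set } } else { { another , set } }",
]
for s in samples:
    print(s, "->", classify_braces(s.split()))
```

For the three samples this yields one block, one literal, and block/literal/block/literal respectively, matching the readings given in the examples above.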

Sod-Almighty commented 7 years ago

I really like the indent style though. Am I missing something? Why can't we just have explicit typing and symbolic alternatives?

Map{a: 1, b: 2}   ===    {| a: 1, b: 2 |}
Set{1, 2, 3}      ===    <[ 1, 2, 3 ]>

(or whatever the symbols were supposed to be, I can't recall offhand)

ozra commented 7 years ago

Different approaches, different sacrifices.

Sod-Almighty commented 7 years ago

What precisely does "route 66 style" mean, anyway?

ozra commented 7 years ago

Last line of this issue's OP:

(*) "Route '66 Style" : I refer to the language that introduced the structuring that became brace style in C in 1972, namely BCPL in 1966. Plus it sounded cool.