Step Out of the '90s and Support Unicode Glyphs Aliases for Core Functionality

ozra commented 8 years ago

Some examples brought up by @stugol: λ, ≠, ≤ and ≥.

This is a definite go!

The "grander idea" I've held since the beginning for Onyx is to:

Support a smooth way of working on a repo "checked out" in one style - your style, with less compromises - and
Still be able to commit code in another "more conventional, project and collaboration friendly" style that works in vim/less/github-diffs/whatever.

Exactly how (using hooks etc.) is not on the table atm, but it will be solvable in a clean fashion, I'm sure.

The main tool to enable this is Onyx's configurable formatter, which is slowly progressing in parallel with other work on Onyx.

Going crazy, one could even support my-array₄₇, for indexing, etc. But it's good to draw a line somewhere earlier I think.

[ed: added after OP]

Some Handy "Rules" For This Issue

Suggestions for Unicode glyphs should be made only after they've been tested in several fonts and editors, and preferably on different OSes. Rendering varies wildly between setups. Many look like shit in my web browser. The primary use for Unicode styling should be private editing, not repo style - so on github/etc. the ascii version of your codebase is the recommended target.

stugol commented 8 years ago

my-array₄₇

That's an intriguing idea. I like it. I don't know how I'd type such characters on my keyboard, but still.

If we're supporting user-configurable syntax, then I can have { ... } blocks! ;)

ozra commented 8 years ago

Style - not syntax. And "less compromises", not "no compromises" ;-)

Such esoteric syntax as my-array₄₇ would likely not normally be typed. Rather, you'd type swiftly with the most easily used wordings and add a hook to the stylizer/formatter from your editor of choice to re-format on save. Provided your editor reloads on changes, it just updates to the fancy syntax - voila. If you save as often as I do, it will be pretty much instantaneous B-)
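The re-format-on-save idea can be sketched for the indexing case. This is only an illustration in Python, not the Onyx stylizer: the function names, the identifier pattern and the restriction to literal integer indexes are all assumptions, and string literals and comments are not treated specially.

```python
import re

# ASCII digits <-> Unicode subscript digits (U+2080..U+2089).
TO_SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
TO_DIGIT = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# Assumption: identifiers start with a letter and may contain '-' and '_'.
_INDEXED = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)\[([0-9]+)\]")
_SUBSCRIPTED = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)([₀-₉]+)")


def to_fancy(source: str) -> str:
    """Rewrite literal integer indexing, name[47], as name₄₇."""
    return _INDEXED.sub(
        lambda m: m.group(1) + m.group(2).translate(TO_SUBSCRIPT), source
    )


def to_plain(source: str) -> str:
    """Rewrite name₄₇ back to the easily typed name[47]."""
    return _SUBSCRIPTED.sub(
        lambda m: f"{m.group(1)}[{m.group(2).translate(TO_DIGIT)}]", source
    )
```

An editor hook would run something like `to_fancy` over the buffer on save; indexes that are not integer literals, such as `other[i]`, are left as typed.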

stugol commented 8 years ago

Style - not syntax. And "less compromises", not "no compromises" ;-)

Meanie! :(

ozra commented 8 years ago

Don't worry, it's still not dismissed. It's an important area of syntax, so I'd really like to weigh a lot of options before accepting a literal-looking expression-grouper...

Getting back to the issue at hand (which of course touches on the above in a way).

As I've mentioned, the idea for Onyx is that you work with checked-out code in your own private style and have another style (preferably with ascii-only core constructs) for the repo. This should be simple to do with some git hooks and tools in the compiler. If someone is using another VCS, or none, it's up to them to PR to the tool-set or make their own solutions. Git is pretty universal these days, so supporting that is prio one. I for one don't even make a one-off script without git'ing it locally at least.
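The thread does not settle which git mechanism would be used. One existing option is git's clean/smudge filter drivers, which pipe file content through a command when staging and when checking out. Below is a rough Python sketch of such a filter for the operator glyphs named at the top of this issue. The ASCII spellings, the script name, the filter name and the `*.ox` pattern are assumptions for illustration, and only double-quoted string literals are skipped.

```python
import re
import sys

# Hypothetical alias table: the glyphs come from this issue,
# the ASCII spellings are assumed.
ALIASES = {"!=": "≠", "<=": "≤", ">=": "≥"}
REVERSE = {glyph: ascii_op for ascii_op, glyph in ALIASES.items()}

# A double-quoted string literal is matched first so its contents are kept.
_STRING = r'"(?:\\.|[^"\\])*"'
_TO_PRIVATE = re.compile(_STRING + "|" + "|".join(map(re.escape, ALIASES)))
_TO_REPO = re.compile(_STRING + "|" + "|".join(map(re.escape, REVERSE)))


def smudge(source: str) -> str:
    """Repo style -> private style (git runs this on checkout)."""
    return _TO_PRIVATE.sub(lambda m: ALIASES.get(m.group(0), m.group(0)), source)


def clean(source: str) -> str:
    """Private style -> repo style (git runs this when staging)."""
    return _TO_REPO.sub(lambda m: REVERSE.get(m.group(0), m.group(0)), source)


# Assumed wiring (names are made up):
#   git config filter.onyxstyle.clean  "python3 onyx_style_filter.py clean"
#   git config filter.onyxstyle.smudge "python3 onyx_style_filter.py smudge"
#   .gitattributes:  *.ox filter=onyxstyle
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdin.reconfigure(encoding="utf-8")
    sys.stdout.reconfigure(encoding="utf-8")
    data = sys.stdin.read()
    sys.stdout.write(smudge(data) if sys.argv[1] == "smudge" else clean(data))
```

Because `clean` undoes `smudge`, diffs and commits only ever see the ASCII spelling, while the working tree shows the glyphs.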

Given that as the recommended way of working, "inputting code" is not the same as the style of your code: you type the simplest style, say parentheses for tuples, and on save you get a more distinct style.

This means that, for tuples for instance, the preferred way for ascii-input/styling could very well be the parentheses. The con is that they're harder to pinpoint, but: ctrl+s and then:

-- some possible different unicode bracings just for show:
-- with tuple2* to showcase difference to regular arg/grouping parentheses:
unity(x) -> x

tuple1c = 〈13, 32, 47, 2〉
tuple2c = unity(〈"foo", 1, {1, 2, 3}〉)

tuple1d = ‹13, 32, 47, 2›
tuple2d = unity(‹"foo", 1, {1, 2, 3}›)

tuple1e = «13, 32, 47, 2»
tuple2e = unity(«"foo", 1, {1, 2, 3}»)

tuple1f = （13, 32, 47, 2）
tuple2f = unity(（"foo", 1, {1, 2, 3}）)

tuple1g = ⦅13, 32, 47, 2⦆
tuple2g = unity(⦅"foo", 1, {1, 2, 3}⦆)

This means that more Unicode code points will be exempted from literal use (which is no biggie, since most people don't even seem to use Unicode in literals, and if they do, obvious delimiter chars like the above should not be used anyway).

So, for instance Tuple-literal (#57), could have (elements, here) (ascii, thereby archaic term-tool friendly, and swift to input on most keyboards), and 〈elements, here〉 (clearer separation). The only gotcha is that your editor must support unicode (which doesn't? switch!) and obviously the font of choice must support it. Here it varies a bit more. I use "Andale Mono" personally; it renders beautifully even non-antialiased (I prefer a nice jagged sharp look on dark backgrounds and small screens, since I often work from a laptop, and anti-aliasing in those situations is just blurry).

Let's put that in context:

unity(x) -> x

a = unity 〈elements, here < there〉
b = unity 〈one-tup〉

No confusion there! :-) And no need for trailing commas with the unicode-enclosers.

Inputting code to the program != reading style of the code. Type easy. Read easy. :-)

Elastic Tab Stops

Another interesting point is "elastic tabstops" ("real tabstops") (https://github.com/crystal-lang/crystal/issues/1682). Personally I prefer mono-spaced fonts by far (sometimes even when printing!), but on a certain device, and of course for some people, the choice might very well be the reverse. Automatic justification of groups of statements/expressions into a "tabular" look is very good imo (I like it for instance-var declarations). That works fine for monospaced fonts, but for someone using variable-width fonts elastic tabstops are a must. Note: in the example below, the spaces of each justified column are followed by chr(11) "vtab", just to get a symbol showing where chr(9) "htab" (or whatever, see the discussion below) would be put:

-- note below is not current Onyx syntax. Below uses literal-suffixes
-- which is still on the drawing board, will post issue on it soon. 
-- Also shown is "autocreated init with all named & defaulted inst-vars
-- as possible name parameters", which is also just a fresh idea.
type Barrier
    @material   = Materials.Water  'get
    @w          = 10mm             'get
    @h          = 10mm             'get
    @d          = 10mm             'get
    @k          = 0K               'get
    @foo        = "foo"
    @bar                           'get

    init(@w, @h, @d, @material) ->
        @k = @material.k * (1mm * 1mm) -- in k / mm2 instead of / m2

    init(@...) ->

end Barrier

Support for elastic tabstops in editors is probably not grand, and would depend on plugins. What codepoints they need is unknown to me; my idea is that justification is done with spaces and a tab character is put at the end of those. Then the code is always justified in space-editing mode, but "\s+\t" can be interpreted as an "elastic tab stop" by a plugin. These things are beyond what I will implement, but generating codepoints to support it is no problem and will just be a conf away for "private style" via the "stylizer" tool in the compiler (not done, mind you, I'm working on it!).
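The "spaces, then a tab" scheme can be sketched as follows. This is a Python illustration under assumptions: the function names are invented, every row is expected to have the same number of cells, and the pattern is narrowed from "\s+\t" to plain spaces followed by a tab.

```python
import re

# "\s+\t" from the discussion, narrowed here to plain spaces before the tab.
ELASTIC_STOP = re.compile(r" +\t")


def justify(rows, mark_elastic=True):
    """Pad every cell except the last to its column width using spaces.

    With mark_elastic=True each padded cell is followed by a space and a
    tab, so a plugin could treat the boundary as an elastic tab stop while
    a monospaced view still lines up from the spaces alone.
    """
    ncols = len(rows[0])
    widths = [max(len(row[i]) for row in rows) for i in range(ncols - 1)]
    sep = " \t" if mark_elastic else " "
    lines = []
    for row in rows:
        cells = [cell.ljust(widths[i]) for i, cell in enumerate(row[:-1])]
        # rstrip drops the padding left behind by an empty last cell.
        lines.append(sep.join(cells + [row[-1]]).rstrip())
    return lines


def split_cells(line):
    """What a plugin might do: recover the cells from a justified line."""
    return ELASTIC_STOP.split(line)
```

For instance, `justify([("@w", "= 10mm", "'get"), ("@material", "= Materials.Water", "'get")])` yields two lines whose columns line up in a monospaced view and whose column boundaries are still machine-detectable.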

stugol commented 8 years ago

I see what you're getting at. And provided the end-user can choose his own representation, it should be quite handy, if a little redundant. But being forced to use e.g. 〈 ... 〉 would be horrible.

The font used here on this forum (whatever it might be), doesn't handle that glyph very well. There are extra spaces to the sides, and the glyphs are displayed too far up. It's ugly.

Andale Mono

Just tried that, and it hurt my eyes to read code with it in my editor. I code with Consolas, generally.

So, for instance Tuple-literal (#57), could have (elements, here) (ascii, thereby archaic term-tool friendly, and swift to input on most keyboards)

But people might not be using the style tool, and simply editing the code as it literally is. In which case, parens are a really bad idea. Editors won't style them properly, and the eye won't pick them out as tuples. <[ ]> is clearly a better choice. If people prefer a different symbol, they can pick one and use the style tool.

Would the Onyx compiler incorporate the style tool's parser? As in, could I choose my own delimiters and compile styled code directly?

I don't really see the need for different styles. Personally I'd just implement things like my-array₄₇ and ≠, and if people didn't want to use them, they could simply use the ASCII alternatives.

ozra commented 8 years ago

I see what you're getting at. And provided the end-user can choose his own representation, it should be quite handy, if a little redundant. But being forced to use e.g. 〈 ... 〉 would be horrible.

Yes, first off: those chars were just an example, I just copied and pasted one of them. As an aside: on my screen (Ubuntu 15.10, non-Unity desk-env [special brew...] and Chrome) it looks perfect. So these are the things to consider when picking the most appropriate codepoints, to make the alternative look good in as many places as possible. But then, it should not end up on github but stay in the private editing mode, since these issues will arise on different devices, probably for the foreseeable future - hence the "stick with ascii for repo/public/community level style" motto.

Andale...

Once again, the variations between machines and people :-)

<[tup]>

That's a good argument. Since ASCII will be the universal conveying style of the code (github, diffing-tools, etc.), it's good if a clear separating notation is used, and all I will see on my screen, day to day coding, is WHATEVER tuple, elements, here WHATEVER.

Taking all these different new ideas into context holistically really changes the perspective on choices. Good.

Would the Onyx compiler incorporate the style tool's parser?

Yes, all the available styles are equally valid to Onyx and parsed the same. You can enter codepoints manually or conf your keyboard via xmodmap/whatever-in-mac-etc (my kbd-layout for instance has lots of symbols hand-confed for easy access via xmodmap, and I would likely add whatever delimiters end up being used for literals as direct-access chars).

As in, could I choose my own delimiters and compile styled code directly?

No, one cannot choose delimiters freely. You can choose via conf from a predefined set of alternatives; otherwise they cannot be parsed reliably. People must be able to compile the code without any knowledge of your styling conf. "It's just Onyx code".
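A small Python sketch of "choose from a predefined set": the conf only names a style, and anything outside the set is rejected. The delimiter pairs are ones floated in this thread; the style names, the conf key and the functions are hypothetical, and the rewrite is a plain textual replacement that ignores string literals.

```python
# Hypothetical predefined set: style name -> (open, close).
TUPLE_STYLES = {
    "ascii": ("<[", "]>"),
    "angle": ("〈", "〉"),
    "guillemet": ("«", "»"),
    "white-paren": ("⦅", "⦆"),
}


def load_style(conf: dict) -> tuple:
    """Look up the configured delimiter pair; free-form delimiters are refused."""
    name = conf.get("tuple-delimiters", "ascii")
    if name not in TUPLE_STYLES:
        raise ValueError(f"unknown tuple style {name!r}, pick one of {sorted(TUPLE_STYLES)}")
    return TUPLE_STYLES[name]


def restyle_tuples(source: str, target: str) -> str:
    """Rewrite every known delimiter pair to the target style."""
    open_t, close_t = TUPLE_STYLES[target]
    for open_s, close_s in TUPLE_STYLES.values():
        source = source.replace(open_s, open_t).replace(close_s, close_t)
    return source
```

Since every pair in the set is known up front, code written in any of the styles can be read without access to the author's conf.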

I don't really see the need for different styles...

my-array₄₇, my-array.47 and my-array[47] are different styles. What do you mean?

...if people didn't want to use them, they could simply use the ASCII alternatives.

Yes, that's one style in itself.

stugol commented 8 years ago

Ah, I see. It's not to change what features you use, it's to change what features you see other people using. A bit like the C++ formatting problem:

int main() {
   if (true) {
      // correct
   }
}

int main()
{
   if (false)
   {
      // a vile abomination, for which the author should be flayed alive
   }
}

So a bunch of syntaxes will be supported (a.47, a[47], a₄₇, <[ tuple ]>, (tuple), <tuple>) but the coder gets to pretend some of them don't exist? ;)

ozra commented 8 years ago

Pretty much sums it up :-)

Someone uses this_is_my_awesome_identifier - you see this-is-my-awesome-identifier - the compiler sees the exact same thing - it understands both of you :-)

Soon there will be world-peace, haha.

stugol commented 8 years ago

What if someone creates an identifier like override_to-s? Maybe the tool should ignore mixed styles like that, and pass them unchanged.

ozra commented 8 years ago

The configurability of the tool can be refined step by step as cases and needs show up. I'll begin with a reasonable basic set.

Options for identifiers will initially be "original" (leaving it be), "dash", "endash" and "snake", for instance.

Later on context-based choices should be added. But I need to get the first alpha done to begin with ;-) I'm currently on a type/name-mangling mission in the compiler, hope to be done soon with that.

Context-based choices could be "when in type-def: ...", "if-mixed-idfr: #original", etc. but, as said, basics first :-) I used astyle, etc. for C++ a lot, that's where the inspiration comes from.
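The identifier options and the mixed-identifier rule can be sketched like this. It is a Python illustration only: the function name, the `if_mixed` parameter and its "restyle" value are assumptions, and the option names are taken from the list above.

```python
# Option name -> separator character; "endash" is U+2013.
SEPARATORS = {"dash": "-", "endash": "\u2013", "snake": "_"}


def restyle_identifier(name: str, style: str = "original", if_mixed: str = "original") -> str:
    """Return the identifier with its word separators in the requested style.

    "original" leaves the name alone. When a name mixes separators
    (override_to-s) and if_mixed is "original", it is passed through unchanged.
    """
    if style == "original":
        return name
    used = {ch for ch in name if ch in SEPARATORS.values()}
    if len(used) > 1 and if_mixed == "original":
        return name
    target = SEPARATORS[style]
    return "".join(target if ch in used else ch for ch in name)
```

With these defaults, `this_is_my_awesome_identifier` shows up as `this-is-my-awesome-identifier` under "dash", while `override_to-s` is displayed exactly as written.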

I want to make a web tool using it too (see #4), where confs can be gradually randomized/converged with a genetic algo based on left/right alternative choices, and used for research purposes for Onyx's further development (as well as for "discovering" a favourite styling for yourself). The config can then of course be downloaded.

Side note: there will be no semantic difference between override_to-s and for instance override-to_s in the eyes of the compiler.
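One way to picture "no semantic difference" is a canonical form used for name lookup. The Python sketch below is an assumption about how that could work, not a description of the Onyx compiler; treating the en dash like the other separators is also assumed.

```python
# Fold every separator variant to one canonical character before comparing.
_FOLD = str.maketrans({"-": "_", "\u2013": "_"})


def canonical(name: str) -> str:
    """Canonical spelling used only for comparison, never for display."""
    return name.translate(_FOLD)


def same_identifier(a: str, b: str) -> bool:
    return canonical(a) == canonical(b)
```

Under this folding, `same_identifier("override_to-s", "override-to_s")` holds, so the choice of separator stays a purely visual matter.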

stugol commented 8 years ago

there will be no semantic difference

Understood. But changing it visually may not be desirable. If the identifier contains both _ and -, presumably this is intentional and serves a purpose.

ozra commented 8 years ago

Regarding: 〈tup, delims〉

The font used here on this forum (whatever it might be), doesn't handle that glyph very well. There are extra spaces to the sides, and the glyphs are displayed too far up. It's ugly.

Could you take a screenshot of that? Curious. I tried it in lots of editors, with different fonts. Would be interesting to see what it looks like on your screen.

stugol commented 8 years ago

Sure: [screenshot] Note the extra spacing to the sides.

ozra commented 8 years ago

Aha, I saw that in Vim also.

ozra / onyx-lang

Step Out of the '90s and Support Unicode Glyphs Aliases for Core Functionality #69
