Unicode dashes as "lisp'ish" alternative to hump and snake notation

ozra commented 9 years ago

I use this-is-an-identifier all the time in my codebases (both in LiveScript for JS and in my home-brew C++ transpiler). Seeing that Nim has such a magical feature for keeping coders happy, unifying thisIsAnIdentifier and this_is_an_identifier (world peace! At last!), it's a perfect candidate for introducing dash-delimiting.
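For context, the unification referred to here is Nim's identifier-equality rule. The current Nim manual defines it with an algorithm like the one below: the first character is compared exactly, and the rest is compared ignoring case and underscores. The exact rule in the 2015 releases discussed in this thread may have differed in details. A minimal runnable transcription:

```nim
import std/strutils

proc sameIdentifier(a, b: string): bool =
  ## Identifier equality as described in the current Nim manual:
  ## the first character must match exactly; the rest is compared
  ## after removing underscores and lowercasing ASCII letters.
  result = a[0] == b[0] and
    a.replace("_", "").toLowerAscii == b.replace("_", "").toLowerAscii

echo sameIdentifier("thisIsAnIdentifier", "this_is_an_identifier")  # true
echo sameIdentifier("thisIsAnIdentifier", "ThisIsAnIdentifier")     # false
```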

The problem some have with it is the ambiguity between an identifier dash and the subtraction operator. For those of us who use it daily, that's a no-brainer: operators are always spaced. In Nim, this can be solved even better: the subtraction operator stays the plain hyphen-minus, and the dash allowed in identifiers is instead a specific Unicode rune.
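A minimal sketch of the lexing rule described above, assuming U+2013 EN DASH (the rune later favored in this thread) as the reserved identifier dash and, for brevity, only ASCII letters, digits and `_` as the other identifier characters. `MagicDash`, `isIdentPart` and `tokenize` are hypothetical names for illustration, not part of Nim's lexer:

```nim
import std/unicode

const MagicDash = Rune(0x2013)  # assumption: U+2013 EN DASH is the reserved identifier rune

proc isIdentPart(r: Rune): bool =
  ## ASCII letters, digits and '_' continue an identifier, as does the reserved dash.
  if r == MagicDash:
    result = true
  else:
    let c = int32(r)
    result = c < 128 and char(c) in {'a'..'z', 'A'..'Z', '0'..'9', '_'}

proc tokenize(src: string): seq[string] =
  ## Hypothetical helper: splits `src` into identifiers and single-rune
  ## operator tokens; whitespace only separates tokens.
  var current = ""
  for r in src.runes:
    if isIdentPart(r):
      current.add r
    else:
      if current.len > 0:
        result.add current
        current = ""
      if not r.isWhiteSpace:
        result.add $r
  if current.len > 0:
    result.add current

echo tokenize("a–b-c")  # @["a–b", "-", "c"]
```

Because the reserved dash and the hyphen-minus are different code points, `a–b-c` lexes unambiguously even without spaces: one identifier, the subtraction operator, and a second identifier.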

I'm working on this right now, so I do hope it can become an accepted PR once I get it working. Otherwise I'll have to add a transpilation stage or patch Nim all the time (shudders).

This would cater to lispers, cobolers (if they're not dead), livescripters, htmlers, cssers, url-sluggers, etc. (OK, the list got a bit contrived at the end, I'll give you that!)

Personally, I can't live without it.

I'd like some input on which rune to reserve for the purpose. The candidates so far are ⋯ (U+22EF MIDLINE HORIZONTAL ELLIPSIS) and — (U+2014 EM DASH):

both make it very clear they're not subtraction signs
both tie the identifier tightly together, even when written directly next to a subtraction operator without spaces
both are available in all popular monospaced fonts I've tried in the editor

The only downside is reserving one Unicode rune to be 'magical' in identifiers (as _ and capital letters already are) instead of being taken verbatim, which I find perfectly reasonable. The way I see it, it's the only missing magic in the identifier-delimiter wars...
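In terms of Nim's identifier equality, the proposed 'magic' amounts to stripping the reserved rune exactly the way `_` is stripped. A minimal sketch, assuming U+2013 EN DASH is the reserved rune and that the first character is still compared exactly, as in the current Nim manual; `EnDash`, `normalizeIdent` and `sameIdentifierWithDash` are hypothetical names:

```nim
import std/strutils

const EnDash = "\xE2\x80\x93"  # UTF-8 bytes of U+2013 EN DASH (assumed reserved rune)

proc normalizeIdent(s: string): string =
  ## Drops the separators treated as "magic" (`_` and the reserved dash)
  ## and lowercases ASCII letters.
  result = s.replace("_", "").replace(EnDash, "").toLowerAscii

proc sameIdentifierWithDash(a, b: string): bool =
  ## Hypothetical extension of Nim's identifier equality: first character
  ## compared exactly, the rest compared after normalization.
  result = a[0] == b[0] and normalizeIdent(a) == normalizeIdent(b)

echo sameIdentifierWithDash("this–is–it", "thisIsIt")    # true
echo sameIdentifierWithDash("this–is–it", "this_is_it")  # true
```

The plain hyphen-minus is deliberately not stripped, so a name written with `-` never compares equal to its camelCase form; in real code that `-` remains the subtraction operator.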

Thoughts?

Araq commented 9 years ago

@ozra

> I also think the U+2011 is the best route - even though my fave font misses it.

Er ... so you will fix your font anyway?! What's the point of this whole exercise then?

ozra commented 9 years ago

Hmm, I think I fixed that in push no. 2. I'll look.
Does one have to enable issues manually for repos? (Sorry, I'll google it.)

Yeah, that dash-tryout was a good 'accident', it is growing on me too :)

ozra commented 9 years ago

@Araq - I use a lot of different fonts while editing. I often edit at 6-8px text sizes (weirdly enough, my eyes are still good at my age despite constant staring into screens and books), and at small, non-antialiased sizes there's a big difference in rendering quality between fonts. I prefer tight row heights. I often use Andale Mono in that situation; DejaVu Sans Mono looks almost the same and is even better at those small sizes. Now, the en dash (U+2013) works in almost all fonts, so it's not an issue. I don't quite get what you mean: changing the font doesn't solve the problem. If you're referring to some hack obfuscating underscores, that is, as I've mentioned, not an option to me.

It now seems that U+2013 is a very good route for the magic unification: it will likely be preferred by 'dashers' and actually be used, instead of non-magic runes, and thus yield the much-wanted bonus of inter-module "transparency", which is the point of the exercise.

And a good selling point for Nim: ("Dear Dudes and Dudettes: Nim supports all three main styles of identifier separators, transparently interchangeable! Enjoy the language of the future without compromises and/or selling your soul!") ;-)

Araq commented 9 years ago

> If you're referring to some hack obfuscating underscores, that is, as I've mentioned, not an option to me.

You never explained why though.

> And a good selling point for Nim

in a couple of years perhaps. Right now everybody is full on the "only fascism works for big projects" track.

ozra commented 9 years ago

When it's time to diff with meld (or whatever..), view it on github, or make an edit in vim via ssh on a remote machine - the code will schizophrenically look all different. I want to see what I get.

I'm OK with other people having their preferred code styles - [ ... ] What I'll be working with day in and day out is mostly my own code. Then I want it to be consistent, and I want what's seen to be what's [ed:] actually there.

ozra commented 9 years ago

Regarding the 'transparency' thing, it's something I'm used to:

In LiveScript (LS), 'this-identifier' is equivalent to 'thisIdentifier' (the JavaScript de facto style).
In SugarCpp, 'this-identifier' is equivalent to 'this_identifier' (the C de facto style).
In my own, purposely much thinner C++11 transpiler, 'this-identifier' renders to either hump or snake case if it can infer which with its limited intelligence, and otherwise defaults to 'this_identifier'. This is very similar to the way Nim solves it, which is another reason I'm attracted to Nim: it does a lot of things the way I want them, things I have previously had to hack together myself. The dash is the major missing feature for me. I'd like to leave the hacks behind.
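The two mappings listed above (LiveScript-style hump case, C-style snake case) can be sketched as small conversions from a dashed name. This is a minimal illustration assuming the ASCII hyphen-minus is the separator in the source name; `dashToCamel` and `dashToSnake` are hypothetical helpers, not the actual implementation in LiveScript, SugarCpp or the author's transpiler:

```nim
import std/strutils

proc dashToSnake(name: string): string =
  ## 'this-identifier' -> 'this_identifier' (C-style).
  result = name.replace('-', '_')

proc dashToCamel(name: string): string =
  ## 'this-identifier' -> 'thisIdentifier' (JavaScript-style): the dash is
  ## dropped and the character following it is upper-cased.
  var upperNext = false
  for c in name:
    if c == '-':
      upperNext = true
    elif upperNext:
      result.add c.toUpperAscii
      upperNext = false
    else:
      result.add c

echo dashToCamel("this-identifier")  # thisIdentifier
echo dashToSnake("this-identifier")  # this_identifier
```

Under Nim's identifier-equality rule, `thisIdentifier` and `this_identifier` already denote the same name, so either output would refer to the same Nim symbol.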

ozra commented 9 years ago

Note: It could easily be 'officially demanded' that modules be hump-cased to get accepted into the stdlib, and for Nimble it could be a 'strong suggestion' that published modules follow this or that style standard. That would still not rob end users of the possibility.

ozra commented 9 years ago

Araq: You won't regret accepting the feature, I promise :)

Should I add unit tests, or is that overkill for this?

I guess I'll just PR and take the questions from there? (I'm a bit new to the GitHub workflow.)

ozra commented 9 years ago

PR https://github.com/Araq/Nim/pull/2833

josephwecker commented 9 years ago

@Araq fwiw, relevant discussion on Hacker News from a few weeks ago regarding Earl Grey's decision to add support for dashes in identifiers. In my mind it's simple: underscores and camelCase are hacks from when subtraction was more common than multi-word (or even multi-letter) identifiers. Programmers and language designers use them out of habit, which is fine, but never has anyone (for example) asked css or lisp to please use underscores or camelCase instead of dashes. It's one of those things that, once you try it, you never want to go back.

refi64 commented 9 years ago

> but never has anyone (for example) asked css or lisp to please use underscores or camelCase instead of dashes.

I use dashes in Lisp because the parentheses are enough visual noise. I use underscores in CSS. Maybe I'm weird; it's not a big deal.

ozra commented 9 years ago

@josephwecker - Earl Grey, interesting; it may be a contender to replace LS for my future JS requirements. @kirbyfan64 - that means you're perfectly normally unique, as we all are. Though the underscores in CSS were a bit weird ;-) YMMV.

dom96 commented 9 years ago

The reason that dashes in names are ok in Lisp and CSS is because prefix notation is used in Lisp and CSS does not allow much arithmetic (if at all).

ozra commented 9 years ago

The arithmetic argument is kind of moot. In most programming, subtraction is fairly uncommon, unlike multi-word identifiers. Personally, I work almost exclusively with math-intensive code with lots of arithmetic in it, and that code is written in a syntax where the subtraction symbol doubles as the identifier dash. Still no problems.

Regarding that argument, here's a pattern that actually recurs far more often:

for foo in a.low..a.high: doshit()

josephwecker commented 9 years ago

@dom96 - kind of. It's actually about the degree to which a language does or does not enforce whitespace as a delimiter. Lisp has always delimited operators with spaces (although it didn't have to); infix notation often lets you skip them, although most languages encourage delimiting operators with whitespace anyway. But sometimes the benefit of allowing a character in identifiers outweighs the benefit of being able to use the operator without spaces. For instance, in Ruby some_method? is a proper identifier, while some_method ? is the first part of a ternary expression. Ternary expressions are rarer than methods that return true/false, so it made sense to allow some_method? (easier to type and read than is_some_method) and force the ? operator to be delimited with whitespace (in front, at least).
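The trade-off described here (allowing a character inside identifiers at the cost of requiring whitespace around the operator that shares it) can be sketched for the plain hyphen-minus. In this minimal, hypothetical lexer, a `-` that follows an identifier character and is directly followed by a letter, digit or `_` continues the identifier; any other `-` is the subtraction operator. `tokenizeDashed` is an illustrative name, and this is not how Ruby, LiveScript or Nim actually lex:

```nim
import std/strutils

proc tokenizeDashed(src: string): seq[string] =
  ## Hypothetical rule: '-' inside a word (an identifier is in progress and
  ## the next char is in IdentChars) joins words; otherwise it is the
  ## subtraction operator, which must then be written with spaces.
  var current = ""
  for i, c in src:
    let nextIsWordChar = i + 1 < src.len and src[i + 1] in IdentChars
    if c in IdentChars or (c == '-' and current.len > 0 and nextIsWordChar):
      current.add c
    else:
      if current.len > 0:
        result.add current
        current = ""
      if c notin Whitespace:
        result.add $c
  if current.len > 0:
    result.add current

echo tokenizeDashed("some-name - other")  # @["some-name", "-", "other"]
```

Note that `x-1` becomes a single identifier under this rule, which is exactly the cost the comment describes; the proposal at the top of the thread avoids it by reserving a separate rune instead of reusing the hyphen-minus.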

There are plenty of languages (more all the time) that support dashes with infix notation (older, traditional examples are COBOL and Forth), and many languages have wrappers that convert them to underscores for languages that don't support them. For example, two I happen to have been working with for a recent project are LiveScript and this one, which I encountered just the other day:

[Screenshot, 2015-05-28]

Anyway, I could go on about it :-) Most people slowly convert to it: they realize how much more readable their file names and git branch names are when they use hyphens instead of underscores or camelCase, etc. Then they realize that the only reason languages don't allow them is to avoid forcing spaces around the subtraction operator, and suddenly it seems a little silly :-)

BTW, thanks everyone for being really civil about this topic and not letting it turn into a flame war! I think it bodes very well for the culture that surrounds the language.

ozra commented 9 years ago

The awareness phase of change: First it's resisted, then it's tolerated, then it's accepted, then it's adopted. ;-)

ozra commented 9 years ago

@josephwecker - I finally looked at the call highlighting I thought I had fixed. I had, but the 'compiled' XML definition was not in the pushed repo, only the YAML source. Now it should be upstream too, as soon as the PR is pulled.

ozra commented 9 years ago

PR #2849

oprypin commented 9 years ago

I am strongly against this.

Sure, Unicode is allowed in identifiers, but nowhere in Nim does it mean anything special. This would be the start of a lot of horrible Unicode requests.

Confusion would ensue:

Symbol  Code point  Name          Used meaning        Real meaning
-       U+002D      HYPHEN-MINUS  minus               hyphen (join words) or minus
–       U+2013      EN DASH       hyphen (suggested)  span or range
−       U+2212      MINUS SIGN                        minus
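The confusion in the table comes from glyphs that look alike but are distinct code points. A quick way to check which one a piece of source text actually contains is to print its code points with `std/unicode`. A minimal sketch; `codePoints` is a hypothetical helper:

```nim
import std/[strutils, unicode]

proc codePoints(s: string): seq[string] =
  ## Returns each rune of `s` in U+XXXX notation.
  for r in s.runes:
    result.add("U+" & toHex(int32(r), 4))

# hyphen-minus, en dash, minus sign (written as UTF-8 escapes to be unambiguous)
echo codePoints("-\xE2\x80\x93\xE2\x88\x92")  # @["U+002D", "U+2013", "U+2212"]
```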

reactormonk commented 9 years ago

@BlaXpirit sounds like we need reasonable syntax highlighting then.

nim-lang/Nim, issue #2811