Basic literals syntax - Githubissues

jjl commented 7 years ago

LFE currently has a lot of syntax which makes it seem like a big language. I think there's a certain amount you expect to be able to do and a little bit where we can trim some fat.

Remove redundancy

It's simpler and more homoiconic to only have one way to represent things, so I think these deserve the chop:

#*0 #b*10101 #*-1100 ; alternative binary forms, less mnemonic than #b
#d1234 #d-123 #d0 ; alternative decimal forms, needless and more to remember
#xc0ffee ; alternative hex form, everyone is used to 0x

I hope that was nice and easy and relatively uncontroversial, because up next are some more extensive changes...

More intuitive input formats

Everyone knows 0x is hex. I propose we also adopt 0b and 0o for binary and octal inputs (which are increasingly rare anyway). An arbitrary radix could be specified by 0r36:123 or similar. Numbers that begin with 0 but are not followed by an alphabetic character will be treated as zero-padded decimals (padded for readability, presumably). I'd also like to be able to put underscores anywhere in a number to improve readability. In fact, I think when numbers are output in decimal form, we should put underscores in them so the user can read them at a glance.
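To make the proposal concrete, here is a minimal sketch of how such integer literals could be read, written in Python rather than in the reader's own Erlang/LFE. The function names (`parse_int_literal`, `format_int`) are hypothetical, and several details are assumptions that the proposal leaves open: lowercase prefix letters only, radixes from 2 to 36, underscores stripped before anything else is inspected, and grouping by thousands on output.

```python
import string

# Digit alphabet for radixes up to 36 (assumption: 0-9 then a-z).
_DIGITS = string.digits + string.ascii_lowercase
# Prefix letters from the proposal: 0x, 0b, 0o.
_PREFIX_RADIX = {"x": 16, "b": 2, "o": 8}


def _digits_to_int(body: str, radix: int) -> int:
    """Convert a run of digits in the given radix, rejecting anything else."""
    if not body:
        raise ValueError("no digits")
    value = 0
    for ch in body.lower():
        digit = _DIGITS.find(ch)
        if digit < 0 or digit >= radix:
            raise ValueError(f"{ch!r} is not a base-{radix} digit")
        value = value * radix + digit
    return value


def parse_int_literal(text: str) -> int:
    """Hypothetical reader for 0x.., 0b.., 0o.., 0rN:.. and padded decimals."""
    digits = text.replace("_", "")  # underscores are purely visual
    if len(digits) > 1 and digits[0] == "0" and digits[1].isalpha():
        marker, rest = digits[1], digits[2:]
        if marker in _PREFIX_RADIX:
            return _digits_to_int(rest, _PREFIX_RADIX[marker])
        if marker == "r":
            radix_text, sep, body = rest.partition(":")
            if not sep:
                raise ValueError("expected 0r<radix>:<digits>")
            radix = _digits_to_int(radix_text, 10)
            if not 2 <= radix <= 36:
                raise ValueError("radix must be between 2 and 36")
            return _digits_to_int(body, radix)
        raise ValueError(f"unknown radix marker {marker!r}")
    # No alphabetic marker: leading zeros are just padding on a decimal.
    return _digits_to_int(digits, 10)


def format_int(value: int) -> str:
    """Print a decimal with underscores between groups of three digits."""
    return f"{value:_}"
```

Under these assumptions, `parse_int_literal("0r36:123")` and `parse_int_literal("007")` both go through the same digit loop, and `format_int` produces output that `parse_int_literal` can read back.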

I think characters should move from #\ to \, and the common escape characters should be available through \\, e.g. \\n translates to the codepoint for the newline character. \\u{PILE OF POO} should just work. \ should be forbidden in symbols unless they are ||-quoted, and any other use of the \ prefix is an error.
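As a rough illustration of the character rules, the following Python sketch maps a character token to its codepoint. The function name `read_char_literal` and the exact set of simple escapes (`n`, `t`, `r`) are assumptions; the proposal only names `\\n` and the named form `\\u{...}`. Name lookup uses Python's standard `unicodedata.lookup`.

```python
import unicodedata

# Assumption: which single-letter escapes exist is not settled in the proposal.
_SIMPLE_ESCAPES = {"n": 0x0A, "t": 0x09, "r": 0x0D}


def read_char_literal(token: str) -> int:
    """Hypothetical reader: return the codepoint denoted by a \\-prefixed token."""
    if not token.startswith("\\"):
        raise ValueError("character literals start with a backslash")
    body = token[1:]
    if len(body) == 1:
        # \a denotes the character 'a' itself.
        return ord(body)
    if body.startswith("\\"):
        escape = body[1:]
        if escape in _SIMPLE_ESCAPES:
            # \\n denotes the newline codepoint, and so on.
            return _SIMPLE_ESCAPES[escape]
        if escape.startswith("u{") and escape.endswith("}"):
            name = escape[2:-1]
            try:
                # \\u{NAME} looks the character up by its Unicode name.
                return ord(unicodedata.lookup(name))
            except KeyError:
                raise ValueError(f"unknown character name {name!r}") from None
    # Anything else with the \ prefix is an error, as proposed.
    raise ValueError(f"invalid character literal {token!r}")
```

Note that in this sketch a token of exactly two backslashes reads as the backslash character itself, since the second backslash is treated as the single character being named.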

I would like """ (three double quotes) as an alternative string delimiter, so that the string may contain double-quote characters, as in Python. It's really useful when writing a docstring, it associates well with writing a string because the delimiters are similar, and it only requires a single character of lookahead.
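A small Python sketch of the delimiter choice, assuming a hypothetical `read_string` helper that is handed the source text and the index of the opening quote. Escape processing is deliberately left out, so this only shows how the reader would pick between the one-quote and three-quote delimiters and find the matching close.

```python
def read_string(src: str, pos: int):
    """Return (contents, index just past the closing delimiter).

    Sketch only: backslash escapes inside the string are not handled.
    """
    if not src.startswith('"', pos):
        raise ValueError("expected an opening double quote")
    # Prefer the three-quote delimiter when it is present at this position.
    delim = '"""' if src.startswith('"""', pos) else '"'
    start = pos + len(delim)
    end = src.find(delim, start)
    if end < 0:
        raise ValueError("unterminated string")
    return src[start:end], end + len(delim)
```

With this choice, a docstring such as `"""say "hi" twice"""` keeps its inner quotes, while `"plain"` is read as before.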

I would like to push people away from list strings, so I think the default string type should be binary strings, using the "" delimiters everyone is familiar with. At present I want to reuse the #"" syntax for a regex input mode (like Clojure, although it would just wind up as a binary string). Since we have no other fundamental datatypes, I am considering #{} for list strings. The braces do not look like string delimiters, however; perhaps they would be better used for the regex literals?

I do not have strong opinions about the binaries syntax, because I don't really use them. I can see it might be nice to use #{} for them, but that conflicts with list strings/regexes. We could possibly use #<> in a pinch, but it feels like too much syntax. I think I'd rather just remove them in favour of using a macro (which can take a binary string) or using #.(binary_to_list "foo")

jjl commented 7 years ago

Most of these are implemented in the mostly-implemented reader I threw together today. It's all nice and isolated!

https://github.com/jjl/lfe/compare/develop...jjl:new-reader

yurrriq commented 7 years ago

Ah, doing tuples with [] will break LOTS of code for me :/

jjl commented 7 years ago

I think the scale of the syntax changes we're talking about is going to break all of your existing code. Unless we go down the two-frontends route I suggested, which gives you both working in parallel until you have a chance to port anything you're attached to, that's just what happens with a breaking change.

jjl commented 7 years ago

I would like more input in the following areas related to basic syntax:

atom syntax (of the doesn't-need-a-matching-terminator kind) - you were suggesting using :?
regex input mode (current plan: \ can only escape the terminator, currently going for ##"", not final by any means)
list strings (current plan: #"", again, not final)

jjl commented 7 years ago

Okay, I've implemented most of the above. I've made the following decisions, at least for now:

'bare' atoms can contain ':' as a character, including as the start character. Actually, they can contain any character that isn't reserved for something else and isn't a space or control character. (A Unicode space currently counts as valid because we only detect ASCII whitespace, which we should fix. Yay!)
List strings are on "" as one would expect, because the plan to minimise them was scuppered by chats about the Erlang libraries.
Binary strings are on #""
Regex strings (which suppress all escapes except \") are on ##""
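The three string forms above can be sketched as a prefix dispatch plus a regex-mode body reader. This is a Python illustration under stated assumptions, not the code in the new-reader branch: `string_kind` and `read_regex_body` are hypothetical names, and the handling of a backslash that is not followed by a quote (kept verbatim) is my reading of "suppress all escapes except \"".

```python
def string_kind(src: str, pos: int):
    """Classify the string literal at pos; return (kind, index of body start)."""
    if src.startswith('##"', pos):
        return "regex", pos + 3
    if src.startswith('#"', pos):
        return "binary", pos + 2
    if src.startswith('"', pos):
        return "list", pos + 1
    raise ValueError("not a string literal")


def read_regex_body(src: str, start: int):
    """Read a ##"" body starting just after the opening quote.

    Returns (contents, index just past the closing quote).
    """
    out = []
    i = start
    while i < len(src):
        ch = src[i]
        if ch == "\\" and src.startswith('"', i + 1):
            out.append('"')  # \" is the only escape: it yields a bare quote
            i += 2
        elif ch == '"':
            return "".join(out), i + 1
        else:
            out.append(ch)  # every other backslash is kept verbatim
            i += 1
    raise ValueError("unterminated regex string")
```

One consequence of this reading is that a regex cannot end in a literal backslash, because a backslash directly before the closing quote is always taken as the start of \".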

The following are not yet implemented:

Floats
Negative numbers
Exponents
Named unicode escapes

jjl / lfe

Basic literals syntax #4
