ozra / onyx-lang

The Onyx Programming Language

Basic Lexical Elements and Value Literals of Onyx #9

Open ozra opened 8 years ago

ozra commented 8 years ago

Note that this issue only covers the surface of constructs, i.e. the lexical aspects. For type definitions etc. there are separate "Doc / RFC" issues.

There are some [RFC] markers in this text; they flag lexical elements that are very much up for debate. You can question anything here, but those are the ones most in need of feedback.

Identifiers

Variable and Function Identifiers

my–identifer = "47"  -- here using ENDASH in the identifier

if my_identifer == my–identifer => say "Yep - snake case is interchangeable!"
if my-identifer == my–identifer => say "Yep - hyphens (lisp case) also!"
if myIdentifer == my–identifer => say "Yep - even camelCase works!"

my-fun-with-qmark?(foo) ->
   foo == "Say what?"

bar = my-fun-with-qmark?("Say what?")  -- => true

Internally the separators are all represented the same way and therefore comparable.
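
One way to picture this is as a normalization step applied to every variable and function identifier before comparison. The sketch below is in Python rather than Onyx, and it rests on assumptions that are not confirmed by the compiler source: that underscore is the canonical separator, and that a camelCase hump is treated as a separator. The name `normalize_identifier` is hypothetical.

```python
import re


def normalize_identifier(name: str) -> str:
    """Map hyphen, en dash, underscore and camelCase spellings to one form.

    Assumption: underscore is the canonical internal separator.
    Only meant for variable/function identifiers, not capitalized type names.
    """
    # Unify hyphen (-) and en dash (U+2013) to underscore.
    unified = re.sub("[-\u2013]", "_", name)
    # Treat a camelCase hump (lowercase letter or digit followed by an
    # uppercase letter) as a separator followed by the lowercased letter.
    unified = re.sub(
        r"(?<=[a-z0-9])([A-Z])",
        lambda m: "_" + m.group(1).lower(),
        unified,
    )
    return unified


spellings = ["my_identifer", "my-identifer", "my\u2013identifer", "myIdentifer"]
canonical = {normalize_identifier(s) for s in spellings}
```

With this scheme all four spellings from the example collapse to a single canonical name, which is what makes them comparable.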

type MyType
   some–member Int32 = 47
end

my-type-instance = MyType()

Type names always start with a capital letter.

Pros

This idea has been part of my language design for about a year now. When I stumbled upon Crystal, I saw that it used the same scheme. Click. Since Crystal is now in the family from the AST level down to LLVM, this is set in stone.

This, along with constants, also forms the notion "capital initial letter = compile time fixed symbol".

Constants

MyConstant = 47

type Foo
   MyFooConstant = 47
end

Constants aren't "dangerous"; they're the "safest part" of the code, so why should they have a "shout out" look? Well, they have a more "formal" importance. Say you see code comparing x to thingie: what is thingie? But if x is compared to Thingie, you know that Thingie is a formalized, important concept, so it does hold higher system-wide importance. I believe this justifies capitalization apart from its status as a compile-time constant, which in that regard is the less important factor. In addition, it helps speed up compilation.

If you're hell bent on having some constant lower case, you could wrap it in a function - compiled in release mode this will be the exact same machine code and exact same speed:

CRUDE_PI = 3.14
crude-pi() -> CRUDE_PI

say "Hey, my lowercase 'constant': {crude-pi * 2}"

Global Variables

-- currently:
$my-global = 47
$my-thread-bound-global = 42 'thread-bound

You simply use dots: SomeModule.SomeType.a-func()

Literal Values

Numbers

my-int-number = 47
my-real-number = 3.1415
my-hex-int = 0x2f

my-literal-typed-int = 47u64

-- likely future idiomatic way (no longer typed at literal)
-- my-literal-typed-int U64 = 47

my-big-number = 1_000_000_000  -- underscores can be used to clarify

[RFC] The literal typings will be removed. Currently a literal int is typed StdInt by default, and then if assigned to a var that is typed as, say UInt8, it fails because of type mismatch - which is ridiculous from a human being's perspective. The type inference will be improved for this - just have to figure out the "right way" to implement it conceptually.

The data type is StdInt* for integer literals by default. The data type is Real for real literals by default.

(*) Note StdInt will be changed to be called simply Int, provided coordination with Crystal team holds.

The data type used for the literals can be changed, either explicitly as above, or through parse-pragma: 'int-literal=BigInt - this would cause any literal integers to produce BigInts instead. 'real-literal=FixedPoint[4] - you get the picture.

The variables in the above examples are inferred to the type of the literal - they're not dynamic.

Tags (aka Symbols)

my-tag = #some-tag

my-fun(foo Tag) ->  -- note, you don't have to specify the type - inferred!
   case foo
   when #some-tag    => say "It was some tag"
   when #other-tag   => say "It was other tag"
   else              => say "It was {foo} - which I don't recognise"

my-fun #funky-tag  --> "It was funky_tag - which I don't recognise"

Tags (think "hash-tags"...) are unique program-wide. Each one gets a unique Int32 number internally, so they are very efficient. Preferably you should use enums, but in some cases just having ad hoc tags is very convenient: as easy as using strings as identifying tokens, but with the performance of an Int32.
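
The "unique number per tag" idea can be sketched as a small interning table. This is a Python illustration, not Onyx's actual implementation: the class `TagTable`, its methods, and the sequential numbering are all assumptions made for the example. The underscore normalization mirrors the `funky_tag` output shown above.

```python
class TagTable:
    """Hypothetical program-wide table mapping tag names to small integers."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._names: list[str] = []

    def intern(self, name: str) -> int:
        # Assumption: separators are unified the same way as in identifiers.
        key = name.replace("-", "_").replace("\u2013", "_")
        if key not in self._ids:
            # First time this tag is seen: hand out the next free number.
            self._ids[key] = len(self._names)
            self._names.append(key)
        return self._ids[key]

    def name_of(self, tag_id: int) -> str:
        # Reverse lookup, e.g. for printing a tag in an interpolated string.
        return self._names[tag_id]


tags = TagTable()
some_tag = tags.intern("some-tag")
other_tag = tags.intern("other-tag")
funky_tag = tags.intern("funky-tag")
```

Comparing two tags is then a plain integer comparison, while the name is still recoverable for display.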

Strings

my-string = "A simple string"
my-interpolated-string = "Interpolation: {my-string} with sugar on top!"
-- any kind of expressions can go in the interpolation brackets of course!

the–str = "111kjhgkjh" \
   "222dfghdfhgd"

--> "111kjhgkjh222dfghdfhgd"

yet-a–str = "111kjhgkjh
   222dfghdfhgd
   333asdfdf
"
-- above preserves the white space and newlines

my-straight-str = %s<no {interpolation\t here}\n\tOk!>
-- for the %* string notations, you can pick your delimiter chars yourself,
-- which ever makes the particular string clearer: `<...>`, `(...)`, `{...}`
-- or `[...]`.

The data type is Str / String

Chars

my-char = _"X"

[RFC] Chars are nowhere near common enough to warrant wasting a unique symbol on (like single quote for instance, which has several other, more important, functions in Onyx).

The syntax was first c"X", then changed to %c"X", which follows the pattern of the other "special string literal notations". I then decided to give chars at least some special treatment, going with %"X", but after some use it looks noisy, so the underscore variant is being tested now.

Regular Expressions

my-regex = /^match-stuff.*$/i
match = my-regex =~ "does this match?"

The =~ above is of course a generic operator that can be implemented for other purposes for other types.

A consideration could be to change the syntax to prefixed-string, like Char:

my-regex = r"^match-stuff.*$"i
match = my-regex =~ "does this match?"

However, in much network programming, which is quite common, regexes play a steady role, so explicit sugar syntax for them seems warranted.

The resulting type is Regex.

List - a dynamically growing sequence (Vector, Array, Seq, Sequence, etc. in other languages)

my-list = [items, go, here]
other-list = [
    "a string"      -- commas not necessary if newlined
    47,             -- but are allowed
    1.41
    ["nested list", "ok, duh!"]
]

-- type of above is List[Str | StdInt | Real | List[Str]]

an-empty-list = [] of Int   -- empty list has to be typed (since there are no
                            -- values to infer type from)
another-empty-list = List[Int]()  -- same result as previous line

For details on List vs Array see issue on basic data types: #***XXX.

You can make Listish literals with arbitrary type also, see Set for notation.

As is obvious by now: the resulting type is List[T], where T can be a sum type.

Tuple

my–tuple = {"foo", 1, #bar}

[RFC] It is desirable to use (items, here) notation for tuples, because braces are never used for tuples in mathematical notation. It does however make the syntax a lot messier, since both expression grouping and lambda parameters use parentheses. The current tuple notation would be much better off used for set notation!

Set

my–set = Set{"foo", 1, #bar}

Any type can be used as prefix as long as it implements the [](ix) method; this is therefore a generic "listish" syntax. [RFC] Set unfortunately doesn't have its own literal for now (compare tuple above).

Map - Hash Map

string-keyed–hash = {"foo": 47, "bar": "bar value", "qwo": ["a", "list", "here"]}
tag-keyed-hash = {
    #foo: 47    -- commas not necessary when newlined
    #bar: "bar value"
    #qwo: [
        "a", "list"
        "here"
    ]
}

string-keyed-hash-js-style = {
  var_name: "a value"
}

some-var = #a-key
other-var = "another key"

variable-keys = {
  some_var => "some value"
  other_var => "other value"
}

-- type of above would be {Str|Tag => Str}

[RFC] Note, I will change the syntax for {key_here: value_here} [ed: this is changed now / 2016-03-25]. It currently parses the same as the key => val notation. I will change this to follow the JavaScript/JSON variation, where key_here is considered a literal string. This might help network code that works with JSON a lot, since you essentially get JSON syntax in Onyx (but strongly typed!).
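
The distinction between the two key notations can be illustrated with Python dicts as a stand-in. This is only an analogy: the variable names come from the examples above, and the Tag `#a-key` is represented here by a plain string, which is an assumption of the sketch.

```python
# Stand-ins for the Onyx variables; the Tag #a-key is modelled as a string.
some_var = "#a-key"
other_var = "another key"

# Onyx `{var_name: "a value"}` after the change: the key is the literal
# string "var_name", as in JSON. No variable lookup takes place.
js_style = {"var_name": "a value"}

# Onyx `{some_var => "some value"}`: the key is an expression, so the
# variable's current value becomes the key.
variable_keys = {
    some_var: "some value",
    other_var: "other value",
}
```

So `key: value` names the key directly, while `key => value` evaluates the key first.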

I've probably forgotten something, just tell me.

stugol commented 8 years ago

You mention "camelCase" and "humpNotation". Are they different?

I'm in favour of nil-handling sugar; but I'm not sure how well it'll interact with the ? method suffix. Maybe require ?? if the method ends with ?. fn?.fn is preferable to fn?fn, in my opinion.

I'm in favour of dashed identifiers, and optional commas in arrays. I suggest both regex syntaxes.

Implicit string literals in hashes is an interesting idea.

ozra commented 8 years ago

@stugol

  1. Ah, no, I'm used to calling it humpNotation, but saw that camelCase is more common, so I've tried sticking to that term instead, but - slip of habit.
  2. Regarding nil-handling sugar, it's good to discuss in #21 :-)
  3. Both regexp-notations might be worth considering, don't know what value it would add, but it would be easy to implement.

stugol commented 8 years ago

I notice %s{ ... } is non-interpolating. Is %{ ... } interpolating?

ozra commented 8 years ago

Yes that's right: %(...) etc., is for using other delimiters for the string, as in Crystal. %s(...) etc., is flagged "straight string" - no interpolation.

stugol commented 8 years ago

Good. Asterite refused to implement non-interpolated strings. Sigh.

ozra commented 8 years ago

As of today: Char-syntax changed from %"X" to _"X".

Sod-Almighty commented 8 years ago

...why?

ozra commented 7 years ago

Looking through code, it simply looked veeery noisy with the percentages. I must admit, it wasn't very thought through.

I think I'll follow the motto used up till now: expand choices first, then reduce the options to what becomes preferred, only after some time of side-by-side usage. So I'll re-introduce the old syntax for evaluation, and to continue the de facto development methodology of Onyx.

ozra commented 7 years ago

As of now: Char-syntax %"X" re-introduced. Both are now available in order to evaluate and compare.

ozra commented 7 years ago

Hmm, I still want to allow 0 - n prime symbols at the end of identifiers (have wanted that since the beginning, but put it off again and again because of fear of confusion, but I think it's very moot!): Hmm, better put this in its own issue first.

ozra commented 7 years ago

For previous comment: #95. Link here in regard to Lexical and Literal aspects of lang.