koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org
Other
3.16k stars 151 forks source link

Collection / Vector Literals #527

Open TimWhiting opened 1 month ago

TimWhiting commented 1 month ago

It would be good to support vector literals and add special support to the backends to lower them directly to the runtime representation.

As far as syntax, I think directly prefixed literals would be good to support: (This is already supported for raw strings), and could be useful for other literal forms as well.

vec[0, 1, 2]

Then the idea would be to add the following definition

pub extern vec/from-list-literal(l: list<a>): vector<a>

Which could allow other prefixed identifiers (even if they are not externs or specially compiled):

pub fun set/from-list-literal(l: list<a>): set<a>
...

set[0, 1, 2]

Unfortunately this clashes with index and assignment notation.

Are there other characters that could be used, or should it be distinguished by no local variable by that name, or what?

Additionally literal map notation would ideally be able to use the typical 0: "" notation instead of tupled arguments. Not sure what the best design is. Just wanted to get a conversation started.

For a similar prefix idea for strings and string interpolation see #528

Alternatively for this we could use a similar desugaring as the string interpolation proposal: i.e. something like

fun vec/start-list-literal(size: int): vector-builder<a>
fun vec/add-list-literal-item(vb: vector-builder<a>, item: a): vector-builder<a>
fun vec/finish-list-literal(vb: vector-builder<a>): vector

This could then allow for 'spreading' of various collections into others.

vec[0, 1, 2, ...[3, 4, 5], ...set[2,2,2]]

fun list/vec/add-list-literal-items(vb: vector-builder<a>, l: list<a>): vector-builder<a>
...
fun set/vec/add-list-literal-items(vb: vector-builder<a>, s: set<a>): vector-builder<a>
...

Of course the preallocated 'size' parameter of the start function then becomes less useful when spreading unknown sized collections, but they could be computed.

vec[0, 1, 2, ...[3, 4, 5], ...set[2,2,2]]
// desugar to
val a0 = [3,4,5]
val a1 = set[2,2,2]
vec/start-list-literal(a0.size + a1.size + 3)
   .vec/add-list-literal-item(0).vec/add-list-literal-item(1).vec/add-list-literal-item(2)
   .vec/add-list-literal-items(a0).vec/add-list-literal-items(a1).vec/finish-list-literal()
chtenb commented 1 month ago

I like the idea of using prefixes for literals. Many other languages use different delimiters depending on the datatype ({ for sets, [ for lists, " for strings), but you quickly run out of possible delimiters. Using prefixes seems like a more future proof approach.

Unfortunately this clashes with index and assignment notation.

Perhaps { is a possible alternative? vec{0, 1, 2}. I suppose the downside is that we already have list literals using [.

Another thought, currently the [ syntax without prefix is reserved for list literals, but one could also consider that a literal without prefix should be resolved through type inference.

TimWhiting commented 1 month ago

Ideally we would desugar during parsing. Or at the very least statically prior to typechecking. One issue with your proposed vec{0, 1, 2} is that it is not obvious that it doesn't desugar to vec(fn() 0, 1, 2) with koka's anonymous functions: obviously with commas that particular desugaring doesn't make sense, but what about a single element vector?

Alternative delimiters could be vec|0, 1, 2| but that can maybe clash with ors especially for a single element vec | 0 | , and <> has the same problem for less than. It seems like this is a solvable problem, but not particularly fun to solve.

I agree that we could have the [] no prefix also be resolved in the same way, but searching for a definition of no-prefix/start-list-literal we'd have a default implementation for lists that gets inlined in the backends, but we could also add hiding of imported names which would allow you to hide the one for the core list version. Might look a bit strange and would be kind of strange to someone coming to your module and seeing something that looks like a core list and isn't.

Koka already has some static determination prior to typechecking in terms of operator precedence, and allows you to create definitions such as pub infixr 30 | or something. What if we also added something like pub list-prefix vec, which registers vec as a list prefix and then warns the user if they are using a local variable named vec in an assignment or index operator.

Alternatively some languages use a dual brace system like: vec[|0, 1, 2|]. This is most clear, but also makes it a bit noisy. But it would allow to also distinguish something like set{|2, 2, 2|} The combination characters [|, (| and {| could be resolved as early as lexing, and include the identifier as part of lexing as well. Clear win in terms of implementation effort, the question is whether it is a nice user experience. And whether we want all three variants to mean the same thing, only have one variant, or have each be overridable using different names i.e. start-bracket-literal() / brace / paren.

Thinking some more about the spreading syntax, we actually don't need spreading since we already have static overriding based on types so we could have vec[0, 1, [2, 3]] with the inner list automatically being spread. However, this seems strange from a user point of view.

The other issue is the order of operations. [0, 1, 2], usually would put 2 at the end of the list, but using the desugaring, 0 would be added first and end up at the end of the list. This is a major issue for ordered collections, since appending is typically much more expensive than inserting at the front (at least for lists), but not as much for sets. Maybe that could be a distinguishing factor between the {| and [| syntax, with {| being used for unordered, and desugaring in the source order, whereas the other one would desugar in reverse order.

chtenb commented 1 month ago

Is the aim to only support static (compile-time) literals, or support a more general form of collection expressions? I.e. supporting variables, so you can do val x = [a, b, ...cs]?

TimWhiting commented 1 month ago

I believe the above desugaring should support both. Though backend optimizers might only support efficiently inlining literals.

TimWhiting commented 1 month ago

The main thing that I'm concerned with this proposal is making sure that we make sure we are open to map literals, as well as anonymous struct types (canonicalized of course to prevent code blowup) which is another hope of mine.