haasn / -g-pl

/g/ programming language
13 stars 2 forks source link

Array structure #19

Open graydude opened 13 years ago

graydude commented 13 years ago

It just occurred to me that we have no array, list or vector structure in the draft, and that it is currently impossible to create such a structure by combining elements of the language. Does anyone have a suggestion?

haasn commented 13 years ago

intrinsic, heterogenous list type?

[exp1, exp2, exp2] to create, and exp[num] to access?

graydude commented 13 years ago

That creation and access look good. I do like dynamic arrays though, and adding with exp[]

Sent via Hubroid

haasn commented 13 years ago

Yeah, I was thinking they'd be dynamic. Or rather, they (like all values) would be constant, but functions (eg. append) could return an array which is the same + 1 element.

Thoughts on implementation? Linked list?

graydude commented 13 years ago

That would lead to a nice consistency between the rest of the variable types and arrays.

I imagine:

>implying list isn't [1, 2, 3, 4]
>mfw list
-> [1, 2, 3, 4]
>append list 2;
-> [1, 2, 3, 4, 2]

Any thoughts on multidimensional lists?

>append list [1, 2, 3]
>mfw list
-> [1, 2, 3, 4, 2, [1, 2, 3]]

Linked list would make it harder to use multidimensional lists. How about a doubly linked list, where the prev pointer of each element points to the first element of the previous element? If that's unclear, in:

[a, b, [c, d, e], f, g]

prev[f] points to c, but next[c] points to d. next[e] points to f. That way, if next[i] != prev[i+1], you're at the end of a sub-list. Just a thought, maintaining that start of a list might be a bit harder.

haasn commented 13 years ago

That's not how doubly linked lists work. In your example, prev[f] would point to the entire [c, d, e] list and prev[[c, d, e]] would point to b.

I'm still unsure / thinking about how to implement lists because currently, there's no such thing as data mutations. If we want to include an append function, we'd either need to introduce data mutations (not exactly sure what issues this would have but I'm sure it would mess with closures), or we need to copy the entire list when appending (eg. data-driven model).

One would be potentially unsafe, but the other would be far slower. Which is the greater evil, in your opinion?

graydude commented 13 years ago

You misunderstood me, I was talking about using 'flat' lists to represent multidimensional lists. But I guess it would be better to just allow different types to be an element in a list.

I think copying would be the best solution, as it clearer what are the downsides to that.

Sent via Hubroid

haasn commented 13 years ago

Yes, that's one possible way to represent multi-dimensional lists, in fact, it's called a “jagged array” (jagged because the length of the sublists need not be constant). The way most languages do it is by simply using a long one-dimensional list and indexing as y * stride + x, where stride is usually equal or close to the width of the supposed two-dimensional array.

Come to think of it, copying the entire list will probably not be necessary if using a complex linked tree structure. Of course, this would make indexing slow (faster than a flat linked list), but copying would not be necessary as you would simply reference to the existing copy of parts that haven't changed, and create copies of only the bits you want to modify.

Still, it's an implementation issue and unrelated to the specification.

graydude commented 13 years ago

True, not important for the specification. I remember quite a lot from my data structures class, though, that could come in handy later.

haasn commented 13 years ago

One idea I had was to use a large binary tree to hold a linked list, eg [a, b, c, d, e] could be represented as:

     ___/\__
   _/\      \
  /   \     /
 /\   /\   /
a  b c  d e

Changing a list would return a new list with everything but the changed path simply being references to the previous node, so for example if I replaced c by x, only the following tree would get reallocated:

     ___/
     \
      \
      /
     x

And the rest, not shown here, would be a reference to the original copy. To speed up indexing, each node could hold its total length (that is, the length of both child nodes combined, with a leaf having length 1). So the lengths here would be:

     ___/5\_____
   _/4\_       1\
  /     \       /
 /2\    /2\    /1
1   1  1  1   1

To look up an element at index n, one would progress down the left sub-tree if the next power of two greater than the tree length divided by two is greater than the index position.

Example with index 2:

Next power of two after 5 is 8 (written better as 2^ceil(log(5)/log(2)) = 2³ = 8); 8 / 2 = 4; 4 > 2, therefore we progress down the left subtree.

For the second sub-tree, 2^ceil(log(4)/log(2)) = 2² = 4; 4/2 = 2; 2 = 2, therefore we progress down the right subtree. When progressing down the right subtree, the index gets decreased by the length of the current (divided by two).

In effect, this means that we search for the element 0 in the right subtree of that, which has length 2: 2^ceil(log(2)/log(2)) = 2¹ = 2; 2/2 = 1; 1 > 0 therefore left subtree.

The left subtree of that tree is the element “c”, thus giving us the element we wanted (at index 2).

This means that accessing a specific node should be O(log n) time in the worst case, as opposed to O(n) for linked lists and O(1) for flat arrays.