Array type - Githubissues

byorgey commented 2 years ago

The language should also have a type for (size-indexed?) multidimensional arrays. e.g. you could imagine some kind of high-powered sensor device that would return a 2D array of the robot's surroundings, allowing it to do things like pathfinding.

There's a lot of language design to be done here; this issue is something of a placeholder. Should they be mutable? Immutable? What kinds of operations should they support? Should the type of an array reflect its dimension? Its size along each dimension? etc.?

xsebek commented 2 years ago

We already have immutable fixed-size arrays. The trick to getting an array of size n is to use an n+1 tuple:

base case:

def get1 : (a * b) -> a
  = fst
end
def set1 : x -> (a * b) -> (x * b) 
  = \x.\t. x,snd t
end

inductive case (imagine n+1 in place of 2 and missing type annotations):

def get2 : (a * b * c) -> b
 = \t. let in1 = snd t in fst in1
end
def set2 : x -> (a * b * c) -> (a * x * c) 
  = \x.\t. let in1 = snd t in fst t,set1 x in1
end

after that we can get all n by induction:

def get3 : (a * b * c * d) -> c
  = \t. let in1 = snd t in get2 in1
end
def set3 : x -> (a * b * c * d) -> (a * b * x * d) 
  = \x.\t. let in1 = snd t in fst t,set2 x in1
end

so to get an array with 1,2,3 do:

> (1,2,3,())
(1, (2, (3, ()))) : int * int * int * ()
> get3 (1,2,3,())
3 : int
> set3 42 (1,2,3,())
(1, (2, (42, ()))) : int * int * int * ()

OK, I admit this does not look great, even though we could auto-generate all the definitions with Bash. I think the way to make this more usable, would be to add type synonyms like we have def, but I can not imagine how hard or easy that would be. (EDIT: I made #153 for this related improvement) Thoughts, @byorgey?

byorgey commented 2 years ago

Hi @xsebek , yes, good observation! This is definitely a nice way to get yourself fixed-sized, immutable arrays if all you have is pairs, and if players want to do this, by all means they should!

However, this would be unsatisfactory as the "blessed" way to have arrays, for several reasons:

It is slow. I expect anything called an "array" to give me O(1) random access to any index, but this encoding with pairs takes linear time to access or update a particular index, because of the need to call snd repeatedly. (It is the same reason that (!!) takes linear time in Haskell.)
There is no way to write functions that are polymorphic in the array size. Actually your get and set functions are somewhat polymorphic, because e.g. get3 will work on any 3-tuple or higher. But using your scheme there would be no way, for example, to write a function like mapArray : (a -> b) -> array n a -> array n b that works for any size array.

After thinking about a few use cases (e.g. incrementally building a map while exploring an area), I am also convinced that we need to provide mutable arrays.

xsebek commented 2 years ago

A more determined player could write getXfromY function for each logarithmic encoding ((...*...)*(...)). :thinking: But functions like map could only be written with a fixed number of elements (so mapFirst3,...) which is indeed disappointing. :disappointed:

I am kind of scared by mutable arrays in Haskell, even using ticks to update could lead to weird things if the array was shared between robots (which one gets to update first?).

I think we could get reasonable safety with #94 - say one robot has the array and others ask him to update and what the value is somewhere. With #154 I could write data.intmap.sw, which might be a good pure alternative if mutable arrays turn out to be difficult to share between robots. But maybe I just lack imagination and mutable arrays are in fact easy :slightly_smiling_face:

byorgey commented 2 years ago

Well, I agree that using mutable arrays can be scary. But implementing them is easy. I don't think we shouldn't provide a feature just because users can shoot themselves in the foot with it.

If a mutable array is shared and multiple robots try to write to it at the same time, we will just say it is undefined which one gets to update first. That's no different than the situation with shared mutable variables in any other language, of course (Rust excepted, perhaps). Maybe down the road we could provide some kind of atomic compare-and-swap primitive or something so users can implement their own semaphores.

Do I hope players will discover some of the joys of functional programming through playing this game? Yes, of course. But at the same time I'm not trying to force them into any one paradigm.

I think we could get reasonable safety with #94 - say one robot has the array and others ask him to update and what the value is somewhere. With #154 I could write data.intmap.sw, which might be a good pure alternative if mutable arrays turn out to be difficult to share between robots.

These are all great ideas, but it seems like they relate to choices you can make as a player in terms of how to organize your code. I don't think it affects whether we should provide certain features or not.

byorgey commented 2 years ago

A few thoughts:

The type of an array should reflect whether it is mutable or immutable; the number of dimensions; and the element type.
I don't think making the actual size along each dimension part of the type is a good idea, since it would require dependent types to declare arrays with a dynamically determined size, and ensuring that array accesses are valid requires arbitrary SMT solving.
Rather than providing mutable arrays as a primitive thing, another thing to consider might be just providing mutable references as a primitive, and then one can just make an (immutable) array full of mutable references. This might have some implications for efficiency if it's done naively (extra boxing, etc.) but that could be addressed via optimizations.

byorgey commented 2 years ago

I think we should implement immutable arrays as a start. Once we get some experience with those we can think about how to add mutable arrays. As a concrete proposal:

array a is the type of one-dimensional arrays with elements of type a.
- n-dimensional arrays can be created by nesting this type.
For constructing arrays, a primitive built-in function generate : int -> (int -> a) -> array a.
For accessing arrays, a primitive built-in operator ! which takes an array a and an int and returns an a (and can throw an out-of-bounds exception).
A built-in function size : array a -> int.
There are lots of other functions one might want, e.g. map, but those can be implemented in terms of the above. For example,
```
mapA : (a -> b) -> array a -> array b
mapA f arr = generate (size arr) (\i. f (arr ! i))
```

We can certainly think about adding other construction methods. For example, using generate is unsatisfactory for converting a list to an array:

arrayFromList : list a -> array a
arrayFromList as = generate (length as) (\i -> as !! i)

This is unsatisfactory because it takes time quadratic in the length of the list. So we might contemplate providing a primitive function to build an array from a list, with a type like

arrayFromList : (rec l. Unit + a * l) -> Array a

which could probably reuse some of the underlying CESK machine infrastructure for collecting up array values and then creating an array.

byorgey commented 1 year ago

Implementing something like generate turns out to be kind of tricky. The issue is that when we get around to executing generate n f we have to then arrange for the CESK machine to evaluate the application of f to 0 through n-1 and collect the results into an array value. I figured out one way to do it when implementing mktext as part of #861; since we ultimately decided not to include mktext I wanted to record the technique here for whoever ends up implementing arrays. The basic idea would be to add a custom continuation frame that records the function being applied, where we are in the loop from 0 to n-1, and also collects intermediate array values:

  | -- | Accumulate partial results from 'generate'.  Stores the index function,
    --   the current index, and the accumulated values.
    FGenerate Value Integer [Value]

First, in the execConst function we would have something like this:

  Generate -> case vs of
      [VInt n, f]
        | n < 0 -> raise Generate ["Length must be nonnegative:", prettyValue (VInt n)]
        | n == 0 -> return $ Out (VArray []) s k
        | otherwise -> return $ Out (VInt (n - 1)) s (FApp f : FGenerate f (n - 1) [] : k)
      _ -> badConst

That is, in any case where n > 0, we return a value of n-1, set up the stack to apply f as the next action, and after that put a FGenerate frame which records that we are applying f, we are on index n-1, and we have no collected array values yet. The reason to loop from n-1 down to 0 is so that we can cons each new value onto the front of the accumulating list of array values in constant time (alternatively, we could also loop from 0 to n-1, cons each value, and then do a final reverse at the end when creating the array value).

Finally, in the stepCESK function we add a few cases for dealing with FGenerate frames:

  Out v s (FGenerate _ 0 vs : k) ->
    return $ Out (mkArray (v:vs)) s k
  Out v s (FGenerate f i vs : k) ->
    return $ Out (VInt (i - 1)) s (FApp f : FGenerate f (i - 1) (v : vs) : k)

The base case is when we have just processed index 0; we cons on the last (index 0) value and call some appropriate mkArray function to create an array value (whatever those end up looking like). In the general index i case, we cons the returned value onto the accumulating list, and output the previous index with stack frames to apply f and then do another FGenerate loop.

swarm-game / swarm

Array type #98