microsoft / qsharp-language

Official repository for design of the quantum programming language Q# and its core libraries
MIT License
235 stars 56 forks source link

Multidimensional arrays #39

Closed cgranade closed 3 years ago

cgranade commented 3 years ago

Suggestion

Currently, for any Q# type 'T, the array type 'T[] represents an immutable collection of values of type 'T indexed by a single integer. It would be really helpful to add a new collection type that is indexed by tuples of integers instead, so as to allow for a more natural representation of concepts like matrices and tensors.

Considerations

While nested arrays can be used to do so (e.g.: Complex[][]), this can presents a number of disadvantages compared to a multidimensional array:

Building on the utility of 1-D array notation, this suggestion proposes modifying Q# to include new multidimensional array types 'T[,], 'T[,,], and so forth. Like values of type 'T[], these new multidimensional would also be immutable, and could be manipulated by using the subscript ([]) and copy-and-update (w/) operators.

As a possible alternative, we could instead consider providing syntactic sugar for manipulating nested (ragged) arrays; for example, copy-and-update expressions could be extended to include tuple indices:

mutable matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
set matrix w/ 2 <- (matrix[2] w/ 1 <- 20);
// syntactic sugar for the above:
set matrix w/ (2, 1) <- 20;

While this would address the ergonomics of using values of type 'T[][], 'T[][][], and so forth, this syntactic sugar does not represent the compile-time guarantee that nested arrays are rectangular, necessitating potentially expensive shape checks (e.g. Microsoft.Quantum.Arrays.RectangularArrayFact). Similarly, providing syntactic sugar for nested arrays would not fundamentally change performance concerns with using nested arrays to represent matrices and tensors. For instance, by modifying the formula used to map multidimensional indices to linear indices, matrix and tensor transposition can be implemented in time that scales with the number of dimensions rather than the number of elements.

Context

This suggestion would require adding new functions and operations to the Microsoft.Quantum.Arrays namespace in the Q# standard library to support the feature; as Q# does not currently allow for a function or operation to act on multiple input types except as unbounded type parameters (see https://github.com/microsoft/qsharp-language/issues/149), these functions or operations would require distinct names from their 1D counterparts (e.g.: ConstantArray2, ConstantArray3, and so forth). Adding these functions and operations may require breaking changes to Microsoft.Quantum.Arrays in order to accommodate a naming scheme that includes suffixes for dimensionality.

Currently, array types in Q# can be indexed by values of type Int or Range, but current array indexing could be extended to allow "fancy" slicing by values of type Int[] (similar to the functionality offered by Microsoft.Quantum.Arrays.Subarray). In the same way, this suggestion could be extended to allow indexing 2D arrays by values of type (Int[], Int[]) as well as (Int, Range), (Range, Int), (Int, Int), and so forth. For example:

let data = [
    (0, 0), (0, 1), (0, 2);
    (1, 0), (1, 1), (1, 2);
    (2, 0), (2, 1), (2, 2)
];
let corners = data[[0, 2], [0, 2]];     // โ† [(0, 0), (0, 2); (2, 0), (2, 2)]
let rightHandCorners = data[[0, 2], 2]; // โ† [(2, 0), (2, 2)]
let swapMatrix = (IdentityMatrix(4))[[0, 2, 1, 3], [0, 2, 1, 3]];

Examples

Example 1: Declaring a literal array representing a matrix over ๐นโ‚‚.

// Bool[,] literals differentiated from Bool[][] by having only one level of [ ], corresponding
// to only one pair of [ ] used to subscript and in the type signature.
let bellTableau = [
    true, true, false, false; // โ† "rows" are separated by ; rather than comma.
    false, false, true, true
];

Example 2: Declaring and updating a 3ร—3 array of Double values.

namespace Microsoft.Quantum.Arrays {
    function IdentityMatrix(nDimensions : Int) : Double[,] { ... }
}

mutable matrix = IdentityMatrix(3);
set matrix w/ (2, 1) <- 20;
// matrix is now [1, 0, 0; 0, 1, 0; 0, 20, 1]

Example 3: Indexing into a stabilizer tableau using ranges and integers.

let perfect = [
    PauliI, PauliX, PauliZ, PauliZ, PauliX;
    PauliX, PauliZ, PauliZ, PauliX, PauliI;
    PauliZ, PauliZ, PauliX, PauliI, PauliX;
    PauliZ, PauliX, PauliI, PauliX, PauliZ
];
Message($"{perfect[0, 2]}"); // โ† Prints PauliZ.
let firstQubit = perfect[(..., 0)]; // โ† [PauliI, PauliX, PauliZ, PauliZ]
let firstRow = perfect[(0, ...)]; // โ† [PauliI, PauliX, PauliZ, PauliZ, PauliX]
let subblock = perfect[(1...2, 0...3)]; //  โ† [PauliX, PauliZ, PauliZ; PauliZ, PauliZ, PauliX]

Example 4: Compatibility with nested array types.

let identityMatrices = MappedOverRange(IdentityMatrix, 1..5); // type: Double[,][] (array of 2D arrays)
let contrived = [
    [0], [0, 1];
    [0, 1, 2], [0, 1, 2, 3]
]; // type: Int[][,] (2D array of arrays)

Affidavit (please fill out)

Please add ticks by placing a cross in the box:

Please tick all that apply:

bettinaheim commented 3 years ago

@cgranade Thank you for making the suggestion! Looks promising! Some initial comments:

cgranade commented 3 years ago

@cgranade Thank you for making the suggestion! Looks promising!

๐Ÿ’•

Some initial comments:

  • I believe this would not be a breaking change for the Q# language, so I believe this box can be checked as well. It could make sense to make some breaking changes in the libraries, but also that can be avoided.

Sounds good, I wasn't quite sure of the scope for that checkbox. Will edit the suggestion accordingly.

  • As related concepts, we should consider how array slicing looks like and works for multi-dimensional arrays, and how things would look like if we were to support slicing/the construction of new arrays by giving an array of indices in the future.

Sounds good, I'll edit to add a section considering extensions to slicing as a related concept.

  • For a potential proposal, briefly mentioning under alternatives how far we would get with pure syntax sugar for manipulation of (jagged) nested arrays would be good.

Makes sense; I'll add that as well, then.

Thanks for the feedback, let me go add that to the suggestion then!

bettinaheim commented 3 years ago

A couple of notes on things to cover in the proposal that came up in conversations:

Can multidimensional arrays be used instead of jagged array (i.e. variance/subtyping)? How do item access and update look like for various "mixtures" of jagged and multidimensional arrays? What are runtime failures and what are compile time failures? And related: which item/parts of multidimensional array literals can consist of expressions?

Two syntax suggestions for array literals came up; Options 1:
1D: [1,2] 2D: [|1,2|, |3, 4|] 3D: [||1,2|, |3, 4||, ||5,6|,|7,8||] Option 2:
1D: [1,2] 2D: #[[1,2], [3, 4]] 3D: ##[[[1,2], [3, 4]], [[5,6],[7,8]]]

It would be good to explicitly mention in the proposal that the type reflects how many dimensions the array has (2D, 3D, etc.), but not its length in a given dimension.

bettinaheim commented 3 years ago

@cgranade Capturing a couple of points that came up in our conversation here (disregard if you have already captured them in the proposal):

And something to consider regarding the array literal syntax: If we ever have another collection data type similar to an array, do we have a good option for defining a literal syntax in that case as well?

cgranade commented 3 years ago

@cgranade Capturing a couple of points that came up in our conversation here (disregard if you have already captured them in the proposal):

Thanks for capturing these points, @bettinaheim, I'll work to get those addressed in microsoft/qsharp-compiler#49 shortly. In the meantime, some rough thoughts:

  • We should think about what operators we would potentially want to introduce for multidimensional arrays. Tensor product and matrix multiplication come to mind.

Good point, I'll flesh that out. My initial thought is that pointwise versions of scalar operators that "broadcast" over multidimensional arrays would be good (e.g.: .+ means + acting pointwise on elements, and not on the arrays themselves), as well as the two that you mention. Since each of these new operators could possibly fail due to shape errors (as per @samarsha's point), setting them apart visually also makes it easier to identify potential sources of unsafety.

  • That naturally also bring us to matrix addition (itemwise addition); do we want to support that? What does + mean for multidimensional arrays, given that it is used for concatenation in the 1D case?
  • Should we allow concatenation of multidimensional arrays? If so, what does that mean?

I'd personally suggest that, since + cannot fail at runtime in the 1D case, but can fail at runtime in the ๐‘›-d case, we support + only for 1D arrays and rely on Concatenated2, Concatenated3 and so forth functions for higher dimensions. That also has the advantage that we can then take inputs representing which dimension to concatenate along, which is difficult to do with the infix operator form alone.

  • Can we treat the 1D case merely as a special case for n-D arrays?

I think that makes sense, yes. I'd suggest that all new operators considered here work for ๐‘›-d arrays as well as 1D; from that perspective, + is reserved for the special case of 1D, but everything else should hold more generally.

And something to consider regarding the array literal syntax: If we ever have another collection data type similar to an array, do we have a good option for defining a literal syntax in that case as well?

In general, we're in pretty short supply for delimiters, since many are already used up:

That leaves either reusing an existing delimiter, adopting delimiters made from multiple characters (e.g.: [| ... |], #[ ... ]), or adopting more exotic characters as delimiters. Especially given that shortage, I agree with making sure that we have a clear view of what else we may want to use delimiter pairs for in 1.0 so as to avoid any unneeded collisions.

For example, if we want to have an associative type, how should that be denoted? Looking at other languages doesn't provide immediate insight here, I don't think... In Python, both set and dict types are denoted using {}, leading to confusion when declaring empty sets and dictionaries (set() and {}, respectively). C# at least partially gets around this by not having literals of type Dictionary or Set, but rather having the more general concept of collection initializers that build up collections by calling Add and AddRange methods, but that's not great here for other reasons. Similarly, F# uses functions like set and Map.ofList to construct sets and maps from 'T list and ('Key * 'Value) list respectively. By contrast, Mathematica uses compound delimiters (in particular, <| ... |>) to denote literals of associative types.

(Odd side thought, but what is an associative type but a special case of 'Key -> Maybe<'Value> constructed from explicit values, and that you can efficiently find the domain of?)

From that perspective, I'll admit a slight preference towards alternatives for literal syntax that avoid further exhausting basic pair delimiters (i.e.: #[ ... ] or [| ... |])...