Proposal for removing default-initialized new array expressions

microsoft / qsharp-language

Official repository for design of the quantum programming language Q# and its core libraries

MIT License

235 stars 56 forks source link

Proposal for removing default-initialized new array expressions #48

Closed bamarsha closed 3 years ago

bamarsha commented 3 years ago

This is the proposal for issue microsoft/qsharp-compiler#46.

guenp commented 3 years ago

I agree with @anpaz's suggestion

cgranade commented 3 years ago

My preference would be:

Remove new

Provide the methods proposed by Sarah:

EmptyArray<T>()

ConstantArray(n, value)

Currently, both of these functions already exist in Microsoft.Quantum.Arrays:

Provide sugar for:

T[] ==> EmptyArray<T>()

I think following the above suggestion from @samarsha, if we want to annotate types of expressions, we should likely do so as a general feature and not as a one-off for empty arrays. Since types in Q# always follow values in declarations (e.g.: target : Qubit), it feels like annotating empty arrays as [] : 'T would be more consistent with that convention.

bamarsha commented 3 years ago

T[n] ==> ConstantArray(n, Default); and have the compiler fail for any T that has no default (like Qubit and String).

This means that users will need to memorize not only what the default values are for each type, but which types have default values. It saves a few characters when writing, but IMO it's harder to read than explicitly writing out the value you want to use. I think we should optimize for readability and consistency here.

By contrast, if you require array initializers to include the value explicitly, users only need to memorize the initialization syntax. Always including the value also more intuitively generalizes to initializing arrays of non-default values: instead of needing to switch from Int[n] for an array of 0s, to ConstantArray(n, 1) for an array of 1s, they would both be ConstantArray(n, 0) and ConstantArray(n, 1) (or whatever syntax we end up with).

guenp commented 3 years ago

@cgranade @samarsha I agree that let myArray = [] : Int would work great.

Regarding using Int[4] instead of ConstantArray(4, Default); for this example specifically, I can see that let myArray = Int[4] might create ambiguity with let myArray = [4].

Ideally I would suggest let myArray = [0] * 4 # same as [0, 0, 0, 0], but @samarsha explained above why that won't work for us:

We would need to choose a different operator name than . Otherwise, it's not possible to reconcile the type of used for arrays with the type of * needed to multiply two numbers together. This would make it very difficult to create a type class to represent numeric types. See microsoft/qsharp-language#149.

I would be fine coming up with a new operator for this; would something like this work for example?

let myArray = [0] : len(4) # same as [0, 0, 0, 0]

alan-geller commented 3 years ago

@cgranade @samarsha I agree that let myArray = [] : Int would work great.

Regarding using Int[4] instead of ConstantArray(4, Default); for this example specifically, I can see that let myArray = Int[4] might create ambiguity with let myArray = [4].

Ideally I would suggest let myArray = [0] * 4 # same as [0, 0, 0, 0], but @samarsha explained above why that won't work for us:

We would need to choose a different operator name than . Otherwise, it's not possible to reconcile the type of used for arrays with the type of * needed to multiply two numbers together. This would make it very difficult to create a type class to represent numeric types. See microsoft/qsharp-language#149.

I would be fine coming up with a new operator for this; would something like this work for example?
let myArray = [0] : len(4) # same as [0, 0, 0, 0]

I don't think this works either. In Q#, it's only type names that come after a :, which would imply that len(4) is a type name. Now we've got dependent types creeping in, with an implication that len(4) and len(2) are different types.

I'm not sure why this has problems with type classes. Why can't there be a type class "Multipliable" that contains arrays and numbers, and * supports multiplying Multipliables by numbers? + will have the same need in order to support array and string concatenation.

guenp commented 3 years ago

Thanks, @alan-geller , that makes sense. For completeness, @samarsha also mentioned the issue of overloading with let myArray = [0] * 4;

The type of is overloaded to both ('a[], Int) -> 'a[] and (Int, 'a[]) -> 'a[], since Python also supports 10 [1]. Q# does not support overloading. I would argue that there are good reasons for Q# to never support overloading, but there is a feature request for this: microsoft/qsharp-language#145

Aside from whether this would lead to a huge amount of work or not, I'm curious if this would make sense:

let myArray = [0 for i in 0..3] # same as [0, 0, 0, 0]

alan-geller commented 3 years ago

Thanks, @alan-geller , that makes sense. For completeness, @samarsha also mentioned the issue of overloading with let myArray = [0] * 4;

The type of is overloaded to both ('a[], Int) -> 'a[] and (Int, 'a[]) -> 'a[], since Python also supports 10 [1]. Q# does not support overloading. I would argue that there are good reasons for Q# to never support overloading, but there is a feature request for this: microsoft/qsharp-language#145

Aside from whether this would lead to a huge amount of work or not, I'm curious if this would make sense:
let myArray = [0 for i in 0..3] # same as [0, 0, 0, 0]

To me, that makes perfect sense. It's just like [i*i for i in 0..3] to get [0,1,4,9}.

cgranade commented 3 years ago

I'm not sure why this has problems with type classes. Why can't there be a type class "Multipliable" that contains arrays and numbers, and * supports multiplying Multipliables by numbers? + will have the same need in order to support array and string concatenation.

If I understood right, it was that @samarsha's proposal for a typeclass / concept representing types supporting * captured a signature of the form (*) : ('T, 'T) -> 'T (that is, a magma), but 3 * ["foo"] yielding ["foo", "foo", "foo"] looks more like thinking of * as the group action of Int on String[] (in particular, (n + m) * data is the same as n * data + m * data, so group action axioms are preserved). Using the same operator for the magma and group action typeclasses may be confusing in some contexts, perhaps?

(On a side note, I say magma and not semigroup since there's no way at the type system level to enforce associativity; strictly speaking, 3 * ["foo"] may fail to be a group action for the same reason, but I think that points to the need for type classes to come with their own unit tests that enforce invariants of those type classes.)

bamarsha commented 3 years ago

Why can't there be a type class "Multipliable" that contains arrays and numbers, and * supports multiplying Multipliables by numbers?

Sorry for the Haskell, but is this what you mean?

class Multipliable a where
    (*) :: Num b => a -> b -> a
    -- or
    (*) :: Num b => b -> a -> a

This type class seems to have unfortunate implications. For example, if a ~ Int and b ~ Double, then an Int times a Double must yield an Int. Also, it would allow arrays to be multiplied by Double, unless we restrict Num b to Int, but then how do we define multiplication between two Doubles?

+ will have the same need in order to support array and string concatenation.

I don't think + necessarily has the same problem for concatenation if we use a monoid. Both numeric + and concatenation + would have the same type, Monoid a => a -> a -> a.

bettinaheim commented 3 years ago

@guenp @alan-geller Since the resolution to the syntax question is a bit buried in the rather long thread above, let me recap here:

We will add support for empty array literals ([]), no type annotation needed (or supported at this time).
We will add the syntax ["", size=5] for creating arrays with all items instantiated to a given value ("" in this case).

The reason for the choice of syntax is the following:

The syntax restricted to within rectangular brackets to not limit or negatively impact future capabilities.
The construct resembles and should be treated as a constructor call, and the two items within rectangular brackets as the corresponding argument items (while passing argument items by name is not yet supported, it is something we want to add in the future). The distinction that it is effectively a constructor call and dissimilar to the array comprehension syntax is important since M(q) is evaluated exactly once (also when the size is 0). This is in contrast to [M(q) for _ in 1..10] where M(q) would indeed be evaluated 10 times.
size is a full, unique, and short word that can be searched for in our docs.

With this interpretation, we can build on it in the future and potentially fully support partial application also for such built-in syntax constructs (to be worked out in a separate proposal), meaning things like these:

// In all cases, the exact type of the callable will be inferred based on the first usage. 
let buildTuple = (1,2,_); // callable to construct a tuple
let populateArray = [_, size=10]; // callable to construct an array
let createArray = [Zero, size=_]; // callable to construct an array
let multiply = _ * _;

In combination with array comprehension (to be worked out and confirmed in a separate proposal) and multidimensional arrays (a proposal is in progress), we get the following:

// all of these create an array of arrays of length 3
[0, size=(3,2)] 
[[x*x for x in 1..2], size=3] 
[[0 size=2], [0 size=3], [0 size=4]]
[[1,2], [3,4,5], [6,7,8,9]]

// all of these create a 2D array of shape (3,2)
#[0, size=(3,2)] 
#[[x*x for x in 1..2], size=3] 
#[[0 size=2], [1 size=2], [2 size=2]]
#[[1,2], [3,4], [5,6]]

// all of these create a 3D array of shape (4,3,2)
#[0, size=(4,3,2)] 
#[#[[x*y for x in 1..2] for y in 1..3], size=4] 
#[#[[1,2], [3,4], [5,6]], #[[1,2], [3,4], [5,6]], #[[1,2], [3,4], [5,6]], #[[1,2], [3,4], [5,6]]]

cgranade commented 3 years ago

Why can't there be a type class "Multipliable" that contains arrays and numbers, and * supports multiplying Multipliables by numbers?

Sorry for the Haskell, but is this what you mean?
class Multipliable a where
    (*) :: Num b => a -> b -> a
    -- or
    (*) :: Num b => b -> a -> a
This type class seems to have unfortunate implications. For example, if a ~ Int and b ~ Double, then an Int times a Double must yield an Int. Also, it would allow arrays to be multiplied by Double, unless we restrict Num b to Int, but then how do we define multiplication between two Doubles?

+ will have the same need in order to support array and string concatenation.

I don't think + necessarily has the same problem for concatenation if we use a monoid. Both numeric + and concatenation + would have the same type, Monoid a => a -> a -> a.

I think we can have both behaviors "just work" if we separate the * for multiplicative magmas from the * for the left group action of integers on 'T[]. Concretely, consider the following sequence of type classes and unit test templates:

concept 'T is MultiplicativeMagma when {
    function (*)(left : 'T, right : 'T) : 'T;
}

concept 'T is MultiplicativeSemigroup where 'T is MultiplicativeMagma, EquatableTo<'T> when {
    invariant Associative(left : 'T, middle : 'T, right : 'T) : Unit {
        if ((left * middle) * right != left * (middle * right)) { fail ""; }
    }
}

concept 'T is MultiplicativeMonoid where 'T is MultiplicativeSemigroup when {
    function MOne() : 'T;

    invariant LeftIdentity(right : 'T) : Unit {
        if (MOne() * right != right) { fail ""; }
    }

    invariant RightIdentity(left : 'T) : Unit {
         if (left * MOne() != left) { fail ""; }
    }
}

concept 'T is MultiplicativeGroup where 'T is MultiplicativeMonoid when {
    function Reciprocal(self : 'T) : 'T;

    // TODO: invariants for x * (1/x) = 1 and (1/x) * x = 1.
}

concept 'T is AdditiveMagma when {
    function (+)(left : 'T, right : 'T) : 'T;
}

// TODO: other concepts for additive versions of semigroup, monoid, and group.

concept 'T is LeftActsOn<'U> where 'T is MultiplicativeGroup, 'U is EquatableTo<'U> {
    function (**)(left : 'T, right : 'U) : 'U;

     invariant Associative(g : 'T, h : 'T, x : 'U) : Unit {
         if ((g * h) ** x != g ** (h ** x)) { fail ""; }
     }

     invariant LeftIdentity(x : 'U) : Unit {
         if (MOne() ** x != x) { fail ""; }
     }
}

bamarsha commented 3 years ago

Yes, if the operator names are different, then I don't think there's any issue. My concern was only with re-using * for both cases.