microsoft / qsharp-compiler

Q# compiler, command line tool, and Q# language server
https://docs.microsoft.com/quantum
MIT License
684 stars 170 forks source link

Discriminated unions, Maybe? #406

Closed crazy4pi314 closed 3 years ago

crazy4pi314 commented 4 years ago

Is your feature request related to a problem? Please describe. It would be really nice to have something like a maybe discriminated union. Say I have a function that I want to just return false to represent it failing, and if it succeeds then to return a different type (say a tuple).

Describe the solution you'd like It would be nice to have something like Maybe<'T> where that's understood to be the type 'T or some other default type like bool to indicate something failed.

Describe alternatives you've considered Not sure.

Additional context I was working on refactoring some code in a Shor's algorithm sample and there are many steps there that can fail. I have split many of them out into their own functions and operations but that means I have to do something like this:

function MaybeFactorsFromPeriod(generator : Int, period : Int, number : Int) : (Bool, (Int, Int))
    {
        if (period % 2 == 0) {
            let halfPower = ExpModI(generator, period / 2, number);
            if (halfPower != number - 1) {
                let factor = MaxI(GreatestCommonDivisorI(halfPower - 1, number), GreatestCommonDivisorI(halfPower + 1, number));
                return (true, (factor, number / factor));
            } else {
                return (false, (1,1));
            }
        } else {
            return (false, (1,1));
        }
    }

Here I just have it return a tuple of a bool indicating if it worked, and a tuple of integers that are the derived factors if we got lucky on the period estimation part. I then have to unpack this in the main algorithm code, and I don't want to return (1,1) if it failed.

bettinaheim commented 4 years ago

@crazy4pi314 Thanks for the feature request! DUs can be super useful, and for me personally are one of features I am often quite fond of in a language. This needs some design work first, and then would require also quite a bit of dev work to support. Let me know if you would be interested in getting involved with either one. :)

crazy4pi314 commented 4 years ago

I'd be happy to help, I am gonna take a look at some other language feature proposals and try to get something together that can be discussed!

crazy4pi314 commented 4 years ago

Suggestion

This suggestion outlines how to add discriminated unions to Q#. The overall philosophy/objective with this design is to use it to extend/generalize the current design of user defined types (UDTs). It may not break current UDT syntax, and current UDTs should be thought of as single-case DUs.

Considerations

Firstly, discriminated unions (DU), or tagged unions, are a fairly common language feature for function languages and will allow users of Q# to leverage coding patterns they are already familiar with. Scala calls DUs case classes, Rust has DUs called enums, Haskell has sum types, and F# calls them DUs. This gives us a lot of examples to draw from and evaluate to create the right feature for Q#.

Secondly, DUs can make code more robust as they allow you to use the type system to reduce bugs or unhandled edge cases. In Examples there are some samples of how DUs can improve common Q# patterns and usage. The main patterns I have run into where they might be useful fall into the following categories:

If this is implemented in a way that allows all current UDT syntax to be valid, then this shouldn't be a breaking change and should just be strictly a feature add.

It is not clear to me there is another way to achieve this functionality other than creating a UDT where each field is a valid case/tag and you would use if statements to filter on the populated field manually. This is similar to how I have had to implement what here would be a Maybe DU where I used a tuple (Bool, (Int, Int)) as a return type signature (see this issue). There are certainly other designs for how to add DUs to Q#, but this proposal focuses on making it a non-breaking change by extending current UDTs.

Context

The most important feature of this design approach is that it feels natural with the current implementation for UDTs, and allows for all current UDT syntax to be valid.

DUs have the following components:

  1. The name of the DU type itself/the declaration syntax.
  2. Cases (sometime implemented with tags) that identify which part of the union the value is in.
  3. Constructor for values of each case.
  4. Destructor syntax that allows you to match on which case a value is in.

The following section goes into much more detail on what each of these parts could look like for Q#. Much of the suggested syntax here comes from the examples that I looked in this reference on F#. As noted above though, there are many languages that have similar features, but the F# syntax seemed to match most closely with established patterns in Q#.

Examples

The three main syntactic things that need prototyped to extend UDTs to DUs:

  1. how to declare a new DU type,
  2. what case constructors look like,
  3. and how to destruct the different types in the union.

To that end, the newtype keyword would be used to declare new DUs as well. The following sections give example suggested designs for the three syntaxes listed above.

DU declarations

Since the goal is to make this feel natural with the current implementation for UDTs, here is a quick review of that syntax.

newtype Complex = (Re : Double, Im : Double);

You can see it uses the statement newtype to declare a new UDT, and there are named items in the defininition itself. For more details and examples see the docs.

The following examples would all be valid syntax for this design.

Example 1:
Standard UDT syntax where the constructor function implicitly will have the same name as the type.

newtype Foo = (Bar : Baz);

Example 2:
Standard UDT syntax where the constructor function is explicit.

newtype Foo = Foo(Bar : Baz);

Example 3:
A two case DU where there is no leading bar.

newtype MaybeBaz =
    | Some(Bar : Baz)
    | None();

Example 4:
A two case DU where a complex number can have one of two different coordinate systems.

newtype Complex = 
    Cartesian(Re : Double, Im : Double)
    | Polar(Double, Double)

Example 5:
Standard UDT syntax where the constructor function is explicit, and has a leading bar.

Note: probably should be valid for consistency, but not approved by the style guide

// newtype Foo = | Foo(Bar : Baz);

Example 6:
Standard UDT syntax where the constructor function is implicit, and has a leading bar.

Note: probably should be valid for consistency, but not approved by the style guide

newtype Foo = | (Bar : Baz);

Example 7:
A two case DU where there is no leading bar.

newtype QInt =
    LittleEndian(Qubit[]) |
    BigEndian(Qubit[]);

Example 8:
A multi-case DU where all the case types are Unit, a.k.a. an enum. Could also potentially omit parens.

newtype Weekday =
    | Monday()
    | Tuesday()
    | Wednesday()
    | Thursday()
    | Friday();

Example 9:
A multi case DU that generalizes implementation options for an adder.

newtype AdderMethod = 
    | CDKM()
    | Draper(LowDepth : Bool)
    | TTK();

The design here also allows for type parameterization. the type parameter 'T can be any other valid type in Q#, see the docs for more info.

Example 10:
A parameterized two case DU where there is a leading bar.

newtype Maybe<'T> =
    | Some('T)
    | None();

Case constructors

You have seen some of the syntax for the case constructors already, they are the bits similar to function declaration syntax (i.e. Some(Int)). One way to think about the deign here is that you are defining a collection of UDT constructors, one for each case. You are declaring cases by declaring their constructors. That means that the case constructor functions will be declared as globals in the same namespace that the DU is declared in. As a consequence of that, then you will not be able to have DUs with case constructors that have the same fully qualified name. I don't think at the moment that is a problem as it follows kind of the same pattern as you cannot have a UDT and a function share the same name in a namespace.

Examples of valid case constructor syntax:

Example 1:
Given the Maybe<'T> declaration, declaring a case that returns a type of Maybe<Int>.

newtype Maybe<'T> =
    | Some('T)
    | None();

// Has type Maybe<Int>
let x = Some(42);

// Has type String -> Maybe<String>
let maybeString = Some<String>;

Example 2:
Constructs a register that represents an integer, that can be interpreted as either little or big endian.

newtype QInt =
    | LittleEndian (Register : Qubit[])
    | BigEndian (Register : Qubit[]);

using (qubits = Qubit[3]) {
   let target = LittleEndian(qubits);
   // ...
}

Example 3:
Case constructor that has type Unit.

newtype Maybe<'T> =
    | Some('T)
    | None();

// Could be used to indicate that some calculation did not succeed. 
let return = None<String>();

// For constructors isomorphic to `Unit`, parens could be omitted.
let return = None<String>;

Case deconstructors

For the deconstructor here, an expression design could make a lot of sense. That allows you to set a variable equal to the result of a deconstruction statement. See the following examples of how this could work.

Example 1:
Example 3 from the previous section, and we use match to deconstruct the cases possible for Maybe<'T>.

newtype Maybe<'T> =
    | Some('T)
    | None();

function Foo(bar : Int) : Maybe<Int> {
    if (Int % 2 == 0) {
        return Some(bar/2);
    } else {
        // Would be nice to not have to specify <Int> here but that's not necessary for the DU design.
        return None<Int>;
    }
}

// Could be used to indicate that some calculation did not succeed.
let x = match Foo(8) {
    // It's OK to omit <Int> here since that's
    // known exactly from the type of Foo(8).
    Some(_) -> "Even",
    None -> "Odd"
}

Affidavit (please fill out)

Please tick this by placing a cross in the box:

Please tick all that apply:

bamarsha commented 4 years ago

Thanks for writing up this proposal! I'd love to see Q# get sum types.

To that end, the newtype keyword would be used to declare new DUs as well.

In Haskell, newtype is used to declare strongly-typed type aliases that are limited to a single constructor. As a result, they can be erased at runtime. I am not sure if that was also part of the design of newtype in Q#. If Q# is making a distinction between types that can be erased at runtime and types that can't, then we can't use newtype for multi-constructor types. If newtype is meant to be a general data type, then I would still suggest using a different keyword like type for simplicity and to avoid confusing Haskell people. :)

newtype Complex = 
    Cartesian(Re : Double, Im : Double)
    | Polar(Double, Double)

newtype QInt =
    LittleEndian(Qubit[]) |
    BigEndian(Qubit[]);

I'm not sure if DUs are needed for these examples. Both Complex and QInt have cases that can freely be converted back and forth: Cartesian/polar and little-endian/big-endian are used to represent the same values, just the encodings are different. So here you could pick one encoding and provide functions to convert between them into the one that is more convenient at the time. In contrast, Some('T) and None represent different values that in general can't be converted and need to be handled separately.

A multi-case DU where all the case types are Unit, a.k.a. an enum. Could also potentially omit parens.

I think this would imply that Monday() in a type declaration would create a value of type Unit -> Weekday, and if you declared it as Monday instead, you would create a value of type Weekday. However, I don't think we allow non-callable values as namespace elements currently (which I guess is why PI() is a function instead of a constant). I definitely would also like to see fewer unnecessary parentheses though. :) So maybe we could in general allow non-callable types in namespaces?

The design here also allows for type parameterization. the type parameter 'T can be any other valid type in Q#, see the docs for more info.

Currently, Q# doesn't support UDTs with type parameters. I think that's a whole big feature on its own. :) Not having them would severely limit discriminated unions, though, since we would not even be able to represent a maybe type. On the other hand, without it we could still bring enums to Q# in a way that could be extended into proper sum types if type parameterized types are added later. So I'm not sure which order makes the most sense for developing them.

// Could be used to indicate that some calculation did not succeed. 
let return = None<String>();

// For constructors isomorphic to `Unit`, parens could be omitted.
let return = None<String>;

I think we should be consistent with callables here - for functions and operations these expressions would be different types. However, as mentioned above I think it makes sense to have the choice of parentheses or no parentheses at the point of declaration, if we allow non-callable types as namespace elements.

let x = match Foo(8) {
    // It's OK to omit <Int> here since that's
    // known exactly from the type of Foo(8).
    Some(_) -> "Even",
    None -> "Odd"
}

I like the match syntax! For consistency with if, for, etc., match (Foo(8)) { ... } might be better though.

crazy4pi314 commented 4 years ago

Sorry for the delay, but thanks for the feedback @samarsha!❤️ I have responded to some of your comments below, and maybe once others chime in I can refine the proposal again.

If newtype is meant to be a general data type, then I would still suggest using a different keyword like type for simplicity and to avoid confusing Haskell people. :)

That's fair, I haven't really done Haskell before so I avoided this confusion :) I guess the reason I was using newtype mainly was to keep to my design goal of this being an extension of UDTs which already uses this keyword.

I'm not sure if DUs are needed for these examples. Both Complex and QInt have cases that can freely be converted back and forth: Cartesian/polar and little-endian/big-endian are used to represent the same values, just the encodings are different.

One of the main motivators for writing this proposal was to handle things like these cases so maybe I need a different solution then :) I liked the idea of say QInt because many (but importantly not all) operations/functions in Q# take LittleEndian if the register represents a value. It seems like it would be cleaner to have this as a more descriptive type, and let users use whatever encodings they like and not have to be converting stuff all over the place.

I don't think we allow non-callable values as namespace elements currently (which I guess is why PI() is a function instead of a constant). I definitely would also like to see fewer unnecessary parentheses though. :) So maybe we could in general allow non-callable types in namespaces?

Agreed, this might just be a different suggestion and not exactly the point here for DUs.

Currently, Q# doesn't support UDTs with type parameters. I think that's a whole big feature on its own. :) Not having them would severely limit discriminated unions, though, since we would not even be able to represent a maybe type.

Heh fair, should I file an issue for that as well?

I think we should be consistent with callables here - for functions and operations these expressions would be different types.

Agreed.

For consistency with if, for, etc., match (Foo(8)) { ... } might be better though.

Yeah that makes sense, I like it!

bamarsha commented 4 years ago

That's fair, I haven't really done Haskell before so I avoided this confusion :) I guess the reason I was using newtype mainly was to keep to my design goal of this being an extension of UDTs which already uses this keyword.

Yeah, that makes sense. I'm not sure why newtype was chosen as the keyword instead of type, so I was just wondering if it was chosen specifically for its single-constructor type alias properties. Maybe someone else who knows the reason can fill us in.

One of the main motivators for writing this proposal was to handle things like these cases so maybe I need a different solution then :) I liked the idea of say QInt because many (but importantly not all) operations/functions in Q# take LittleEndian if the register represents a value. It seems like it would be cleaner to have this as a more descriptive type, and let users use whatever encodings they like and not have to be converting stuff all over the place.

I think that this would shift where the conversions happen, but I'm not sure that it would reduce the total number of conversions. For example:

operation Add(xs : LittleEndian, ys : LittleEndian) : Unit {
    // Add with xs! and ys!.
}

operation Main() : Unit {
    let x = ReturnsBigEndian();
    let y = ReturnsBigEndian();
    Add(BigEndianAsLittleEndian(x), BigEndianAsLittleEndian(y));
}

vs:

operation Add(x : QInt, y : QInt) : Unit {
    // Convert to little endian so that we can use the same algorithm for both cases.
    let xs = BigEndianAsLittleEndianArray(x);
    let ys = BigEndianAsLittleEndianArray(y);
    // Add with xs and ys.
}

operation Main() : Unit {
    let x = ReturnsBigEndian();
    let y = ReturnsBigEndian();
    Add(x, y);
}

It might also increase the total number of conversions in cases where an operation accepts a QInt, does something to it, and then passes it onto another operation that accepts a QInt. If both operations use algorithms that assume little-endianness, they would both have to convert it. If their type required LittleEndian, the conversion would happen once before the initial call.

I guess most of these conversions would be in the standard libraries, so users wouldn't have to deal with it. :) But it would still possible for users to write their own operations that accept QInt (in their own programs or in user-created libraries), and I think they would have to deal with the conversions there as well.

Please let me know if I have missed something or misunderstood what you meant, though.

Currently, Q# doesn't support UDTs with type parameters. I think that's a whole big feature on its own. :) Not having them would severely limit discriminated unions, though, since we would not even be able to represent a maybe type.

Heh fair, should I file an issue for that as well?

Sure, I would definitely support a proposal for type-parameterized types. :)

crazy4pi314 commented 4 years ago

Maybe someone else who knows the reason can fill us in.

Pinging @bettinaheim?

Please let me know if I have missed something or misunderstood what you meant, though.

I def agree that it doesn't necessarily reduce the amount of total conversions. My hope would be that all the library operations would also then not take just LittleEndian but QInt so that it reduces errors on the user side (did I convert it yet or not?) and they can just think about it as the number it represents, not the encoding. As you say users would also be able to do this as well, even if the libraries did not, and I still think it would be useful. I have been working on a library for qRAM and the number of times I have been like "wait.... is this the right encoding?" and had to check stuff manually has been too high 🤦

Sure, I would definitely support a proposal for type-parameterized types. :)

I'll add it to my queue 😄

bamarsha commented 4 years ago

My hope would be that all the library operations would also then not take just LittleEndian but QInt so that it reduces errors on the user side (did I convert it yet or not?) and they can just think about it as the number it represents, not the encoding.

This makes sense, I agree with this goal. :) This is my idea for how to do that without DUs, please let me know if it would work for you:

// A quantum integer, represented internally using little-endian (but
// this is an implementation detail - it would be better if the
// internal representation could be hidden from users).
newtype QInt = Qubit[];

// Creating QInts
function LittleEndian(qs : Qubit[]) : QInt { return QInt(qs); }

function BigEndian(qs : Qubit[]) : QInt { return QInt(Reversed(qs)); }

// Consuming QInts
function QIntAsLittleEndian(n : QInt) : Qubit[] { return n!; }

function QIntAsBigEndian(n : QInt) : Qubit[] { return Reversed(n!); }

It's supported by the current Q# language, but IMO it would be better if Q# allowed opaque UDTs to be defined (i.e., UDTs where the underlying type is only visible from within a module/project, rather than public), so that users wouldn't be able to tell if QInt used little-endian or big-endian internally.

msoeken commented 4 years ago

In Examples there are some samples of how DUs can improve common Q# patterns and usage.

Different implementations of the same functionality can also benefit from DUs for a cleaner API. I have recently added the operation ApplyPermutationUsingTransformation to the Q# libraries and am planning to add another operation ApplyPermutationUsingDecomposition soon. They both apply a permutation to the state vector, but use different implementations. Without DUs, they'll be distinguished by their operation name, but using DUs one could pass a parameter to distinguish them. Further, some implementations may have dedicated parameters that are not available for others. These can be part of the corresponding DU.

bamarsha commented 4 years ago

Thanks for the example @msoeken! Is this also an example of a DU that would still be useful without type parameters (in case DUs are implemented before type-parameterized types)?

cgranade commented 4 years ago

@samarsha: I think @msoeken's example as well as similar examples across arithmetic work great without type-parameterized UDTs (I definitely would love to have Maybe<'T>, though...!). In particular, we have a fairly common pattern where different algorithms and methods for doing the same thing may offer different trade-offs, such that a user may want to select an implementation based on those trade-offs. That's more than an enum of Unit cases, however, since some methods may have different options than others, such that I think this would be a really great place to have DUs.

cgranade commented 4 years ago

To add another use case that we've run into quite a bit, the current implementation of DumpMachine has a type parameter 'T that is used by the C# implementation of that callable; the C# implementation uses runtime type metadata to enforce that 'T is either Unit or String and causes a runtime failure if that is not the case.

As an alternative, the location input could be a Maybe<String>; even without type-parameterized DUs, it would be much nicer to have a concrete DU to represent where DumpMachine should dump its diagnostics to:

newtype DumpLocation =
    /// # Summary
    /// The dump should be written to the current diagnostic display channel
    /// in the host program, if any.
    | Display()
    /// # Summary
    /// The host should write the dump to a file with a given name in its local filesystem.
    | File(Name: String)
    /// # Summary
    /// The dump should be written to the current diagnostic display channel,
    /// but with a given ID that can be used to look up the dump from within
    /// the host.
    | NamedDisplay(Id: String);

operation DumpMachine(location : DumpLocation) { body intrinsic; }

// ...

DumpMachine(Display());
DumpMachine(File("a.txt"));
DumpMachine(NamedDisplay("foo"));
cgranade commented 3 years ago

One possible modification that I could suggest to this proposal is one that's come up in a few recent API discussions with @guenp and @msoeken, namely of anonymous discriminated unions. In TypeScript, for instance, the type string | number can be either a string or a number (Python has a similar feature using the notation Union[str, float]). That can be useful for cases that look like "overloading" in C-style languages, such as allowing PrepareAribtraryState to take either an array of Double values representing positive coefficients, or an array of complex coefficients:

operation PrepareArbitraryState(coefficients : (Double[] | Complex[] | ComplexPolar[]), target : Qubit[]) : Unit is Adj + Ctl {
    // ...
}

To use a value of such an anonymous union type, I'd suggest requiring that a developer first resolves the exact type using a match expression:

let complexCoefficients = coefficients match {
    realCoefficients : Double[] ->
         // In this arm, realCoefficients is a resolution of the anonymous DU
         // to have the concrete type Double[] instead.
         Mapped(Complex(_, 0.0), positiveCoefficients),
     alreadyComplexCoefficients : Complex[] -> alreadyComplexCoefficients,
     complexPolarCoefficients : ComplexPolar[] ->
         Mapped(ComplexPolarAsComplex, complexPolarCoefficients)
      // The match expression is exhaustive here, such that no blank pattern
      // is needed in order to guarantee that complexCoefficients has
      // a value of type Complex[] in all cases.
};

Adding a feature of this form wouldn't address all the usecases for DUs (e.g.: since file names and display names for DumpMachine locations are both of type String, labeled cases are still really helpful), but could possibly make DUs lighter-weight to use when the interpretation of each branch of a DU is obvious from context.

bettinaheim commented 3 years ago

@crazy4pi314 Seeing as this is now tracked on the language repo, I will close the issue here, but reference the conversations in the issue on the language repo.