microsoft / qsharp-language

Official repository for design of the quantum programming language Q# and its core libraries
MIT License
235 stars 56 forks source link

Discriminated Unions #51

Open crazy4pi314 opened 3 years ago

crazy4pi314 commented 3 years ago

Suggestion

This suggestion outlines how to add discriminated unions to Q#. The overall philosophy/objective with this design is to use it to extend/generalize the current design of user defined types (UDTs). It may not break current UDT syntax, and current UDTs should be thought of as single-case DUs.

NB: This was originally proposed on the compiler repo, and there is a lot of existing discussion there, most of which has been incorporated here. Thanks to @samarsha, @cgranade, @msoeken and @guenp for input to this proposal 💖

Considerations

Firstly, discriminated unions (DU), or tagged unions, are a fairly common language feature for function languages and will allow users of Q# to leverage coding patterns they are already familiar with. Scala calls DUs case classes, Rust has DUs called enums, Haskell has sum types, and F# calls them DUs. This gives us a lot of examples to draw from and evaluate to create the right feature for Q#.

Secondly, DUs can make code more robust as they allow you to use the type system to reduce bugs or unhandled edge cases. In Examples there are some samples of how DUs can improve common Q# patterns and usage. The main patterns I have run into where they might be useful fall into the following categories:

If this is implemented in a way that allows all current UDT syntax to be valid, then this shouldn't be a breaking change and should just be strictly a feature add.

It is not clear to me there is another way to achieve this functionality other than creating a UDT where each field is a valid case/tag and you would use if statements to filter on the populated field manually. This is similar to how I have had to implement what here would be a Maybe DU where I used a tuple (Bool, (Int, Int)) as a return type signature (see this issue). There are certainly other designs for how to add DUs to Q#, but this proposal focuses on making it a non-breaking change by extending current UDTs.

For the formatting applications, @samarsha proposed the following as an alternate solution for the example of endianess that can be used right now in Q#:

// A quantum integer, represented internally using little-endian (but
// this is an implementation detail - it would be better if the
// internal representation could be hidden from users).
newtype QInt = Qubit[];

// Creating QInts
function LittleEndian(qs : Qubit[]) : QInt { return QInt(qs); }

function BigEndian(qs : Qubit[]) : QInt { return QInt(Reversed(qs)); }

// Consuming QInts
function QIntAsLittleEndian(n : QInt) : Qubit[] { return n!; }

function QIntAsBigEndian(n : QInt) : Qubit[] { return Reversed(n!); }

It's supported by the current Q# language, but IMO it would be better if Q# allowed opaque UDTs to be defined (i.e., UDTs where the underlying type is only visible from within a module/project, rather than public), so that users wouldn't be able to tell if QInt used little-endian or big-endian internally.

This would definitely work, but I think as Sarah notes, that it doesn't really achieve the conceptual abstraction and the user will still have to track some conversions.

Context

The most important feature of this design approach is that it feels natural with the current implementation for UDTs, and allows for all current UDT syntax to be valid.

DUs have the following components:

  1. The name of the DU type itself/the declaration syntax*.
  2. Cases (sometime implemented with tags) that identify which part of the union the value is in.
  3. Constructor for values of each case.
  4. Destructor syntax that allows you to match on which case a value is in.

* DUs could also include anonymous DUs as detailed by @cgranade, @guenp and @msoken here and at end of examples.

The following section goes into much more detail on what each of these parts could look like for Q#. Much of the suggested syntax here comes from the examples that I looked in this reference on F#. As noted above though, there are many languages that have similar features, but the F# syntax seemed to match most closely with established patterns in Q#.

Examples

The three main syntactic things that need prototyped to extend UDTs to DUs:

  1. how to declare a new DU type,
  2. what case constructors look like,
  3. and how to destruct the different types in the union.

To that end, the newtype keyword would be used to declare new DUs as well.

@samarsha notes that:

"In Haskell, newtype is used to declare strongly-typed type aliases that are limited to a single constructor. As a result, they can be erased at runtime. I am not sure if that was also part of the design of newtype in Q#. If Q# is making a distinction between types that can be erased at runtime and types that can't, then we can't use newtype for multi-constructor types. If newtype is meant to be a general data type, then I would still suggest using a different keyword like type for simplicity and to avoid confusing Haskell people. :)"

The following sections give example suggested designs for the three syntaxes listed above.

DU declarations

Since the goal is to make this feel natural with the current implementation for UDTs, here is a quick review of that syntax.

newtype Complex = (Re : Double, Im : Double);

You can see it uses the statement newtype to declare a new UDT, and there are named items in the defininition itself. For more details and examples see the docs.

The following examples would all be valid syntax for this design.

Example 1:
Standard UDT syntax where the constructor function implicitly will have the same name as the type.

newtype Foo = (Bar : Baz);

Example 2:
Standard UDT syntax where the constructor function is explicit.

newtype Foo = Foo(Bar : Baz);

Example 3:
A two case DU where there is no leading bar.

newtype MaybeBaz =
    | Some(Bar : Baz)
    | None();

Example 4:
A two case DU where a complex number can have one of two different coordinate systems.

newtype Complex = 
    Cartesian(Re : Double, Im : Double)
    | Polar(Double, Double)

Example 5:
Standard UDT syntax where the constructor function is explicit, and has a leading bar.

Note: probably should be valid for consistency, but not approved by the style guide

newtype Foo = | Foo(Bar : Baz);

Example 6:
Standard UDT syntax where the constructor function is implicit, and has a leading bar.

Note: probably should be valid for consistency, but not approved by the style guide

newtype Foo = | (Bar : Baz);

Example 7:
A two case DU where there is no leading bar.

newtype QInt =
    LittleEndian(Qubit[]) |
    BigEndian(Qubit[]);

Example 8:
A multi-case DU where all the case types are Unit, a.k.a. an enum. Could also potentially omit parens.

newtype Weekday =
    | Monday()
    | Tuesday()
    | Wednesday()
    | Thursday()
    | Friday();

Example 9:
A multi case DU that generalizes implementation options for an adder.

newtype AdderMethod = 
    | CDKM()
    | Draper(LowDepth : Bool)
    | TTK();

The design here also allows for type parameterization (see this proposal for parameterized UDTs). This would significantly expand what whe can do with DUs, in particular create a Maybe DU as shown in this example. The type parameter 'T can be any other valid type in Q#, see the docs for more info.

Example 10:
A parameterized two case DU where there is a leading bar.

newtype Maybe<'T> =
    | Some('T)
    | None();

Case constructors

You have seen some of the syntax for the case constructors already, they are the bits similar to function declaration syntax (i.e. Some(Int)). One way to think about the deign here is that you are defining a collection of UDT constructors, one for each case. You are declaring cases by declaring their constructors. That means that the case constructor functions will be declared as globals in the same namespace that the DU is declared in. As a consequence of that, then you will not be able to have DUs with case constructors that have the same fully qualified name. I don't think at the moment that is a problem as it follows kind of the same pattern as you cannot have a UDT and a function share the same name in a namespace.

Examples of valid case constructor syntax:

Example 1:
Given the Maybe<'T> declaration, declaring a case that returns a type of Maybe<Int>.

newtype Maybe<'T> =
    | Some('T)
    | None();

// Has type Maybe<Int>
let x = Some(42);

// Has type String -> Maybe<String>
let maybeString = Some<String>;

Example 2:
Constructs a register that represents an integer, that can be interpreted as either little or big endian.

newtype QInt =
    | LittleEndian (Register : Qubit[])
    | BigEndian (Register : Qubit[]);

using (qubits = Qubit[3]) {
   let target = LittleEndian(qubits);
   // ...
}

Example 3:
Case constructor that has type Unit.

newtype Maybe<'T> =
    | Some('T)
    | None();

// Could be used to indicate that some calculation did not succeed. 
let return = None<String>();

// For constructors isomorphic to `Unit`, parens could be omitted.
let return = None<String>;

Case deconstructors

For the deconstructor here, an expression design could make a lot of sense. That allows you to set a variable equal to the result of a deconstruction statement. See the following examples of how this could work.

Example 1:
Example 3 from the previous section, and we use match to deconstruct the cases possible for Maybe<'T>.

newtype Maybe<'T> =
    | Some('T)
    | None();

function Foo(bar : Int) : Maybe<Int> {
    if (Int % 2 == 0) {
        return Some(bar/2);
    } else {
        // Would be nice to not have to specify <Int> here but that's not necessary for the DU design.
        return None<Int>;
    }
}

// Could be used to indicate that some calculation did not succeed. H/t to @samarsha for the suggestion [here](https://github.com/microsoft/qsharp-compiler/issues/406#issuecomment-626110376).
let x = match (Foo(8)) {
    // It's OK to omit <Int> here since that's
    // known exactly from the type of Foo(8).
    Some(_) -> "Even",
    None -> "Odd"
}

Use case: Library simplification

Via @cgranade in previous thread.

...we have a fairly common pattern where different algorithms and methods for doing the same thing may offer different trade-offs, such that a user may want to select an implementation based on those trade-offs. That's more than an enum of Unit cases, however, since some methods may have different options than others, such that I think this would be a really great place to have DUs.

From @msoeken in previous thread, I can maybe stub an example for an enum of Unit cases:

newtype PermuteMethod = 
    | Transformation()
    | Decomposition()
    | SomeOtherMethod()

operation ApplyPermutation(method : PermuteMethod, ...) Unit is Adj + Ctl {
    ...
}

Use case: Anonymous DUs

Detailed by @cgranade, @guenp and @msoken here:

One possible modification that I could suggest to this proposal is one that's come up in a few recent API discussions with @guenp and @msoeken, namely of anonymous discriminated unions. In TypeScript, for instance, the type string | number can be either a string or a number (Python has a similar feature using the notation Union[str, float]). That can be useful for cases that look like "overloading" in C-style languages, such as allowing PrepareAribtraryState to take either an array of Double values representing positive coefficients, or an array of complex coefficients:

operation PrepareArbitraryState(coefficients : (Double[] | Complex[] | ComplexPolar[]), target : Qubit[]) : Unit is Adj + Ctl {
    // ...
}

To use a value of such an anonymous union type, I'd suggest requiring that a developer first resolves the exact type using a match expression:

let complexCoefficients = coefficients match {
    realCoefficients : Double[] ->
         // In this arm, realCoefficients is a resolution of the anonymous DU
         // to have the concrete type Double[] instead.
         Mapped(Complex(_, 0.0), positiveCoefficients),
     alreadyComplexCoefficients : Complex[] -> alreadyComplexCoefficients,
     complexPolarCoefficients : ComplexPolar[] ->
         Mapped(ComplexPolarAsComplex, complexPolarCoefficients)
      // The match expression is exhaustive here, such that no blank pattern
      // is needed in order to guarantee that complexCoefficients has
      // a value of type Complex[] in all cases.
};

Adding a feature of this form wouldn't address all the usecases for DUs (e.g.: since file names and display names for DumpMachine locations are both of type String, labeled cases are still really helpful), but could possibly make DUs lighter-weight to use when the interpretation of each branch of a DU is obvious from context.

Affidavit (please fill out)

Please add ticks by placing a cross in the box:

Please tick all that apply:

bettinaheim commented 3 years ago

See also the discussion on https://github.com/microsoft/qsharp-compiler/issues/406, where the issue was originally filed.

bettinaheim commented 3 years ago

@crazy4pi314 Thanks for capturing all this - this is great! I've got a couple of questions/comments:

I am looking forward to the proposal on type parameterized udts ;).

You mention LittleEndian or BigEndian as potential use cases of the same DU if I understand correctly. I am a bit skeptical about that particular example, since that would basically force that each callable that takes such an argument is implemented for both. I don't think that is necessary and see more value in having a library that uniformly supports just one of the two (e.g. LittleEndian) and indicates when a callable depends on this interpretation by consuming explicitly a LittleEndian.

"This is similar to how I have had to implement what here would be a Maybe DU" Is the code open source by any chance and do you have a link?

I am not sure it should be allowed that a case has the same name as the/a type, and in particular not if "[...] the case constructor functions will be declared as globals in the same namespace that the DU is declared in." Even if current udts are considered to be single case DUs as it is currently suggested the case could have a fixed and reserved default name. I am curious regarding what the motivation is for wanting to treat all custom types as DUs? What might be worth exploring is how DUs where all cases take an argument of type Unit relate to enum-like constructs, such as the built in Pauli and Result type.

The large number of ways to express a DU declaration is worrying me a bit. I was wondering is the first | shouldn't just be required. It is also worth discussing support for custom constructor under impact with future mechanism or related mechanisms, since the syntax will need to be carefully aligned with custom constructors in general.

The text under Example 3 for declarations I assume should go with Example 4. As a nitpick: In Example 3 for case constructors, declaring a variable with name return isn't supported since return is a reserved keyword.

There is a lot to dive into for the match statement or expression suggested for case deconstruction, and I think it would make sense to potentially consider it as a separate proposal with DUs being a related future modification to keep in mind when working that out. Similarly, I'd want to make anonymous DUs a separate proposal that can be considered under interactions with future mechanisms.

My suggestion for how to proceed is to first focus on the match statement, since it is the piece that can be introduced independently, and I believe will be required to move forward with this. What do you think?

cgranade commented 3 years ago

As a side note, it looks like anonymous DUs are on track for possible inclusion in F# with FS 1092: https://github.com/fsharp/fslang-design/pull/512! 💕

martinquantum commented 2 years ago

I'd like to emphasize the importance of this feature request to add discriminated unions to the language as this would increase programming safety. Concrete case in point: In a project by a customer at https://github.com/npbauman/COVOs-Azure/blob/guenp/vqe-nb/VQE%20for%20LiH%20-%20debugging-Copy1.ipynb

operation PrepareUCCSDAnsatz(JWEncodedData: JordanWignerEncodingData, theta1: Double, theta2: Double, theta3: Double, register: Qubit[]) : Unit is Adj {

    let (nSpinOrbitals, fermionTermData, inputState, energyOffset) = JWEncodedData!;
    let (stateType, JWInputStates) = inputState;
    let inputStateParam = (
        3, // Prepare the UCCSD state
        [
            JordanWignerInputState((theta1, 0.0), [2, 0]),
            JordanWignerInputState((theta2, 0.0), [3, 1]),
            JordanWignerInputState((theta3, 0.0), [2, 3, 1, 0]),
            JWInputStates[0]
        ]
    );
    Message($"{inputStateParam}");
    _PrepareTrialState(inputStateParam, register);
}

there was a bug in the definition of the first component of inputStateParam that was difficult to trouble shoot and that eventually could be traced down to an incorrect dispatch on numerical values (namely 1, 2, 3 of int type), depending on a VQE chemistry ansatz chosen, which likely would have been prevented by having a discriminated union type, say e.g. (fci-ansatz | fci-sparse-ansatz | qcc-ansatz).

As choosing an incorrect value for the ansatz did not lead to a compilation or run-time error but rather in incorrect final energies, debugging of this bug took a long time and involved several people. This might have been avoided if the dispatch could have been done on matching the values of a union type. @cgranade @guenp @crazy4pi314 @bettinaheim @efratshabtai

roguetrainer commented 1 year ago

Plus one for DUs! 💯 In F# they are one of the key features to make coding such a safe, parsimonious & pleasant experience. So pleased to see this being added to Q#. Bravo!