winglang / wing

A programming language for the cloud ☁️ A unified programming model, combining infrastructure and runtime code into one language
https://winglang.io

Algebraic data types / union-like classes #977

Open Chriscbr opened 1 year ago

Chriscbr commented 1 year ago

Summary

No response

Feature Spec

As a Wing user, I would like to be able to express enums where each choice may have one or more associated fields.

Examples:

// Wing
enum Frequency {
  Cron(str),
  Rate(duration),
}

enum ArithmeticExpression {
  Number(num),
  Addition(ArithmeticExpression, ArithmeticExpression),
  Multiplication(ArithmeticExpression, ArithmeticExpression),
}

enum MatchAssertion {
  Absent,
  AnyValue,
  ContainsPattern(str),
}

Such an enum could be used like so:

// Wing
let x = Frequency.Cron("0 0 * * *"); // typed as Frequency
let kind = nameof(x); // "Cron"

let message: str = switch x {
  Cron(_) -> "Running with a cron schedule.",
  Rate(d) -> "Running every ${d.minutes} minutes",
};

Since a switch statement is the only control flow that lets you safely unwrap an enum, it is the only way to extract the values of the associated fields.

The name "switch" is used to emphasize that this is primarily a control structure for enums. I think if we chose the name "match", it might confuse some users, since they may expect the syntax to support matching against other types like tuples and primitives, as in Rust.
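For readers more familiar with TypeScript, here is a minimal sketch of what "safely unwrapping" a variant could look like, using a discriminated union rather than the proposed Wing syntax. The `kind` tag, the field names, and the `assertNever` helper are assumptions made for illustration; they are not part of the proposal.

```typescript
// Hypothetical TypeScript model of the proposed Frequency enum.
type Frequency =
  | { kind: "Cron"; expression: string }
  | { kind: "Rate"; minutes: number };

// Helper that only type-checks when every variant has been handled.
function assertNever(value: never): never {
  throw new Error(`Unhandled variant: ${JSON.stringify(value)}`);
}

function describeFrequency(f: Frequency): string {
  switch (f.kind) {
    case "Cron":
      // Only f.expression is accessible in this branch.
      return "Running with a cron schedule.";
    case "Rate":
      // Only f.minutes is accessible in this branch.
      return `Running every ${f.minutes} minutes`;
    default:
      // Adding a new variant without a case makes this line a compile error.
      return assertNever(f);
  }
}

console.log(describeFrequency({ kind: "Rate", minutes: 5 })); // prints "Running every 5 minutes"
```

The associated fields are reachable only inside the branch for their own variant, which is the property the proposed switch is meant to guarantee.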

The compiler would translate such an enum into an enum-like class in JavaScript:

// JavaScript (written with TypeScript annotations for clarity)
class Frequency {
  private constructor(
    public readonly discriminant: number,
    public readonly field1?: string,
    public readonly field2?: Duration
  ) {}
  public static Cron(field1: string) {
    return new Frequency(0, field1, undefined);
  }
  public static Rate(field2: Duration) {
    return new Frequency(1, undefined, field2);
  }
  public nameOf(): string {
    switch (this.discriminant) {
      case 0:
        return "Cron";
      case 1:
        return "Rate";
      default:
        // unreachable as long as instances are only created by the factories above
        throw new Error("unknown discriminant");
    }
  }
}

And the enum usage would be translated like so:

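// JavaScript (written with TypeScript annotations for clarity)
let x = Frequency.Cron("0 0 * * *");
let kind = x.nameOf();

let message: string = (function (x: Frequency) {
  // this could also be implemented with if/else, doesn't really matter
  switch (x.discriminant) {
    case 0:
      return "Running with a cron schedule.";
    case 1:
      // field2 is always set when the discriminant is 1
      return `Running every ${x.field2!.minutes} minutes`;
    default:
      throw new Error("unknown discriminant");
  }
})(x);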

FAQ

Q: What about ordinary enums without any fields?
A: Enums with no associated fields could continue to be compiled so that they produce regular integer/string values. Alternatively, we could compile all enums into JavaScript classes and give them toString / valueOf methods so they play well with the rest of the JS ecosystem.
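A minimal sketch of the second option, assuming a hypothetical field-less enum `Color { Red, Green }`. The class shape, the `label` field, and the choice to return the ordinal from `valueOf` are illustrative assumptions, not a decided compilation scheme.

```typescript
// Hypothetical output for a field-less enum `enum Color { Red, Green }`.
class Color {
  static readonly Red = new Color(0, "Red");
  static readonly Green = new Color(1, "Green");

  private constructor(
    public readonly discriminant: number,
    private readonly label: string
  ) {}

  // String contexts (template literals, String(...)) see the variant name.
  toString(): string {
    return this.label;
  }

  // Numeric contexts (Number(...), arithmetic) see the ordinal.
  valueOf(): number {
    return this.discriminant;
  }
}

console.log(`${Color.Green}`); // prints "Green"
console.log(Number(Color.Green)); // prints 1
```

Because each variant is a single shared instance, identity comparison (`===`) between variants keeps working as it does for plain enum values.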

Q: How would this work with JSII?
A: When compiling a Wing library into a JSII module, complex enums would be turned into an enum-like class in the JSII type system. This way, they can be used safely in other languages. To use the exported library in Wing code, the Wing compiler's JSII importer would need to recognize enum-like classes produced by Wing (specifically, these must be enum-like classes that have a "discriminant" field/property; we can't use switch on ordinary enum-like classes from AWS CDK or CDKTF libraries). If it is such an enum-like class, it will be imported as an enum type in Wing's type system; otherwise it will be imported as an ordinary JSII class with static methods etc.

Use Cases

See code examples above

Implementation Notes

References:

Component

Language Design

skinny85 commented 1 year ago

I like the idea.

What about adding names to the values, to keep the symmetry between enums and functions/classes?

So something like:

enum ArithmeticExpression {
    Number(value: num)
    Addition(augend: ArithmeticExpression, addend: ArithmeticExpression)
    Multiplication(multiplicand: ArithmeticExpression, multiplier: ArithmeticExpression)
}

Thoughts on this?

eladb commented 1 year ago

I love this! Is there a way to map back and forth JSII/CDK's existing enum-like (aka union-like) classes to these enums?

staycoolcall911 commented 1 year ago

This is a very cool addition to the language and will give it more power. However, since this is not in our lang spec yet (nor is switch), @Chriscbr and I discussed it and believe it can wait until after Wing beta. Added to the Post-MVP backlog.

yoav-steinberg commented 1 year ago

Two notes:

  1. Re adding field names to the values in the enum: this might introduce extra language complexity. In Rust, a variant's value can be either a tuple or a struct. In many cases you don't need names because there's just a single value (so you use the tuple format, Number(num)), but in others you want something more complex, so you can use the struct syntax: Multiplication { multiplicand: T, multiplier: T }.
  2. There's something inefficient in JSifying a class with all values as optional fields. Instead we should just have a single value field typed as any:
    // JavaScript
    class Frequency {
      private constructor(discriminant, value) {
        this.discriminant = discriminant;
        this.value = value;
      }
      public static Cron(field1 /*string*/) {
        return new Frequency(0, field1);
      }
      public static Rate(field2 /*Duration*/) {
        return new Frequency(1, field2);
      }
    }

Chriscbr commented 1 year ago

I think the idea of adding names to values could be nice, but I also hear Yoav's point. I sense an equivalence between:

enum ArithmeticExpression {
    Number(value: num)
    Addition(augend: ArithmeticExpression, addend: ArithmeticExpression)
    Multiplication(multiplicand: ArithmeticExpression, multiplier: ArithmeticExpression)
}

and:

struct NumberValue {
  value: num;
}

struct AdditionValue {
  augend: ArithmeticExpression;
  addend: ArithmeticExpression;
}

struct MultiplicationValue {
  multiplicand: ArithmeticExpression;
  multiplier: ArithmeticExpression;
}

enum ArithmeticExpression {
    Number(NumberValue)
    Addition(AdditionValue)
    Multiplication(MultiplicationValue)
}

I think if there are a lot of values a user wants to add, then the minimal syntax (without labels) might encourage folks to just create their own struct type, which can be used outside of the context of the enum:

// bad
enum Something {
  Variant1(optionA: str, optionB: bool, optionC: num, optionD: str, optionE: str)
  Variant2(...)
}

// good
struct Stuff {
  optionA: str;
  optionB: bool;
  optionC: num;
  optionD: str;
  optionE: str;
}

enum Something {
  Variant1(Stuff)
  Variant2(...)
}

Also @yoav-steinberg good suggestion for simplifying the implementation 👍

Chriscbr commented 1 year ago

I love this! Is there a way to map back and forth JSII/CDK's existing enum-like (aka union-like) classes to these enums?

It's possible there's some way to provide a partial mapping (at least for "creating" variants of union-like classes). The issue I see is that a lot of existing union-like classes in the CDK don't offer a way to discriminate between variants. For example, given an instance of lambda.Code, I can't really tell whether it's a fromAsset or a fromInline variant, etc. The only way to get distinguishing behavior is by calling "bind" on it (a custom method the class has defined). So these kinds of enums could not be used in switch / match statements.

In other words, the CDK uses union-like classes often as a way of combining data and behavior. But I think it might be cleaner/healthier if we define a union-like feature in Wing as purely a data type, and reserve "combining behavior and data" to resources and classes.
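To make the contrast concrete, here is a rough TypeScript sketch. `SourceCode`, `SourceData`, and `bindSource` are hypothetical names invented for this illustration; the first class only loosely imitates the style described above and is not the CDK's actual API.

```typescript
// Style 1: data and behavior combined. Callers cannot tell variants apart;
// the only observable difference is what bind() does.
abstract class SourceCode {
  static fromInline(text: string): SourceCode {
    return new (class extends SourceCode {
      bind(): string {
        return `inline:${text.length} chars`;
      }
    })();
  }
  static fromAsset(path: string): SourceCode {
    return new (class extends SourceCode {
      bind(): string {
        return `asset:${path}`;
      }
    })();
  }
  abstract bind(): string;
}

// Style 2: a pure data type. The variant is visible through the tag,
// so any function can switch on it and read the fields.
type SourceData =
  | { kind: "Inline"; text: string }
  | { kind: "Asset"; path: string };

function bindSource(s: SourceData): string {
  switch (s.kind) {
    case "Inline":
      return `inline:${s.text.length} chars`;
    case "Asset":
      return `asset:${s.path}`;
  }
}

console.log(SourceCode.fromAsset("./src").bind()); // prints "asset:./src"
console.log(bindSource({ kind: "Inline", text: "hi" })); // prints "inline:2 chars"
```

In the first style the behavior is fixed by whoever wrote the class; in the second, new behavior can be added by any consumer, at the cost of exposing the data.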

skinny85 commented 1 year ago

I don't love the AdditionValue + Addition variant - seems a little boilerplatey for my taste.

What about adding the concept of sealed classes and/or interfaces and/or structs to Wing? This is how Scala, Kotlin and Java implement algebraic data types. This way, you don't need a special enum concept in your language, you can just use sealed plus the other concepts.

Example:

sealed class ArithmeticExpression {}

struct NumberExpression extends ArithmeticExpression {
  value: num;
}

struct AdditionExpression extends ArithmeticExpression {
  augend: ArithmeticExpression;
  addend: ArithmeticExpression;
}

struct MultiplicationExpression extends ArithmeticExpression {
  multiplicand: ArithmeticExpression;
  multiplier: ArithmeticExpression;
}

Thoughts on this?

Chriscbr commented 1 year ago

Interesting - TIL about sealed classes. 🙂

I'm slightly biased towards the five-line version of AdditionExpression in my original proposal because of my time trying Rust (which inspired the syntax), but it might just be personal taste. I also pause a bit at making everything class-like, as I've heard it mentioned as a criticism of Java (e.g. in your example, is it a good idea to allow building hierarchies like FloatAdditionExpr > AdditionExpr > ArithmeticExpr?). But I'm curious to hear others' perspectives.

skinny85 commented 1 year ago

e.g. in your example, is it a good idea to allow building hierarchies like FloatAdditionExpr > AdditionExpr > ArithmeticExpr?)

Note that you can decide to forbid that by making AdditionExpr final, which Wing already supports.

github-actions[bot] commented 1 year ago

Hi,

This issue hasn't seen activity in 60 days. Therefore, we are marking this issue as stale for now. It will be closed after 7 days. Feel free to re-open this issue when there's an update or relevant information to be added. Thanks!

eladb commented 1 year ago

Keep

Chriscbr commented 12 months ago

After some discussion with @staycoolcall911 - thought I'd write up a short motivation for the issue since ADTs might be unfamiliar to some folks, and the Wikipedia article isn't necessarily the best intro.

The utility of ADTs is tied to a useful principle for avoiding a broad class of software bugs, which is to make invalid states unrepresentable. Suppose I want to represent the state of a network operation. Let's say that the state can either be "loading", "failure", or "success". If it failed, there will be an error code associated with it, and if it succeeded, there will be a result message associated with it. One way to represent this in Wing is to use a struct like this:

struct NetworkState {
  state: str;
  code: num?;
  result: str?;
}

let handleState = (state: NetworkState): void => {
  if state.state == "loading" {
    log("loading...");
  } else if state.state == "success" {
    log("success: ${state.result ?? "<error>"}");
  } else if state.state == "failure" {
    log("failure code: ${state.code ?? 0}");
  }
};

There are a couple of glaring issues with this code.

The first issue is that no matter what the network state is, it's still possible for me to access the "code" and "result" fields, even when they shouldn't be accessed. If this is a struct I'm exposing publicly in a library, maybe I'd document a field like code with a comment like "code contains a value when the state is failure, and no value otherwise"... but I'm basically putting more work on the consumer.

Another issue is that no matter what the state is, I still have to unwrap the code and result fields since they're typed as optionals. For brevity I used the ?? operator to provide a default, but a slightly longer (and safer) implementation would be to write this:

let handleState = (state: NetworkState): void => {
  if state.state == "loading" {
    log("loading...");
  } else if state.state == "success" {
    if let res = state.result {
      log("success: ${res}");
    } else {
      throw("invalid network state");
    }
  } else if state.state == "failure" {
    if let code = state.code {
      log("failure code: ${code}");
    } else {
      throw("invalid network state");
    }
  }
};

But the largest issue is that, as given, the struct lets you represent invalid states.

let s1 = NetworkState {
  state: "success",
  code: 404 // the success state cannot have an error code
};

One way to address this is to model NetworkState as a class:

class NetworkState {
  static loading(): NetworkState {
    return new NetworkState("loading", nil, nil);
  }
  static success(message: str): NetworkState {
    return new NetworkState("success", message, nil);
  }
  static failure(code: num): NetworkState {
    return new NetworkState("failure", nil, code);
  }
  _state: str;
  _message: str?;
  _code: num?;
  init(state: str, message: str?, code: num?) {
    this._state = state;
    this._message = message;
    this._code = code;
  }
  state(): str {
    return this._state;
  }
  message(): str {
    if let message = this._message {
      return message;
    } else {
      throw("cannot access message in state " + this._state);
    }
  }
  code(): num {
    if let code = this._code {
      return code;
    } else {
      throw("cannot access code in state " + this._state);
    }
  }
}

// example usage
let s1 = NetworkState.failure(404);
assert(s1.state() == "failure");
assert(s1.code() == 404);

This does adequately address the main problem, as it's no longer possible to represent invalid states inside instances of NetworkState. But from a DX perspective, this introduces a lot of boilerplate that the author needs to write. Additionally, since NetworkState is a preflight class, it can only be created during preflight (or, if it's an inflight class, only during inflight). This limitation might be lifted if we support phase-independent classes in the future.

This is sort of like how you can work around not having generics in a language by creating separate classes for ArrayOfStr, ArrayOfNum, ArrayOfYoyos, etc., but it's not really ideal.

ADTs make it straightforward to model information where fields are mutually exclusive, avoiding the mentioned problems:

enum NetworkState {
  Loading,
  Success(str),
  Failure(num),
}

let handleState = (state: NetworkState): void => {
  switch state {
    Loading -> { log("loading..."); },
    Success(msg) -> { log("success: ${msg}"); },
    Failure(code) -> { log("failure code: ${code}"); },
  };
};
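As a point of comparison for anyone who wants to try the idea today, the same model can be approximated in TypeScript with a discriminated union. This is only a sketch of the principle, with field names borrowed from the struct example above; it says nothing about how Wing would implement or compile the feature.

```typescript
// Each state carries only the fields that are valid for it.
type NetworkState =
  | { state: "loading" }
  | { state: "success"; result: string }
  | { state: "failure"; code: number };

function handleState(s: NetworkState): string {
  switch (s.state) {
    case "loading":
      return "loading...";
    case "success":
      // s.code does not exist on this variant, so it cannot be read by mistake.
      return `success: ${s.result}`;
    case "failure":
      // No optional unwrapping is needed: code is always present here.
      return `failure code: ${s.code}`;
  }
}

// An object literal such as { state: "success", code: 404 } is rejected by
// the type checker when assigned to NetworkState, so the invalid state from
// the struct example cannot be constructed.
const failed: NetworkState = { state: "failure", code: 404 };
console.log(handleState(failed)); // prints "failure code: 404"
```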