Closed Baccata closed 2 years ago
This is a great topic. I was just recently talking about this with the Smithy team :) I'll collect my thoughts here so we can discuss.
The main driver for me for making enums a shape is primarily so that each enum value can be a member and have a dedicated shape ID, traits associated with them, filtered out of models just like other shapes, etc. Right now they’re almost like members with a very limited set of properties. There are of course lots of trade offs and complexities with this approach.
For me, enums with ordinal values aren't that important. I think representing enums as numbers is worse than strings, especially for web services and debuggability, which I'll touch on later.
Reconciling members with simple shapes
First off, the member/aggregate type problem. Enums are serialized as and treated like simple types in probably every protocol, and they almost always have some kind of scalar-like value associated with them in programming languages (a number, a constant, etc.). We'd want an enum in Smithy to be considered a simple type, but it would still have members. I think we'd need to make member shapes distinct from their current aggregate shape classification, and also redefine aggregate shapes as shapes that can contain one or more values. Aggregate shapes are currently defined around whether they have members.
Targeting a shape from enum members
Enum members would also need a shape to target. The work I’m doing with unit types in #980 could mean that the members target Unit, which would be fine— it would give it the same form as every other member without needing to target a meaningful type. We’d hide this in the IDL. Interestingly, unions could technically function as an enum if every member targets the unit type, but that's a degenerate case and not explicit enough.
Representing unknown enum values in code
Next up, we have to consider service evolution in a client/server interaction. Servers are going to need to add more enum values in the future (and IME, usually when someone thinks their enum will never need to change, they're wrong). So regardless of whether we support ordinal-based enums in addition to string-based enums, both need to decompose down to simple values like strings and integers, so that a client that receives a newly introduced enum value it doesn't recognize can still use the value, or even send it back to the service, without failing at deserialization time. If you look at the current set of Smithy code generators, they all allow enums to be passed around as strings or as more PL-like enum types.
That's actually one of the reasons enums were only a constraint trait on string shapes. It is a string shape, but a specialized string shape with known constant values. This is similar to how many languages can treat enums as a subtype of integer, but Smithy's enum is just a string today. Implementations need to have a way to represent unknown enums and to accept raw values in place of enums.
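As a rough illustration of that requirement (a sketch in Python, not how any particular Smithy code generator does it), a generated type could try the known constants first and otherwise keep the raw string. The names `Action`, `parse_action`, and `serialize_action` are made up for this example.

```python
from enum import Enum
from typing import Union


class Action(str, Enum):
    """Known constants of a hypothetical string-backed enum."""
    MOVE = "move"
    QUIT = "QUIT"


def parse_action(raw: str) -> Union[Action, str]:
    """Return the known member if one matches, otherwise keep the raw string."""
    try:
        return Action(raw)
    except ValueError:
        # A value added by a newer server: pass it through instead of failing.
        return raw


def serialize_action(value: Union[Action, str]) -> str:
    """Turn a known member or a raw value back into its wire string."""
    return value.value if isinstance(value, Action) else value
```

With this shape, an older client can receive a value it has never seen and still echo it back to the service unchanged.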
Enum serialization
Strings were also chosen in Smithy because they have a major advantage over ordinals: they are human-readable and have meaning independent of the model. This improves debuggability, wire logs, things like CloudTrail logs, error messages, and so on. They're human-readable without a reverse mapping. The downside is that they take up more space in memory than a more efficient int-based enum.
Ordinal enums make sense when you're only dealing with types in code, but when you start sending them over the wire, strings are superior. (As an AWS API Bar Raiser, I'd be hesitant to allow an AWS service team to define an ordinal based enum).
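To make the readability point concrete, here is a small Python sketch comparing the two encodings in a JSON payload. The `action` field, the `ORDINAL_NAMES` table, and the `describe` helper are assumptions invented for illustration.

```python
import json

# The same logical value, encoded as a string and as an ordinal.
string_payload = json.dumps({"action": "QUIT"})
ordinal_payload = json.dumps({"action": 1})

# Reading the ordinal form requires a reverse mapping taken from the model.
ORDINAL_NAMES = {0: "MOVE", 1: "QUIT"}


def describe(payload: str) -> str:
    """Return a human-readable name for the action carried in a payload."""
    value = json.loads(payload)["action"]
    if isinstance(value, int):
        return ORDINAL_NAMES.get(value, f"<unknown ordinal {value}>")
    return value  # the string form is already self-describing
```

Someone reading a wire log sees `"QUIT"` directly in the first payload, while the second one only says `1` until the model is consulted.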
Default values
Enums have to have a default value to make the default trait proposal work (see #920). With enum strings right now, the default value is an empty string, which is fine because (a) an enum could define an explicit empty value (b) implementations need to handle unknown values anyways. With an ordinal enum, the default would have to be 0.
Add an intEnum trait?
If we really wanted to add support for ordinal enums, then I think there's a reasonable argument for adding a new trait just like the existing enum trait that can only target an integer shape. This gives the same properties in that it would be a specialized integer, implementations need to support unknown types as integers, etc, but it has the same drawbacks in that they don't have real member shapes.
Add a built-in enum type that can be strings or numbers
Another option could be to add an enum shape that is either a string or number depending on the protocol. That sounds reasonable at first, but I don't see how this could work in Smithy because code generators need to generate types based on shapes independent of the protocol, and if the shape has no known decomposed type, then we don't have a good way of handling unknown enum values. We really need to know if an enum contains constant strings or constant integers.
Extend enum and intEnum from string and integer
We could also add an enum type that extends string (meaning, anything that supports strings also inherently supports enums, like the httpHeader trait for example); and we could also add an intEnum type that extends from integer. Both would be simple types, decompose to specific types (string/integer), and they'd have real members.
For example, an enum string (this could also be an automatic conversion when upscaling 1.0 models to 2.0):
enum Action {
    @enumValue("move") // <-- optional string value
    MOVE
    QUIT // <-- string value defaults to "QUIT"
}
An intEnum:
intEnum Action {
    @enumOrdinal(0) // <-- Explicit ordinals are required
    MOVE
    @enumOrdinal(1)
    QUIT
}
We'd have to require explicit ordinals because members can be added and removed from models based on filtering.
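A quick Python sketch of why implicit ordinals would be fragile under filtering. The extra `JUMP` member and the helper `implicit_ordinals` are hypothetical, added only so there is something to filter out.

```python
def implicit_ordinals(members):
    """Assign ordinals by position, as an implicit scheme would."""
    return {name: index for index, name in enumerate(members)}


full_model = ["MOVE", "JUMP", "QUIT"]
# A filtered projection of the model that drops one member.
filtered_model = [name for name in full_model if name != "JUMP"]

# Position-based ordinals shift once a member disappears.
before = implicit_ordinals(full_model)["QUIT"]
after = implicit_ordinals(filtered_model)["QUIT"]

# Explicit ordinals travel with each member, so filtering leaves them alone.
explicit = {"MOVE": 0, "JUMP": 1, "QUIT": 2}
explicit_filtered = {k: v for k, v in explicit.items() if k != "JUMP"}
```

Here `before` and `after` disagree about the ordinal of `QUIT`, while `explicit_filtered["QUIT"]` keeps the value it had in the full model.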
Ok, those are my opening thoughts for now. I'm very interested to hear yours.
Lots of interesting thoughts there.
The main driver for me for making enums a shape is primarily so that each enum value can be a member and have a dedicated shape ID, traits associated with them, filtered out of models just like other shapes, etc.
👍 That is absolutely a pain point experienced in one of the use cases for Smithy. Any solution targeting it would be great.
Enum members would also need a shape to target. The work I’m doing with unit types in #980 could mean that the members target Unit, which would be fine— it would give it the same form as every other member without needing to target a meaningful type. We’d hide this in the IDL. Interestingly, unions could technically function as an enum if every member targets the unit type, but that's a degenerate case and not explicit enough.
So interestingly, that's exactly how Scala 3 models enumerations. As a matter of fact, Scala 3 introduces the enum keyword to define algebraic data types (which unions are somewhat related to), and enumerations are just a specific case.
Still interestingly (but way more anecdotal), the absence of input/output translating to Unit is exactly how I've modelled things in my tooling. Having it reified in the IDL would be amazing.
I do agree that having enumerations be their own thing is the right approach: people are used to it being the case, and it spares tooling implementors the little dance of verifying whether they are dealing with actual enumerations or plain unions.
Implementations need to have a way to represent unknown enums and to accept raw values in place of enums.
I think that's a little bit of a strawman: the ability for an enumeration to receive more values is protocol-dependent, and is conceptually similar to what the default trait solves for. My take on it, since I'm building tooling that is variance-aware (i.e. it computes compatibility based on the position of shapes in the inputs/outputs of operations), is that an unknown enum value would result in an error. Even if that is true in the protocols that I'm defining, I totally understand your position on the matter.
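For contrast with the pass-through approach, the strict behaviour I have in mind could look roughly like this in Python. This is only a sketch; `Action` and `decode_strict` are illustrative names, not part of any existing tooling.

```python
from enum import Enum


class Action(Enum):
    """Closed set of values known to this version of the model."""
    MOVE = "move"
    QUIT = "QUIT"


def decode_strict(raw: str) -> Action:
    """Decode a wire value, rejecting anything outside the known set."""
    try:
        return Action(raw)
    except ValueError as err:
        # Unknown values are treated as a protocol error, not passed through.
        raise ValueError(f"unknown Action value: {raw!r}") from err
```

Whether adding a value is then a breaking change depends on where the enum appears: in an operation's input or in its output.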
Similarly: whether enums have a default value is, I think, use-case/protocol dependent (I'm going to avoid digressing on the subject, considering we've already discussed it at length).
BTW : regarding variance-based compatibility rules, I've got this write-up on the subject.
Strings were also chosen in Smithy because they have a major advantage over ordinals: they are human-readable and have meaning independent of the model
I totally agree there, in the context of protocols where human-readable serialisation formats are used. But Smithy aiming to be protocol-agnostic implies that the problem of serialisation needs to be decoupled from the problem of data modelling (at least, to an extent).
Add an intEnum trait?
I'm not in favour of this, for the reason stated in my previous paragraph. I prefer approaching the problem as follows: how can we make it so that the concept of enumeration is (to an extent) decoupled from serialisation, or at least appears to be decoupled from it? To use an analogy: I really like that the concept of timestamp is first class in Smithy, because it is up to the protocol to state how timestamps should be encoded. If you had split timestamp into two separate shapes, date-time and epoch, I'd have found it weird.
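One way to picture that decoupling, sketched in Python under my own assumptions: a single enumeration carries both a name and an ordinal, and the protocol picks the encoding, much like a timestamp can be written as date-time or epoch. The protocol labels `"json"` and `"binary"` and the `encode` helper are invented for the example.

```python
from enum import Enum


class Action(Enum):
    """One logical enumeration carrying both possible encodings."""
    MOVE = ("move", 0)
    QUIT = ("QUIT", 1)

    def __init__(self, wire_name: str, ordinal: int) -> None:
        self.wire_name = wire_name
        self.ordinal = ordinal


def encode(action: Action, protocol: str):
    """Pick the representation according to a (hypothetical) protocol."""
    if protocol == "json":
        return action.wire_name  # human-readable protocols use the string
    if protocol == "binary":
        return action.ordinal    # compact protocols use the ordinal
    raise ValueError(f"unsupported protocol: {protocol!r}")
```

The model author only ever writes `Action.QUIT`; whether it becomes `"QUIT"` or `1` is left to the protocol.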
Extend enum and intEnum from string and integer
enum Action {
    @enumValue("move") // <-- optional string value
    MOVE
    QUIT // <-- string value defaults to "QUIT"
}
intEnum Action {
    @enumOrdinal(0) // <-- Explicit ordinals are required
    MOVE
    @enumOrdinal(1)
    QUIT
}
I agree with this idea wholeheartedly. Firstly, the syntax is great and intuitive. It also addresses my concern by offering a decoupling between enumerations and serialisation, even if only in appearance. And an "implicit relationship" between enums and strings/integers, when used in combination with protocol-specific annotations, is useful and pragmatic.
First-class enum shapes were released yesterday in IDL 2.0 (1.23 release): https://aws.amazon.com/blogs/developer/introducing-smithy-idl-2-0/
It might be pretty bad timing to even consider it for 2.0 of the IDL, but a few engineers I've talked to feel it'd be a lot more natural if enumerations were a first-class shape as opposed to a constraint on strings.
The main rationale for this ask is that the encoding of enumerations in various protocols is often not string-based, but rather ordinal-based.
Another example is protobuf, where enumerations are encoded as integers. As a matter of fact, my team wants to generate proto files from Smithy definitions so that the "source of truth" is written in Smithy and we can benefit from the Smithy tooling.
Smithy enforcing enumerations to be strings feels like an OpenAPI-ism, and forces tools that do not consider enumerations to be strings to essentially violate the Smithy semantics, which isn't ideal.