Open rust-highfive opened 10 years ago
cc #402, #514, #1154
What's the state of this?
Compared to tuples, anonymous enums would become increasingly tedious to use since a match statement would have N^2 pipe (|) characters. At the expense of type inference, it may be better to go with a syntax like:
let foo: enum(String, int, int) = enum 2(666);
match foo {
enum 0(s) => println!("string in first: {:?}", s),
enum 1(n) => println!("int in second: {:?}", n),
enum 2(m) => println!("int in third: {:?}", m),
}
The syntax would be compatible with a future extension that allows enums to be declared with named choices:
let foo: enum { Red(String), Green(int), Blue(int) } = enum Blue(666);
match foo {
enum Red(s) => println!("string in first: {:?}", s),
enum Green(n) => println!("int in second: {:?}", n),
enum Blue(m) => println!("int in third: {:?}", m),
}
I think the feature would be more useful without allowing matching, just doing trait dispatch. I guess it's a different feature, where T|T has T's representation, as opposed to one bit more.
@eddyb I've been putting some thought into a feature like that, and I've posted about it on irlo: https://internals.rust-lang.org/t/pre-rfc-anonymous-enum-which-automatically-implement-forwarding-traits/4806
I'd think an Alloca<Trait> analog of Box<Trait> would provide the same functionality as this return enum expr extension of the -> impl Trait idea, except there is dynamic dispatch in Alloca<Trait>, so optimization suffers.
Passing by, but if you are curious about syntax, OCaml has anonymous sum types called Polymorphic Variants. Basically they are just a name, like `Blah, which can have optional values. An example of the syntax:
# let three = `Int 3;;
val three : [> `Int of int ] = `Int 3
# let four = `Float 4.;;
val four : [> `Float of float ] = `Float 4.
# let nan = `Not_a_number;;
val nan : [> `Not_a_number ] = `Not_a_number
# let list = [three; four; nan];;
val list : [> `Float of float | `Int of int | `Not_a_number ] list
The val lines are the types of the let assignments, left in to see how the typing works.
In the back-end at assembly time the names are given a globally unique integer. The current implementation uses hashing, so there is a chance of collision, but the chance is extremely low and warnings can be put in place to catch collisions. I've also seen talk of making a global registry so they just get incremented efficiently on first access.
A plain Polymorphic Variant with no data is represented internally as an integer:
`Blah
Becomes the integer 737303889 (yes I checked), and comparing those is trivial.
For Polymorphic variants that can hold data (either a single element or a tuple of elements) such as:
`Blah (42, 6.28)
Gets encoded internally as an array of two fields in assembly: the first is the above number as before, the second is the pointer to the data of the tuple (although in most cases these all get inlined into the same memory in OCaml due to inlining and optimization passes). In the typing system the above would be [> `Blah of int * float ] (in OCaml the types of a tuple are separated by *).
However, the notable thing about Polymorphic Variants is that they can be open or closed. Any system can pass any of them that it wants, including passing them through unhandled. For example, a simple way to handle something like a generic event in OCaml would be:
let f next x = match x with
| `Blah x -> do_something_with x
| `Foobar -> do_something_else ()
| unhandled -> next unhandled
Which is entirely type safe, dependent on what each function handles down the chain and all.
The big thing on the typing system is that things can be open or closed typed, i.e. they either accept any number of Polymorphic Variants or a closed set of Polymorphic Variants. If something like the anonymous sum types here were to be accepted, then that concept would be exceedingly useful while being very easy and very fast to statically type.
Anonymous sum types might interact with -> impl Trait: at present, this code snippet cannot compile because the iterators have different types:
match x {
A(a) => once(a).chain(foo),
B(b) => once(bar).chain(foo).chain(b),
}
You could make this make sense with an anonymous sum type of the form impl Iterator | impl Iterator, that itself becomes an Iterator, but inferring any type like that sounds like chaos.
One could do it in std with enums like:
enum TwoIterators<A,B> {
IterA(A),
IterB(B),
}
impl<A, B> Iterator for TwoIterators<A, B> where .. { .. }
so the above code becomes
match x {
A(a) => TwoIterators::IterA( once(a).chain(foo) ),
B(b) => TwoIterators::IterB( once(bar).chain(foo).chain(b) ),
}
I could imagine some enum Trait sugar that did basically this too. You cannot delegate associated types or constants to an enum at runtime like this, so an enum Trait must enforce that they all agree across all the variants.
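As a minimal sketch of the hand-written enum described above, the elided impl could be filled in roughly as follows. The names TwoIterators, IterA and IterB come from the comment; the pick helper and its values are hypothetical and only there to show two branches with different iterator types. Note the bound that forces both variants to agree on Item, which is the associated-type restriction just mentioned.

```rust
use std::iter::once;

// Two-variant wrapper; each variant holds a different concrete iterator type.
enum TwoIterators<A, B> {
    IterA(A),
    IterB(B),
}

// Both inner iterators must yield the same `Item`, because an associated
// type cannot vary per variant at runtime.
impl<A, B> Iterator for TwoIterators<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Forward the call to whichever iterator is currently stored.
        match self {
            TwoIterators::IterA(a) => a.next(),
            TwoIterators::IterB(b) => b.next(),
        }
    }
}

// Hypothetical helper: the branches build different iterator types, yet the
// function has a single `impl Iterator` return type.
fn pick(first: bool) -> impl Iterator<Item = u32> {
    if first {
        TwoIterators::IterA(once(1).chain(2..4))
    } else {
        TwoIterators::IterB(once(10).chain(2..4).chain(once(20)))
    }
}
```

Every additional branch with a new iterator type needs another variant and another match arm, which is the boilerplate an enum Trait sugar would hide.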
This might sound like a weird hack, but how about just making A|B sugar for 'Either'? I suppose it might get even weirder to start composing A|B|C as Either<A,Either<B,C>>, or to have that map to something. What if there was some sort of general-purpose 'operator overloading' in the 'type syntax', allowing people's code to experiment with various possibilities and see what gains traction?
(i had yet another suggestion about allowing general purpose substitutions, e.g. type Either<A,Either<B,C>> = Any3<A,B,C> .. etc https://www.reddit.com/r/rust/comments/6n53oa/type_substitutions_specialization_idea/ now imagine recovering ~T === Box
@dobkeratops I'd rather just have a variant style type, i.e., with variadics.
I wrote some code that could potentially fit into a library now that type macros are stable: https://gist.github.com/Sgeo/ecee21895815fb2066e3
Would people be interested in this as a crate?
I've just come upon this issue, while looking for a way to avoid having some gross code that simply doesn't want to go away (actually it's slowly increasing, started at 8 variants and passed by 9 before reaching 12):
use tokio::prelude::*;
pub enum FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
F1: Future<Item = T, Error = E>, // ...
{
Fut1(F1), // ...
}
impl<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12> Future
for FutIn12<T, E, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12>
where
F1: Future<Item = T, Error = E>, // ...
{
type Item = T;
type Error = E;
fn poll(&mut self) -> Result<Async<Self::Item>, Self::Error> {
use FutIn12::*;
match *self {
Fut1(ref mut f) => f.poll(), // ...
}
}
}
I was thus thinking that it'd be great to have anonymous sum types that automatically derived the traits shared by all their variants, so that I could get rid of this code and just have my -> impl Future<Item = (), Error = ()> function return the futures in its various branches (with some syntax, that ideally wouldn't expose the total number of variants but let rustc infer it from the returning points, so that adding a branch doesn't require changing all the other return points), and have the anonymous sum type match the -> impl Future return type.
As I wrote here I think this use case would be better addressed by something modelled after how closures work.
I don’t think it would be wise to make anonymous sum types nominally typed, as you seem to suggest. Structural typing, as with tuples, is far more useful and less surprising to the programmer.
@alexreg What they're saying is that the specific use-case of wanting to return impl Trait with different types in each branch is better handled by a secret nominal type, similar to how closures are implemented.
Therefore, anonymous sum types are separate (and mostly unrelated) from that use case.
@Pauan Oh, well I agree with that. As long as we consider these things two separate features, fine. Thanks for clarifying.
Oh indeed good point, thanks! Just opened #2414 to track this separate feature, as I wasn't able to find any other open issue|RFC for it :)
I'm planning to get out a pull request for this proposed RFC. Most of you following this thread probably know that a number of proposals like this were rejected for being too complex, so its focus is minimalism and implementation simplicity rather than ergonomics and features. Any words before I get it out? (I've asked this question in multiple other areas to try to collect as much feedback before getting the proposed RFC out, fyi)
https://internals.rust-lang.org/t/pre-rfc-anonymous-variant-types/8707/76
I am not sure where the appropriate place is at this point to suggest solutions to this problem, but one thing that was mentioned was interaction with impl Trait. Perhaps an anonymous enum could be created of all returned things so long as they implement some trait. For instance (the ... are left to your imagination):
fn foo() -> Result<(), impl Error> {
..
return Err(fmt::Error...);
...
return Err(io::Error...);
...
return Ok(());
}
This would make an implicit anonymous enum/sum type that implements Error. This would greatly help the current situation with Rust error handling.
Edit: I can also write up a pre-rfc with this if it seems workable.
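For comparison, here is a rough sketch of what has to be written by hand today to get the effect of the implicit enum described above. The ParseError type and the parse_bytes function are hypothetical names chosen for illustration; the two wrapped error types are ordinary standard-library errors.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

// Hand-written stand-in for the enum the proposal would generate implicitly.
#[derive(Debug)]
enum ParseError {
    Utf8(Utf8Error),
    Int(ParseIntError),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Utf8(e) => write!(f, "invalid UTF-8: {e}"),
            ParseError::Int(e) => write!(f, "invalid integer: {e}"),
        }
    }
}

impl Error for ParseError {}

// One `From` impl per wrapped error lets `?` pick the right variant.
impl From<Utf8Error> for ParseError {
    fn from(e: Utf8Error) -> Self {
        ParseError::Utf8(e)
    }
}

impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::Int(e)
    }
}

// Two different error types flow into one return type via `?`.
fn parse_bytes(bytes: &[u8]) -> Result<i32, ParseError> {
    let text = std::str::from_utf8(bytes)?;
    let n = text.trim().parse::<i32>()?;
    Ok(n)
}
```

The enum, the Display impl and the From impls are all mechanical, which is why generating them from the set of returned error types looks attractive.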
@vadixidav Ideas like that have also been floating around for years under names like enum impl Trait. For example:
It's generally considered a separate feature proposal, since an enum impl Trait would be something you cannot match on, so there would be no need for any special syntax for the concrete types or their values, but it would only apply to function returns. An "anonymous sum type" is usually taken to mean something that can be created and used anywhere, would be something you explicitly match on, and thus requires adding some special syntax for the concrete types and values.
@alexreg Got it. I will direct my focus to places where that feature is being proposed instead. Thank you for the pointer.
I like this feature; it is like TypeScript union types: https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html
It will be interesting to see auto-generated enums in Rust. I already like the TypeScript syntax type1 | type2 | ... or enum(type1, type2, ...)
fn add_one(mut value: String | i64) -> String | i64 {
match value {
x : String => {
x.push_str("1");
x
}
y : i64 => { y + 1 }
}
}
Any update on this ?
Would this also be useful for coalescing errors in Result chains?
trySomething() //Result<A, E1>
.and_then(trySomethingElse) //Result<B, E1|E2>
.and_then(tryYetAnotherThing) //Result<C, E1|E2|E3>
Hey all, I wrote a post about this topic today: https://blog.yoshuawuyts.com/more-enum-types/. In particular I think it's interesting that if we compare structs and enums, it seems enums often take more work to define. Here's the summary table from the post:
| | Structs | Enums | Enums Fallback |
|---|---|---|---|
| Named | struct Foo(.., ..) | enum Foo { .., .. } | - |
| Anonymous | (.., ..) | ❌ | either crate |
| Type-Erased | impl Trait | ❌ | auto_enums crate |
I am working on a library to more or less do what you want, i think. It looks something like this
#[derive(Debug)]
struct Bar;
#[ano_enum]
fn foo() -> ano!(Bar | u8 | u64) {
Bar
}
#[ano_enum]
fn bar1(foo: ano!(i8 | u8 | Bar)) {
match ano!(foo) {
foo::u8(n) => {println!("{}", n + 1)},
foo::i8(n) => {println!("{}", n)},
foo::Bar(n) => {println!("{:#?}", n)},
}
}
I like this feature, this is like TypeScript union types https://www.typescriptlang.org/docs/handbook/unions-and-intersections.html
Will be interesting see auto generated enum on rust, I already like the TypeScript syntax type1 | type2 | ... or enum(type1, type2, ...)
fn add_one(mut value: String | i64) -> String | i64 {
match value {
x : String => {
x.push_str("1");
x
}
y : i64 => { y + 1 }
}
}
I really like this syntax since it works much like TypeScript. Rust and TS are my main two languages, and union types is something I greatly miss in Rust. This is probably the #1 feature, in my book, which Rust lacks but needs. I hope this makes it into the language sooner than later.
About the comparison with TypeScript union types:
YES! I'm tired of having to guess traits, reading docs, or relying on an IDE, just to say that a fn works correctly for many input-arg types. I wish I could do something like:
const fn gcd(mut a: Int, mut b: Int) -> Int {
while b != 0 {
(a, b) = (b, a % b)
}
a.abs()
}
Where Int is a named union type comprising all fixed-size integers (signed, unsigned, usize, and isize)
I suspect most people wouldn't want the enum for that, since they don't want the enum for the return type, but rather they want it to return the type they put in (or maybe the unsigned variant thereof).
Perhaps you're looking for a generic method instead, something like https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=61aa7ed143dbd681b725bc24fbcd7516
use num_traits::*; // 0.2.15
fn gcd<Int: Signed + Copy>(mut a: Int, mut b: Int) -> Int {
while b != Int::zero() {
(a, b) = (b, a % b)
}
a.abs()
}
I suspect most people wouldn't want the enum for that, since they don't want the enum for the return type, but rather they want it to return the type they put in (or maybe the unsigned variant thereof).
True. But what I suggest isn't to return an enum per se, but to return the primitive value directly, regardless of the type (as long as it is constrained).
Perhaps you're looking for a generic method instead, something like https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=61aa7ed143dbd681b725bc24fbcd7516
Thank you a lot! But I wish it was possible to specify types in fn signatures, without any kind of trait-constraints at all, using the types themselves as constraints, like so:
//custom keyword
typedef uint {
u8, u16, u32, u64, u128, usize
}
const fn gcd(mut a: uint, mut b: uint) -> uint {
while b != 0 {
(a, b) = (b, a % b)
}
a
}
This way, we could define custom union types that "contain" (I couldn't think of a better term) arbitrary types, as long as the compiler proves that they are "compatible"
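The proposed typedef has no direct equivalent today. As a rough approximation only, the closed set of types can be modelled with a small trait implemented by a macro. The UInt trait, its ZERO constant and the impl_uint macro are hypothetical names, and unlike the proposal this sketch does rely on a trait bound and cannot be a const fn on stable Rust.

```rust
use std::ops::Rem;

// Hypothetical trait standing in for the proposed `typedef uint { .. }`.
trait UInt: Copy + PartialEq + Rem<Output = Self> {
    const ZERO: Self;
}

// Implement the trait for exactly the listed unsigned integer types.
macro_rules! impl_uint {
    ($($t:ty),*) => {
        $(impl UInt for $t { const ZERO: Self = 0; })*
    };
}

impl_uint!(u8, u16, u32, u64, u128, usize);

// Euclid's algorithm; returns the same type that was passed in.
fn gcd<T: UInt>(mut a: T, mut b: T) -> T {
    while b != T::ZERO {
        (a, b) = (b, a % b);
    }
    a
}
```

A call such as gcd(12u8, 18u8) is monomorphised for u8, so no enum or runtime tag is involved, which matches the point that the caller wants back the type that was put in.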
I have heard this concept referred to before as "static trait"; there was a brief discussion here. Very different concept from an enum, however
I have heard this concept referred to before as "static trait"; there was a brief discussion here.
Thank you for the link! I'll read it now
Very different concept from an enum
Definitely
What's the state of this ?
@DanteMarshal There is no open RFC for this feature, so there is no status to report.
To add some points, that might have been mentioned in the last 10 years of discussion, but there are too many places this was discussed, so excuse me, that I do not have the full overview about everything discussed so far:
With the already stable core::any::TypeId::of and the recently stabilised enum Foo { A = 1, B(B) = 2, … }, we could simply desugar A | B | C to enum { A(A) = TypeId::of::<A>(), B(B) = TypeId::of::<B>(), C(C) = TypeId::of::<C>() }, as soon as TypeId::of is stabilised as const. A function returning such a type:
fn foo<T, U>(a: T, b: U, c: usize) -> usize | T | U | () | ! {
match c {
0 => panic!("Sometimes gonna give you up, sometimes gonna let you down."),
1 => (),
2 => a,
3 => b,
x => x,
}
}
could now be desugared to, if we allow impl on fn and let:
fn foo<T, U>(a: T, b: U, c: usize) -> foo<T, U> {
match c {
0 => panic!("Sometimes gonna give you up, sometimes gonna let you down."),
1 => (),
2 => a,
3 => b,
x => x,
}
}
impl<T,U> foo<T, U> {
// `T` and `U`, `T` and `usize`, `U` and `usize` or all of them might be equal.
// IMHO, `A | A` should be equal to `A`.
// Merging two variants with the same name, type and discriminant should be safe?
// IMHO, `A | B` should be equal to `B | A`, so by sorting them by discriminant,
// the information about the order is removed.
// If the final definition of this enum only has one variant,
// this would already be optimised out iirc.
type result<T, U> = enum {
// A normal type with a normal name.
usize(usize) = TypeId::of::<usize>(),
// {T} and {U} means: The actual name of this type.
{T}(T) = TypeId::of::<T>(),
{U}(U) = TypeId::of::<U>(),
// This is not a real enum-declaration, but an internal one, so in addition to the unit type,
// tuples and array should be valid variants as well.
()() = TypeId::of::<()>(),
// The bottom type `!` and any other empty `enum` has no values,
// so we can basically just ignore it here.
};
// However, IMHO it is beneficial to annotate functions, that might panic, with `… | !` in the return type.
// E.g. we might want to annotate a function with `#[panic_free]` which ensures,
// that inside this function, no function that might panic is called.
// By explicitly labeling functions with either `#[panic_free]` or `… | !`,
// we can prevent false positives/false negatives.
const MIGHT_PANIC: bool = true;
}
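The idea of using the TypeId as the discriminant can be imitated at runtime today, which may help to see what the desugaring is meant to achieve. This is only a sketch under stated assumptions: AnyOf and describe are hypothetical names, the value is boxed and checked dynamically, and none of this is the compile-time enum the comment proposes.

```rust
use std::any::{Any, TypeId};

// Hypothetical stand-in for `usize | String | ()`: the value is boxed and
// its `TypeId` plays the role of the discriminant.
struct AnyOf {
    tag: TypeId,
    value: Box<dyn Any>,
}

impl AnyOf {
    fn new<T: Any>(value: T) -> Self {
        AnyOf { tag: TypeId::of::<T>(), value: Box::new(value) }
    }

    // Compare the stored tag against the tag of `T`.
    fn is<T: Any>(&self) -> bool {
        self.tag == TypeId::of::<T>()
    }

    // Borrow the payload if it has type `T`.
    fn get<T: Any>(&self) -> Option<&T> {
        self.value.downcast_ref::<T>()
    }
}

// "Matching" by type instead of by variant name.
fn describe(v: &AnyOf) -> String {
    if let Some(n) = v.get::<usize>() {
        format!("usize {n}")
    } else if let Some(s) = v.get::<String>() {
        format!("string {s}")
    } else if v.is::<()>() {
        "unit".to_string()
    } else {
        "unknown".to_string()
    }
}
```

Because the tag is derived from the type alone, two uses of the same type share one tag, which is the A | A equals A behaviour argued for in the comments of the desugared code.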
An issue arises with literal values, e.g. fn bar(a: bool) -> u8 | i8 { if a { 1 } else { 2 } }.
If a literal value could be coerced to more than one variant, the compilation must fail with a suggestion to type the literals (1u8, 2i8).
if a { 128 } else { -2 } on the other hand would be valid, because the range of u8 is [0,255] and of i8 is [-128,127].
Obviously, the compilation must fail if a literal value cannot be coerced to any variant, even in unreachable code paths.
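The range argument can be checked with today's checked conversions. The candidates function below is a hypothetical helper that reports, for a given literal value, whether it fits u8 and whether it fits i8; it only illustrates the ambiguity rule and is not part of the proposal.

```rust
use std::convert::TryFrom;

// Returns (fits in u8, fits in i8) for the literal value `n`.
fn candidates(n: i32) -> (bool, bool) {
    (u8::try_from(n).is_ok(), i8::try_from(n).is_ok())
}
```

Under the rule sketched above, a literal would be accepted only when exactly one of the two flags is true: 128 and -2 each fit one variant, 1 fits both and would need a suffix, and 300 fits neither.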
If one does not want A | A to be equal to A, one could declare struct A1 { inner: A } and struct A2 { inner: A }, even though this would add quite some overhead.
AFAIK, match already accepts all relevant patterns, but the compiler has to interpret <primitive>(x) as foo::result::<primitive>(x). This already works for e.g. Option and Result, whose variants are re-exported through the prelude, like in:
match try_foo() {
Some(x) => …,
None => …,
}
instead of:
match try_foo() {
Option::Some(x) => …,
Option::None => …,
}
There is already an ongoing discussion to infer the base-enum-type (421, 2830, 3167, …), this would become quite useful here. Especially with non-alphanumeric names like unit, tuples, arrays, references and pointers:
match foo("Hello", [ 23u8, 42 ], random_integer) {
&'a str(msg) => println!("Got string {:?}", msg),
[ p, q ] => println!("Got array [ {:?}, {:?} ]", p, q),
usize(x) => println!("Got usize {:?}", x),
}
The last pattern usize(x) is somewhat unwieldy; perhaps we could allow ranges like this:
fn select(n: usize, a: u8, b: i8, c: bool, d: &'static [ char ], e: Option<bool>, f: Option<u8>) -> u8 | i8 | bool | &'static [ char ] | Option<bool> | Option<u8> | () {
match n {
0 => a,
1 => b,
2 => c,
3 => d,
4 => e,
5 => f,
_ => (),
}
}
match select(random_integer, 200, -23, true, &[ 'H', 'e', 'l', 'l', 'o' ]) {
x @ -128..=255 => println!("Got integer {:?}", x),
y @ (true | false) => println!("Got bool {:?}", y),
[ head, tail @ .. ] => println!("Got slice starting with {:?}", head),
None => println!("Got nothing"),
Some(inner) => println!("Got something {:?}", inner),
other => println!("What does this match? {:?}", other),
}
However, what type has x?
This line has to be duplicated with the cases i8(x @ -128..=127) and u8(x @ 0..=255).
And what about inner?
Should the variants of named sum types be merged into the anonymous one?
With just one Option-type, this seems to be reasonable, because None and Some are unambiguous, but not with multiple.
Some(_) and Some(inner @ (true | false)) are unambiguous as well: The former just does not care and the latter enforces bool.
There might also be two enums with the same variant-names, so merging named sum types with anonymous ones is dangerous.
Merging anonymous enums on the other hand is desirable, e.g.:
fn foo(…) -> bool | usize { … }
fn bar(…) -> usize | () { … }
// Type of a is `foo::result | bar::result ` which is equal to `bool | usize | ()`
// which would have the internal type name `enum a::result { … }`
let a = if random_bool { foo() } else { bar() };
But what about nested ones like:
fn foo(…) -> Option<bool> { … }
fn bar(…) -> Option<usize> { … }
// The type of a is obviously `Option<bool> | Option<usize>`,
// but is this equivalent to `Option<bool|usize>`?
let a = if random_bool { foo() } else { bar() };
Merging nested types could solve the problem mentioned above: The type of inner is just bool | u8, and one has to match them, however, other enums with the same variant names are still an issue.
I think this is similar to the x @ -128..=255-case: Just store it as different variants Option::<bool>(Option<bool>) and Option::<usize>(Option<usize>) and duplicate match-arms for all possible variants (e.g. Result<P|Q,T|U> to Ok::<P>(x), Ok::<Q>(x), Err::<T>(x), Err::<U>(x)) unless the resulting arms are actual duplicates (e.g. because the inner value was not used due to _, two Nones, …).
However, that might result in quite some hidden code that feels like bloat.
References introduce another issue: lifetimes. The simplest solution would be to just allow 'static for now, so &'a str | &'b str is only allowed if 'a == 'b == 'static. Maybe this is not an issue at all, if we allow only one reference per type:
fn msg<'a, 'b>(x: &'a str, y: &'b str, z: usize) -> &'b str | usize where 'a: 'b {
match z {
// "Zero" is of type `&'static str`, and because `'static: '_` is always true,
// we do not have to add the bounds `'static: 'a` and `'static: 'b`.
0 => "Zero",
1 => x,
2 => y,
z => z,
}
}
Unlike named enums, we might want to allow the compiler to implement a function for each configuration of argument-types, if called with values, that are not of any anonymous sum type, e.g.:
fn add(a: f32 | isize, b: f32 | isize) -> f32 | isize {
match (a, b) {
(isize(a), isize(b)) => a + b,
(a, b) => f32::from(a) + f32::from(b),
}
}
clearly returns isize if and only if both arguments are of type isize, so by duplicating this function, we would get add::<isize,isize>(a: isize, b: isize) -> isize, add::<isize,f32>(a: isize, b: f32) -> f32, … resulting in 7 functions.
Optimising all of this could be quite an issue, but should be possible and should not impact the performance of the product.
Maybe some hints/assertions like where a: isize && b: isize <-> $: isize might be useful, but this is something for an entirely different RFC.
I also like the previously mentioned x : T => …, this might actually be less complicated to implement, even though I am not sure if T should also allow variants, if it is an enum.
I like to apologise for this long wall of text by adding this additional text.
My two cents; I think the motivation for anonymous enums is broadly:
For these use cases, I think it's important that anonymous enums are easy to refactor. In the case of anonymous structs, the transformation is pretty painless. You just introduce a new tuple struct, and prefix each tuple with the struct's name:
fn split(text: &'static str, at: usize) -> (&'static str, &'static str) {
(&text[..at], &text[at..])
}
assert_eq!(split("testing", 4), ("test", "ing"));
+ #[derive(Debug, PartialEq)]
+ struct Split(&'static str, &'static str);
! fn split(text: &'static str, at: usize) -> Split {
! Split(&text[..at], &text[at..])
}
! assert_eq!(split("testing", 4), Split("test", "ing"));
On the other hand, I don't think structurally anonymous enums would be as helpful for prototyping since there isn't an equivalent in explicit enums. The transformation would likely involve tedious renaming and rearranging for every pattern:
fn add_one(value: String | i64) -> String | i64 {
match value {
mut x: String => {
x.push_str("1");
x
}
y: i64 => {
y + 1
}
}
}
fn something(value: String | i64) {
match value {
x: String => println!("String: {x}"),
y: i64 => println!("i64: {y}"),
}
}
+ #[derive(Debug, PartialEq)]
+ enum AddOne {
+ String(String),
+ i64(i64),
+ }
! fn add_one(value: AddOne) -> AddOne {
match value {
! AddOne::String(mut x) => {
x.push_str("1");
! AddOne::String(x)
}
! AddOne::i64(y) => {
! AddOne::i64(y + 1)
}
}
}
! fn something(value: AddOne) {
match value {
! AddOne::String(x) => println!("String: {x}"),
! AddOne::i64(y) => println!("i64: {y}"),
}
}
Not to mention that the names would likely be changed from String and i64 (either to add camel case, to be more descriptive, or to name syntactical types like: (T,), [T; N], fn(T) -> U).
I also think that anonymous enums should be able to represent more common stateful enum types similar to std::option::Option, std::cmp::Ordering, and std::ops::Bound. Without this, I think most would end up doing a similarly awkward transformation from an anonymous struct instead:
use std::cmp::Ordering;
fn max(a: i64, b: i64) -> (i64, Ordering) {
match a.cmp(&b) {
Ordering::Less => (b, Ordering::Less),
Ordering::Equal => (a, Ordering::Equal),
Ordering::Greater => (a, Ordering::Greater),
}
}
assert_eq!(max(4, 7), (7, Ordering::Less));
use std::cmp::Ordering;
+ enum SomeOrdering {
+ Less(i64),
+ Equal(i64),
+ Greater(i64),
+ }
! fn max(a: i64, b: i64) -> SomeOrdering {
match a.cmp(&b) {
! Ordering::Less => SomeOrdering::Less(b),
! Ordering::Equal => SomeOrdering::Equal(a),
! Ordering::Greater => SomeOrdering::Greater(a),
}
}
! assert!(matches!(max(4, 7), SomeOrdering::Less(7)));
I would prefer a more general syntax where the variants are explicitly named and referenced using something like return type notation. If the anonymous enum is converted into an explicit one in the future, any code referencing its variants could still function with zero refactoring required.
use std::cmp::Ordering;
fn max(a: i64, b: i64) -> enum {
Less(i64),
Equal(i64),
Greater(i64),
} {
match a.cmp(&b) {
Ordering::Less => max()::Less(b),
Ordering::Equal => max()::Equal(a),
Ordering::Greater => max()::Greater(a),
}
}
assert!(matches!(max(4, 7), max()::Less(7)));
use std::cmp::Ordering;
+ enum SomeOrdering {
+ Less(i64),
+ Equal(i64),
+ Greater(i64),
+ }
! fn max(a: i64, b: i64) -> SomeOrdering {
match a.cmp(&b) {
Ordering::Less => max()::Less(b),
Ordering::Equal => max()::Equal(a),
Ordering::Greater => max()::Greater(a),
}
}
assert!(matches!(max(4, 7), max()::Less(7)));
Alternatively, maybe a general syntax for explicit enums with anonymous variants could be defined, although it seems niche and awkward to me.
Your friend, Yokin
I found this issue while trying to write a struct that should contain either syn::Token![#] or syn::Token![$], which would have matched a union type, though I actually thought to make the variants and their order relevant (same as it is for tuples) to make it fully compatible with enums where two variants can contain the same value.
The syntax I imagined was
let mut value: <u8, u8, u8> = ::0(1);
value = ::1(2);
value = ::2(3);
match value {
::0(a) => println!("first: {a}"),
::1(b) | ::2(b) => println!("second or third: {b}"),
}
Using number indexes, inspired by how tuples work.
i dont like the order being relevant... addition is commutative, so different permutations of the same sum type should be equivalent (like reordering fields in a struct). i dont want to have to convert the output of a function to pass it to another if they disagree on the order
i think what most of us want is a sum type like <a, b, c> that behaves like an enum for which the variants are named after the types, not their position in the "union". writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>
for a tuple, (a, b, c) is not the same as (c, b, a) because a tuple is a struct with each field named after the position of each type/value. it allows for (a, a, a)
though your suggestion is interesting because it's a nice parallel with how tuples work, i don't think this is what we're looking for
I like having order matter since it fixes a hard problem: for fn f<T, U>() -> <T, U>, what happens when T = U?
It's also analogous to tuples in that both have their fields/variants be position-based numbers instead of being names.
There's also prior art: C++'s std::variant behaves like this, where you can use std::get<N>(my_variant) to get the Nth alternative and throw an error if my_variant's current alternative isn't the Nth alternative. (you can also access by type but only if the type is unique)
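A position-based sum type of this kind can already be written as an ordinary generic enum, which may make the comparison with std::variant more concrete. In this sketch Sum3, its variants V0 to V2, the get1 accessor and describe are hypothetical names; get1 is only a loose counterpart of std::get<1> that returns None instead of throwing.

```rust
// Hypothetical positional sum type, analogous to `<A, B, C>` above.
#[derive(Debug, PartialEq)]
enum Sum3<A, B, C> {
    V0(A),
    V1(B),
    V2(C),
}

impl<A, B, C> Sum3<A, B, C> {
    // Position of the currently stored alternative.
    fn index(&self) -> usize {
        match self {
            Sum3::V0(_) => 0,
            Sum3::V1(_) => 1,
            Sum3::V2(_) => 2,
        }
    }

    // Borrow the payload only if the second alternative is stored.
    fn get1(&self) -> Option<&B> {
        match self {
            Sum3::V1(b) => Some(b),
            _ => None,
        }
    }
}

// All three alternatives have the same type, yet stay distinguishable.
fn describe(value: &Sum3<u8, u8, u8>) -> String {
    match value {
        Sum3::V0(a) => format!("first: {a}"),
        Sum3::V1(b) | Sum3::V2(b) => format!("second or third: {b}"),
    }
}
```

Because the variants are identified by position, Sum3<T, U, V> behaves the same whether or not some of the parameters happen to be equal, which is the T = U case raised above.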
I like having order matter since it fixes a hard problem: for fn f<T, U>() -> <T, U>, what happens when T = U?
writing <a, a, a> wouldn't even be possible, or would just be equivalent to <a>
Doesn't that address the issue?
no, because you should be able to write <T, U> for generic T and U (since otherwise it is severely limited), and if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.
if you try to define <T, U> to somehow behave differently when you use that generic code with types where T = U then you run into unsoundness caused by the same reasons as why general specialization is unsound: it doesn't properly account for lifetimes, which allows you to write code that makes the compiler e.g. transmute a &'a u8 to &'static u8 so you can read after the memory has been freed, which is UB.
this is because the compiler erases all lifetime information before it generates the de-genericified code where all generics are substituted by actual types, which means that it can't tell the difference between &'a u8 and &'static u8 at that point since they both end up as &'erased u8
both are valid use cases but they really are different
you're describing C++'s std::variant (which is nice), but some of us want TypeScript's unions (but better)
i think it wouldn't be too difficult to implement std::variant when Rust eventually gets variadic generics. it would probably work like C++'s (similar to Either<L, R> but with an arbitrary number of generics)
but TypeScript-like unions, for which the order isn't relevant, would probably require compiler-level support (so that <a, b, c> is equivalent to <b, a, c> and all other permutations). i don't think it's possible in Rust today to write a generic type (with >1 parameter) for which changing the order of the type parameters doesn't change the identity of the type
i wanna be able to write a function that returns some Result<T, <E1, E2, ..., En>> and not have to worry about the order in which the errors are laid out in my error type. it's just an unordered set (bag) of errors. and ? just adds to that bag, wherever it feels like. and i can have some handle_error(e: <E1, E2, ..., En>) function. or maybe its handle_error(e: <E4, E2, E6, E8, ..., En>) etc... it just matches over the type of error, not the position in the sum type
For the record, this particular issue is explicitly about anonymous sum types, not union types. I.e. with position/tag/discriminant-based matching (like Rust enums), not type-based matching (like TS). I don't know if there's already an issue for the latter, but it might be worth opening one if people are interested in it.
@glaebhoerl I don't know if Wikipedia is wrong here, but:
In type theory, a union has a sum type; this corresponds to disjoint union in mathematics.
Maybe that's why the confusion.
I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886
So I think we're actually asking for the same thing, i.e. tagged union, a.k.a. sum type.
I don't think anyone is asking for an untagged union: https://news.ycombinator.com/item?id=32018886
The thread you quoted seems to be about Haskell-like languages, and I guess the untagged union in that context differs from what you imagine (maybe the union keyword of C and, well, Rust, which, unlike TypeScript, doesn't really have runtime type tags at all and is thus inherently unsafe?).
A union type is like a set union. It differs from sum type (corresponds to Rust's enum E { A(T), B(U) }) in that the union type of same types $T$ and $T$ equals $T$ (similar to $S \cup S = S$) while the sum type of same types $T$ and $T$ doesn't equal, nor is a supertype of, $T$.
Some people in this issue have proposed the union type in this sense in addition to the sum type.
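The type-based matching of a union type can be roughly imitated at runtime in today's Rust with std::any::Any. This is only a sketch of the semantics, not a proposed design, and the describe function is a hypothetical name. Because dispatch happens on the type alone, there is no way to tell a "first" u64 from a "second" u64, which mirrors the $T \cup T = T$ behavior described above.

```rust
use std::any::Any;

// Dispatches purely on the runtime type of the value, not on any position or tag
// chosen by the caller.
fn describe(value: &dyn Any) -> String {
    if let Some(s) = value.downcast_ref::<String>() {
        format!("string: {}", s)
    } else if let Some(n) = value.downcast_ref::<u64>() {
        format!("u64: {}", n)
    } else if let Some(b) = value.downcast_ref::<bool>() {
        format!("bool: {}", b)
    } else {
        // A real union type would rule this case out at compile time.
        String::from("unknown type")
    }
}
```

The fallback branch shows what this emulation loses compared with a real union type: the set of member types is not checked by the compiler.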
Yeah, that Haskell-related discussion doesn't really make sense here. We already have tagged unions in Rust, they're called enums. Each tag is the variant's name. Our goal here is having untagged (or anonymous) ones so you can define T | U | V (like String | u64 | bool) and match based on each type without needing to previously declare (and in some cases import into the current scope) that tagged wrapper type (the enum type).
@Keavon you seem to be confusing tagged/untagged with named/anonymous, they are very different things. Rust already has untagged and tagged named unions, what it's missing are anonymous tagged unions.
"tag" refers to the enum discriminant. Untagged unions do not have a discriminant, and so are unsafe to access. See https://en.wikipedia.org/wiki/Tagged_union
The names of enum variants are not called tags - they have associated tags, for example:
enum Foo {
    A, // Might have tag 0
    B, // Might have tag 1
    C // Might have tag 2
}
Tags also exist for anonymous enums, since the compiler still needs to differentiate which variant is selected, for example:
type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */
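The tagged/untagged distinction can be seen in today's Rust with a small sketch. The names IntOrFloat, tag_of and read_as_int are hypothetical. Note that the `as` cast below returns the logical discriminant, which the language defines as 0, 1, 2 for a fieldless enum with no explicit values; how the tag is actually stored in memory is up to the compiler, hence the "might" in the comments above.

```rust
// A tagged (and named) union: the compiler tracks which variant is active.
#[derive(Debug, PartialEq)]
enum Foo {
    A,
    B,
    C,
}

// An untagged (and named) union: nothing records which field was written.
union IntOrFloat {
    i: u32,
    f: f32,
}

// Reading the logical discriminant of a fieldless enum is safe.
fn tag_of(foo: Foo) -> u8 {
    foo as u8
}

// Reading a union field needs `unsafe`, because the compiler cannot check
// which field is currently valid.
fn read_as_int(v: IntOrFloat) -> u32 {
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid u32.
    unsafe { v.i }
}
```

Here tag_of(Foo::B) returns 1, and read_as_int(IntOrFloat { f: 1.0 }) returns the raw bits of the float, the same value as 1.0f32.to_bits().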
I see, thanks for pointing out my terminology error. If I'm reading what you explained correctly, I think you're responding to this part of my second sentence above:
Our goal here is having untagged (or anonymous) ones...
(is that right?)
Rephrasing what I wrote above, then, I think that I was describing a goal of having the compiler's type system figure out the tags behind the scenes, allowing you to write code with anonymous variant names as well as anonymous enum types. So behind the scenes, it would be tagged (using your illustrations of // Might have tag 0, etc.) but those variant names shouldn't be given by the user, and neither should the entire type, unless the user decides to typedef it with your final code block example:
type Foo = A /* might get tag 0 */ | B /* might get tag 1 */ | C /* might get tag 2 */
The result should be equivalent to TypeScript's approach, but with the compiler able to discriminate between the types at compile time, so the code doesn't have to match on something like the kind: string field that TS requires in order to work at runtime once it becomes JS. Is that roughly accurate now?
The key distinction lies in this scenario:
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>; // !!
fn bar_consumer(bar: Bar) {
    match bar {
        // ??
    }
}
With a union type, Foo<i32, i32> flattens to a single-member union containing just i32 (or alternatively the compiler could treat it as indistinguishable from i32).
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>;
fn bar_consumer(bar: Bar) {
    match bar {
        i: i32 => ...,
        // No other options are possible.
    }
}
With a sum type, each choice in the anonymous enum must be assigned a locally unique name. Here, I chose to use the type parameter itself as the name. There are other alternatives of course, like giving them integer names analogous to tuples (.0, .1, .2, etc) but I imagine the ergonomics would be poor.
type Foo<A, B> = A | B;
type Bar = Foo<i32, i32>; // Remains a two-member union
fn bar_consumer(bar: Bar) {
    match bar {
        A(i: i32) => ...,
        B(j: i32) => ...,
    }
}
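The sum-type behavior can be checked against today's Rust by spelling the anonymous type out as a named generic enum. This is a sketch under that assumption: the variant names First and Second are hypothetical stand-ins for the per-choice names discussed above, and bar_consumer returns a String only so the result can be inspected.

```rust
// Named stand-in for the anonymous sum type `A | B`.
#[derive(Debug, PartialEq)]
enum Foo<A, B> {
    First(A),
    Second(B),
}

// Instantiating both parameters with i32 still leaves two distinct variants.
type Bar = Foo<i32, i32>;

fn bar_consumer(bar: Bar) -> String {
    match bar {
        Foo::First(i) => format!("first: {}", i),
        Foo::Second(j) => format!("second: {}", j),
    }
}
```

Both arms are reachable and the match is exhaustive only with both present, so Foo<i32, i32> does not collapse to i32 the way the union-type reading would.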
Issue by glaebhoerl Saturday Aug 03, 2013 at 23:58 GMT
For earlier discussion, see https://github.com/rust-lang/rust/issues/8277
This issue was labelled with: B-RFC in the Rust repository
Rust has an anonymous form of product types (structs), namely tuples, but not sum types (enums). One reason is that it's not obvious what syntax they could use, especially their variants. The first variant of an anonymous sum type with three variants needs to be syntactically distinct not just from the second and third variant of the same type, but also from the first variant of all other anonymous sum types with different numbers of variants.
Here's an idea I think is decent:
A type would look like this: `(~str|int|int)`. In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and).
A value would have the same shape (also like tuples), with a value of appropriate type in one of the "slots" and nothing in the rest:
(Nothing is a bikeshed, other possible colors for it include whitespace, `.`, and `-`. `_` means something is there we're just not giving it a name, so it's not suitable for "nothing is there". `!` has nothing-connotations from the negation operator and the return type of functions that don't.)
I'm not sure whether this conflicts syntax-wise with closures and/or negation.
Another necessary condition for this should be demand for it. This ticket is to keep a record of the idea, in case someone else has demand but not syntax. (If the Bikesheds section of the wiki is a better place, I'm happy to move it.)
See also: #402, #514, #1154