RFC: Consider turning `as` into a user-implementable Cast trait

erickt commented 11 years ago

@cmr, @huonw and I were talking on IRC about how to name functions that convert from one type to another. For example, consider these str methods:

fn to_bytes(&str) -> ~[u8];
fn as_bytes<'a>(&'a str) -> &'a [u8]

These implement a common pattern: in the `to` case, we copy the string into a new vec; in the `as` case, we make a no-copy cast from a string to a vector.
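As a minimal sketch of the same distinction in present-day Rust syntax (an assumption, since the thread predates it; `~[u8]` corresponds roughly to `Vec<u8>`), the two conversions might look like this. The function names `view_bytes` and `copy_bytes` are hypothetical and exist only for illustration:

```rust
/// The "as" flavor: a borrowed view. No allocation happens, and the
/// result cannot outlive the string it points into.
fn view_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// The "to" flavor: a copy. A new vector is allocated and owns its bytes,
/// so it stays valid after the original string is gone.
fn copy_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}
```

The borrowed view shares the string's buffer, while the copy lives at a different address.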

It'd be nice if we could standardize this pattern. One way to do this is to turn the `as` operator into a trait a user can implement, like the `Neg` trait. We can do this if we follow the pattern @nikomatsakis laid out on his blog. Here's a working example implementation:

use std::io;

trait Cast<LHS> {
    fn cast(LHS) -> Self;
}

////

trait IntCast {
    fn cast_int(&self) -> int;
}

impl<LHS: IntCast> Cast<LHS> for int {
    fn cast(x: LHS) -> int { x.cast_int() }
}

impl IntCast for i8 {
    fn cast_int(&self) -> int { *self as int }
}

impl IntCast for i16 {
    fn cast_int(&self) -> int { *self as int }
}

////

trait StrCast {
    fn cast_str(&self) -> ~str;
}

impl<LHS: StrCast> Cast<LHS> for ~str {
    fn cast(x: LHS) -> ~str { x.cast_str() }
}

impl<'self> StrCast for &'self [u8] {
    fn cast_str(&self) -> ~str { self.to_str() }
}

////

trait StrSliceCast<'self> {
    fn cast_str_slice(&self) -> &'self str;
}

impl<'self, LHS: StrSliceCast<'self>> Cast<LHS> for &'self str {
    fn cast(x: LHS) -> &'self str { x.cast_str_slice() }
}

impl<'self> StrSliceCast<'self> for &'self [u8] {
    fn cast_str_slice(&self) -> &'self str {
        unsafe {
            assert!(std::str::is_utf8(*self));
            let (ptr, len): (*u8, uint) = std::cast::transmute(*self);
            std::cast::transmute((ptr, len + 1))
        }
    }
}

fn main() {
    io::println(fmt!("%?", Cast::cast::<i8, int>(5_i8)));
    io::println(fmt!("%?", Cast::cast::<i16, int>(5_i16)));

    io::println(fmt!("%?", Cast::cast::<&[u8], ~str>(bytes!("hello world"))));
    io::println(fmt!("%?", Cast::cast::<&[u8], &str>(bytes!("hello world"))));
}

While it's a bit wordy, it does work.
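For readers on a current toolchain, here is a rough sketch of the same double-dispatch pattern in present-day Rust syntax. It is an adaptation, not the original code: `int` becomes `i64`, `~str` becomes `String`, and the byte-slice-to-string conversion uses `String::from_utf8_lossy`, which is an assumption about the intended behavior.

```rust
/// Generic conversion trait: `Self` is the target type, `Src` the source.
trait Cast<Src> {
    fn cast(src: Src) -> Self;
}

/// Helper trait for the `i64` target; each source type implements it.
trait IntCast {
    fn cast_int(&self) -> i64;
}

// One blanket impl per target type forwards to the helper trait.
impl<Src: IntCast> Cast<Src> for i64 {
    fn cast(src: Src) -> i64 {
        src.cast_int()
    }
}

impl IntCast for i8 {
    fn cast_int(&self) -> i64 {
        i64::from(*self)
    }
}

impl IntCast for i16 {
    fn cast_int(&self) -> i64 {
        i64::from(*self)
    }
}

/// Helper trait for the `String` target.
trait StrCast {
    fn cast_str(&self) -> String;
}

impl<Src: StrCast> Cast<Src> for String {
    fn cast(src: Src) -> String {
        src.cast_str()
    }
}

impl StrCast for &[u8] {
    fn cast_str(&self) -> String {
        // Assumption: invalid UTF-8 is replaced rather than rejected.
        String::from_utf8_lossy(self).into_owned()
    }
}
```

With a type annotation on the binding, a call such as `let n: i64 = Cast::cast(5_i8);` selects the target, which is somewhat less wordy than spelling out both type parameters.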

Unfortunately there's a third conversion option that I couldn't figure out how to fit into this paradigm, where we consume the input to produce the output:

fn to_option<T, U>(r: Result<T, U>) -> Option<T> {
    match r {
        Ok(x) => Some(x),
        Err(_) => None,
    }
}

This would also be useful when we can cheaply transform a ~str into a ~[u8], or consume all the elements of a HashMap and move them into a ~[T]. Perhaps a separate function would be best to capture this case. Or we could do the reverse and say Cast consumes its input, but since going from one reference type to another is cheap, we optimize that specific case. I'm not sure.
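The three consuming conversions mentioned above can be sketched in present-day Rust syntax as follows. This is illustration only: `string_into_bytes` and `map_into_pairs` are hypothetical names, and `String`, `Vec` and today's `HashMap` stand in for the 2013 types.

```rust
use std::collections::HashMap;

/// Consumes the `Result`, keeping only the success value.
fn to_option<T, E>(r: Result<T, E>) -> Option<T> {
    r.ok()
}

/// Consumes the string and hands over its buffer; the bytes are not copied.
fn string_into_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}

/// Consumes the map and moves every entry into a vector.
/// The order of the resulting pairs is unspecified.
fn map_into_pairs<K, V>(m: HashMap<K, V>) -> Vec<(K, V)> {
    m.into_iter().collect()
}
```

In each case the input is taken by value, so the caller can no longer use it afterwards, which is what separates this option from both the copying and the borrowing forms.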

erickt commented 11 years ago

You can find a version that uses moves here: https://gist.github.com/erickt/5762933, but unfortunately I'm hitting this LLVM assertion:

Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/etryzelaar/Projects/rust/rust/src/llvm/lib/IR/Instructions.cpp, line 2290.

I expect it's related to #4759.

Kimundi commented 11 years ago

Note that the reason for a built-in `as` is mainly casts in constexprs. It's also used for casting to a trait object. Replacing `as` with methods, or turning `as` into sugar for them, is a nice idea, but there needs to be a solution for something like this:

static FOO: char = 159u8 as char;
static BAR: u8 = '?' as u8 + 45;

However, seeing how

- explicit casting to trait objects might get removed and replaced with implicit coercion,
- `as` is otherwise only used for arithmetic casts, and
- trait-based casts are more versatile and user-extensible,

it might be sensible to replace the constexpr use case with a static_cast!() syntax extension:

static FOO: char = static_cast!(159u8 as char);
static BAR: u8 = static_cast!('?' as u8) + 45;

which would be hard-coded to expand to a literal of the right type. In fact, such an extension would nicely subsume the current bytes!() extension.

emberian commented 11 years ago

Note that I've been thinking about CTFE and how it can be used with static items. I think @bblum's effect proposal could easily express this: only functions/methods without certain effects are valid for consideration for CTFE. This would fit in nicely with #[static_assert] (for example, I want to assert that two &'static [&'static str] are the same size, but currently cannot because .len(), a method call, is not allowed there). Combine that with an `as` trait and perhaps it could be allowed in a static context? Although I'm not sure how well that'd work out in practice: sometimes you want the conversion to have effects, or need effects to actually do the cast. So maybe static_cast!() isn't a bad idea (or just special-casing the numeric casts).
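For context, the length check described above can be written in later versions of Rust, where `const fn` and compile-time `assert!` play the role discussed here. The sketch below uses that later syntax, not anything available when this thread was written. The item names are hypothetical, and `const` items are used instead of `static` ones to keep the example portable across compiler versions.

```rust
// Two tables that must stay in sync (hypothetical data).
const NAMES: &[&str] = &["north", "south"];
const CODES: &[&str] = &["N", "S"];

/// Usable at compile time because slice `len` is itself a `const fn`.
const fn same_len(a: &[&str], b: &[&str]) -> bool {
    a.len() == b.len()
}

// Evaluated during compilation: the build fails if the lengths differ.
const _: () = assert!(same_len(NAMES, CODES));
```

The same `same_len` function can still be called at run time, so the check is not restricted to constant contexts.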

I didn't consider how this would interact with trait objects and I don't know enough about them to evaluate its effect on them.

erickt commented 11 years ago

@Kimundi: curse you, constant expressions! Bane of my existence. We could tone this down, provide these traits to standardize how we transform from one type to another, and keep `as` around for the constant expressions.

Another option is to copy the bytes! syntax extension, and implement enough compile-time-typecasts to cover cases like you mentioned:

static FOO: char = char!(159u8);
static BAR: u8 = u8!('?') + 45;

Either way, these static casts aren't that common, at least in Rust proper. So I wouldn't be too upset if they were a little uglier to use, if that allowed us to take better advantage of the `as` operator, or even reclaim it for the user. Here are all the uses I could find:

libstd/managed.rs:    pub static RC_EXCHANGE_UNIQUE : uint = (-1) as uint;
libstd/managed.rs:    pub static RC_MANAGED_UNIQUE : uint = (-2) as uint;
libstd/num/int_macros.rs:pub static min_value: $T = (-1 as $T) << (bits - 1);
libstd/num/int_macros.rs:pub static max_value: $T = min_value - 1 as $T;
libstd/num/strconv.rs:static inf_buf:          [u8, ..3] = ['i' as u8, 'n' as u8, 'f' as u8];
libstd/num/strconv.rs:static positive_inf_buf: [u8, ..4] = ['+' as u8, 'i' as u8, 'n' as u8,
libstd/num/strconv.rs:static negative_inf_buf: [u8, ..4] = ['-' as u8, 'i' as u8, 'n' as u8,
libstd/num/strconv.rs:static nan_buf:          [u8, ..3] = ['N' as u8, 'a' as u8, 'N' as u8];
libstd/num/strconv.rs:priv static DIGIT_P_RADIX: uint = ('p' as uint) - ('a' as uint) + 11u;
libstd/num/strconv.rs:priv static DIGIT_I_RADIX: uint = ('i' as uint) - ('a' as uint) + 11u;
libstd/num/strconv.rs:priv static DIGIT_E_RADIX: uint = ('e' as uint) - ('a' as uint) + 11u;
libstd/num/uint_macros.rs:pub static min_value: $T = 0 as $T;
libstd/num/uint_macros.rs:pub static max_value: $T = 0 as $T - 1 as $T;
libstd/rand.rs:static scale : f64 = (u32::max_value as f64) + 1.0f64;
libstd/rand.rs:        static midpoint: uint = RAND_SIZE as uint / 2;
libextra/arena.rs:static tydesc_drop_glue_index: size_t = 3 as size_t;
libextra/num/bigint.rs:    priv static hi_mask: uint = (-1 as uint) << bits;
libextra/num/bigint.rs:    priv static lo_mask: uint = (-1 as uint) >> bits;
libsyntax/abi.rs:static IntelBits: u32 = (1 << (X86 as uint)) | (1 << (X86_64 as uint));
libsyntax/abi.rs:static ArmBits: u32 = (1 << (Arm as uint));
librustc/lib/llvm.rs:pub static True: Bool = 1 as Bool;
librustc/lib/llvm.rs:pub static False: Bool = 0 as Bool;
librustpkg/path_util.rs:pub static u_rwx: i32 = (S_IRUSR | S_IWUSR | S_IXUSR) as i32;
test/bench/shootout-fasta-redux.rs:static LOOKUP_SCALE: f32 = (LOOKUP_SIZE - 1) as f32;
test/bench/shootout-fasta-redux.rs:static NULL_AMINO_ACID: AminoAcid = AminoAcid { c: ' ' as u8, p: 0.0 };
test/bench/shootout-k-nucleotide.rs:static TABLE: [u8, ..4] = [ 'A' as u8, 'C' as u8, 'G' as u8, 'T' as u8 ];
test/compile-fail/const-cast-different-types.rs:static b: *u8 = a as *u8; //~ ERROR non-scalar cast
test/compile-fail/const-cast-different-types.rs:static c: *u8 = &a as *u8; //~ ERROR mismatched types
test/compile-fail/const-cast-wrong-type.rs:static a: [u8, ..3] = ['h' as u8, 'i' as u8, 0 as u8];
test/compile-fail/const-cast-wrong-type.rs:static b: *i8 = &a as *i8; //~ ERROR mismatched types
test/run-pass/const-autoderef.rs:static A: [u8, ..1] = ['h' as u8];
test/run-pass/const-cast-ptr-int.rs:static a: *u8 = 0 as *u8;
test/run-pass/const-cast.rs:static y: *libc::c_void = x as *libc::c_void;
test/run-pass/const-cast.rs:static b: *int = a as *int;
test/run-pass/const-enum-cast.rs:    static c1: int = A2 as int;
test/run-pass/const-enum-cast.rs:    static c2: int = B2 as int;
test/run-pass/const-enum-cast.rs:    static c3: float = A2 as float;
test/run-pass/const-enum-cast.rs:    static c4: float = B2 as float;
test/run-pass/const-str-ptr.rs:static a: [u8, ..3] = ['h' as u8, 'i' as u8, 0 as u8];
test/run-pass/const-str-ptr.rs:static b: *u8 = c as *u8;

emberian commented 11 years ago

@erickt I think with the effect system we could even have the trait be CTFE for the current static uses. It could be hardcoded in the interim, just to keep things pretty. @bblum?

emberian commented 11 years ago

Heh, I already left a comment to that end, never mind :blush:

bblum commented 11 years ago

A constexpr effect would be tractable, I think. Might be an issue with asserts -- if you wanted to write asserts in your function but have it be constexpr anyway, either you would have to pick one, or the compiler would be smart enough to turn the assert into a compile fail (which seems both possible and pretty leet, but difficult). The latter strikes me as a far-future feature.
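As a point of reference, later versions of Rust ended up with the second behavior: an `assert!` inside a `const fn` is an ordinary panic at run time and a compilation error when the function is evaluated in a constant. The sketch below uses that later syntax, and `narrow` is a hypothetical function chosen for illustration.

```rust
/// Narrows a `u32` to a `u8`, asserting that no bits are lost.
const fn narrow(x: u32) -> u8 {
    assert!(x <= u8::MAX as u32);
    x as u8
}

// Compile-time use: if the assert failed here, the build would fail.
const SMALL: u8 = narrow(200);
```

Calling `narrow(300)` in a `const` item would be rejected by the compiler, while the same call at run time would panic.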

Also keep in mind effects won't be in 1.0, so it would have to be backwards-compatible.

emberian commented 11 years ago

@erickt I think hardcoding in for the existing cases would be fine.

emberian commented 11 years ago

Nominating for well-defined: this is a language feature.

thestinger commented 11 years ago

I'm against doing this because I don't think one type of conversion should be elevated above the others at a language level. For example, you can downcast to a smaller integer type by truncating, or you can clamp to the max value.
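To make the two downcasts concrete, here is a small sketch in present-day Rust syntax. The function names are hypothetical; the point is only that both are reasonable meanings for "convert to a smaller integer", and they disagree on out-of-range input.

```rust
/// Truncating downcast: keeps only the low 8 bits of the value.
fn truncate_to_u8(x: u32) -> u8 {
    x as u8
}

/// Clamping downcast: values above `u8::MAX` become `u8::MAX`.
fn clamp_to_u8(x: u32) -> u8 {
    x.min(u8::MAX as u32) as u8
}
```

For an input of 300, truncation yields 44 (300 modulo 256) while clamping yields 255; for inputs that fit in a `u8`, the two agree.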

emberian commented 11 years ago

That's a good argument. Having multiple types of conversions makes an overloadable `as` less straightforward and useful.

thestinger commented 11 years ago

I would be happiest if we lived in a world where Rust had CTFE, and we didn't need `as` in the language at all. A special case like that feels like a language wart to me.

catamorphism commented 11 years ago

Declining for milestone. We decided this is part of the larger story about constant evaluation, and there are already other bugs open on that.

nikomatsakis commented 11 years ago

My two cents: The reason we have `as` at all is to distinguish casts that can be done in constants. If we were going to use a trait, let's just make it a normal trait (or multiple, as strcat suggests). This would also solve the problem of the precedence of `as`. Therefore this really comes down to deciding on our constant strategy.
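The "normal trait (or multiple)" approach resembles what the standard library later provided as `From` and `TryFrom` in `std::convert`, with separate traits for infallible and fallible conversions. The sketch below uses that later API. `Percent` is a hypothetical type invented for the example, and the `String` error type is a simplification.

```rust
use std::convert::TryFrom;

/// Hypothetical newtype: a percentage between 0 and 100.
#[derive(Debug, PartialEq)]
struct Percent(u8);

// Infallible, lossless conversion out of the newtype.
impl From<Percent> for u32 {
    fn from(p: Percent) -> u32 {
        u32::from(p.0)
    }
}

// Fallible conversion into the newtype: out-of-range input is an error.
impl TryFrom<u32> for Percent {
    type Error = String;

    fn try_from(v: u32) -> Result<Self, Self::Error> {
        if v <= 100 {
            Ok(Percent(v as u8))
        } else {
            Err(format!("{} is out of range for Percent", v))
        }
    }
}
```

Because these are ordinary traits with ordinary method calls, they have no operator precedence of their own, which is the side benefit mentioned above.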

emberian commented 11 years ago

@catamorphism which bugs?

pnkfelix commented 11 years ago

@cmr I think #5551 is the relevant metabug here.

rust-lang / rust

RFC: Consider turning `as` into a user-implementable Cast trait #7080