Identity comparison for boxed types is weird

ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language

http://www.ponylang.io

BSD 2-Clause "Simplified" License

5.71k stars 415 forks source link

Identity comparison for boxed types is weird #1567

Closed Praetonus closed 7 years ago

Praetonus commented 7 years ago

actor Main
  new create(env: Env) =>
    let a = U8(2)
    let b = U8(2)
    env.out.print((a is b).string())
    foo(env, a, b)

  fun foo(env: Env, a: Any, b: Any) =>
    env.out.print((a is b).string())

This code prints true then false, but I'd expect it to print true then true. This is because

the numbers are unboxed in create, is compares their values, which compare equal
the numbers are boxed in foo (i.e. objects with type descriptors are allocated to handle them through the Any interface), is compares the address of the boxes, which don't compare equal since a different box is allocated for each number

The digestof operator behaves in the same way. This means that a boxed value in a SetIs or a similar collection can't be retrieved.

I think we should modify the is and digestof operators to work on values when they handle a boxed values. I'm going to categorise the issue as a bug since IMO this is a principle of least surprise bug.

SeanTAllen commented 7 years ago

Agreed @Praetonus.

I find that very surprising.

sylvanc commented 7 years ago

Interesting, I find it to be expected behaviour. Once a machine-word is boxed, it has an identity based on its boxing.

On the other hand, if is and digestof can look inside boxed values without paying a performance penalty, then I'm all for making it work that way. Right now, they don't need to do dynamic type checking; in the scenario above, each call to foo would have to dynamically check for a boxed type.

One possible solution would be to have an internal method call __get_identity (or some other name), where is the type was known, the compiler could inline it. However, this will still lead to unnecessary virtual dispatch on any interface, trait, or union type, even when the underlying type couldn't possibly be a machine-word.

On the other hand, the __get_identity approach would work quite well for the (as yet unimplemented) new tuple model, where tuples have type descriptors.

jemc commented 7 years ago

@sylvanc I agree the behaviour makes sense in the context of thinking about boxed vs unboxed values, but as far as I thought, the boxing of values was an implementation detail, and not a user-facing language feature.

That is, at a language level, I would find this surprising, since there doesn't seem to be a language-level awareness of what boxing a value means, and when it happens.

Praetonus commented 7 years ago

@sylvanc My reasoning was the same as @jemc's. As long as boxed values are an implementation detail, I don't think they should affect high-level language features.

My thought for fixing that was to check at compile time if the interface we're ising is a supertype of a boxed type. If it isn't, we'd just check the address at runtime as we're doing currently. If it is, we'd check the type at runtime (as if by gen_isnominal, but I think we could optimise it to avoid checking for every possible boxed type) and unbox and check the value if appropriate. That way, the overhead would only be present on supertypes of boxed types.

jemc commented 7 years ago

On the sync call, @sylvanc suggests that we add a type descriptor to tuples by adding one more element (invisible at the language level), which is the type descriptor. This would avoid the need to ever box tuples, and also clear up a lot of idiosyncrasies in the runtime code.

Praetonus commented 7 years ago

I started working on this and I think having a type descriptor on unboxed tuples won't even be necessary.