ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language
http://www.ponylang.io
BSD 2-Clause "Simplified" License
5.65k stars 411 forks source link

abs() doesn't do what we think it does #143

Closed andymcn closed 8 years ago

andymcn commented 9 years ago

This started out as being an edge case bug in I8.string(). However it turns out that bug is because abs() is crazy. Not just our implementation, but C's (and other languages too).

I'm going to talk about I8 here, but this issue also applies to I16, I32, I64 and I128. I'm using I8 for examples because it makes the numbers easier to think about. Obviously C doesn't actually have an 8 bit type as such, but its behaviour with signed 32 and 64 bit numbers is exactly analogous.

By convention signed 8 bit numbers occupy the range [-128, 127]. The problem is -128, represented by 0b10000000.

The actual absolute value of -128 is obviously 128. However that's not a legal I8. We just calculate -(-128) without treating it as a special case, which is also what C does. The algorithm hardware uses to negate numbers (2's complement) is invert all the bits and add 1. Unfortunately if you put -128 into this (for 8 bit numbers) then you get back -128.

So -(-128) is -128. So (I8(-128)).abs() is -128. Yes, abs() can return a negative number.

This leads (I8(-128)).string() to return "-18446744073709551488" rather than the expected "-128".

There are various ways to fix this. We could handle this case specially in string(), but that still leaves abs() being rather crap.

Since abs() really is a partial function in the true sense we could make it throw, but that'll probably get very annoying.

I think the best answer is to take advantage of our sensibly strict type system and make I8.abs() return a U8 since U8 can represent all the legal values of I8.abs(). (Similarly I16.abs() returns a U16, etc.)

This would require adding an additional type parameter to Real (which defines abs()) specifying the return type we expect from abs(). Integer and SignedInteger would also have this extra parameter. UnsignedInteger would just use its existing type parameter twice.

This would change the definition of I8 from the current:

primitive I8 is SignedInteger[I8]

to

primitive I8 is SignedInteger[I8, U8]

Other than a small change to the actual implementation of I8.abs() that's about all that has to change. The implementation of SignedInteger.string() should work exactly as is.

Kambis commented 9 years ago

I agree that the sting representation needs fixing but when working with types like I8, abs is exactly doing what it should do, just as you expect 127+1 to be -128, abs(-128) is -128; nothing wrong there.

chrisosaurus commented 8 years ago

I agree with the proposal, abs should be defined as the mathematical idea of "absolute value" [1] which is strictly non-negative.

If a function taking an I8 returns a number that is strictly positive it makes sense for this to be reflected in the type system, thus U8 is perfect.

[1] https://en.wikipedia.org/wiki/Absolute_value

chrisosaurus commented 8 years ago

I am new to Pony, so forgive me if I am wrong.

But I also think that we may need to make changes to the Real trait in https://en.wikipedia.org/wiki/Absolute_value as it defines fun abs(): A which is now no longer true as A.abs now will return B (where A is the signed type and B is the unsigned type).

andymcn commented 8 years ago

@mkfifo, thanks for the input. We actually decided yesterday that we should make abs on an I8 return a U8 (and the same for the other sizes), but I hadn't updated this issue yet.

This is the only way to make everything work sensibly and to match with the mathmetical definition of absolute value.

Rather than adding an additional type parameter to Real we're going to move abs to _SignedInteger and _UnsignedInteger and add a second type parameter to those as this should be a more flexible arrangement in the future.

chrisosaurus commented 8 years ago

@andymcn have you started work on this issue? I am interested in trying to help out with pony so I was trying to work out how I would implement this change, I haven't been able to test my changes to the traits (still waiting for my system upgrade to finish so I have llvm-3.6)

One part I didn't like was the need to hardcode the ranges for the types in the abs function, for example:

primitive I128 is _SignedInteger[I128, U128]
fun abs(): U128 => if this >= 0 then this elseif this == -340282366920938463463374607431768211456 then 340282366920938463463374607431768211456 else -this end

is very ugly, how were you planning on modelling this instead?

andymcn commented 8 years ago

I haven't started on this yet, so you're welcome to do it if you want to. We also want to add abs() to _UnsignedInteger for symmetry. This doesn't need an extra type parameter and the existing trivial implementations in U8 etc should be fine.

It's not necessary to hardcode the minimum negative values. Pony doesn't allow implicit type conversions, so you have to convert the I128 to a U128 explicitly with a .u128() call. Conveniently this works out so we don't need to worry about the minimum negative special case.

For the following explanation I'm going to use 8 bit numbers, because it makes it easier to write, but exactly the same argument applies to all the sizes.

Converting between signed and unsigned integers of the same size (in either direction) just reinterprets the bits, ie it's purely a change in the type and a nop at runtime.

If we start with a positive value, eg 1, then we don't change it. When we convert to a U8 will still have 1, which is what we want.

If we start with a negative number, eg -1, then -this gives us 1. When we convert to a U8 will still have 1, which is what we want.

If we start with -128 (which is 1000000 in binary) then -this still gives us -128, because +128 isn't representable as an I8 and we get an overflow. When we convert to a U8 the binary value is still 10000000, which is +128 when considered as an unsigned, which is what we want. So it just works.

This means the abs() function can simply be something like:

primitive I128 is _SignedInteger[I128, U128]
  fun abs(): U128 => if this < 0 then (-this).u128() else this.u128() end
chrisosaurus commented 8 years ago

@andymcn I would love to grab this if that is okay with you.

I previously added abs to both SignedInteger and UnsignedInteger, I added a second parameter to SignedInteger only (constraining it to be an UnsignedInteger, which I think is correct).

Awesome, I was hardcoding the minimum values as I was unsure of how to do a conversion in the negative edge case, I will update the code to use your example which is far nicer.

Thanks for that explanation, just have to get pony building so I can test.

andymcn commented 8 years ago

@mkfifo, sounds good, go for it.

If you can work out how could you please add a unit test for abs() while you're at it. Test at least 0, a positive number, a negative number and the minimum negative for each of I8 and I16.

The test should go in packages/stdlib/test.pony. Add a class called something like _TestAbs, you can copy _TestIntToString as a guide and make sure you add your test to the list in Main at the top of the file (or it won't get run). Each test line should look something like:

h.expect_eq[U8](3, I8(-3).abs())

Shout if you have any problems.

chrisosaurus commented 8 years ago

See https://github.com/mkfifo/ponyc/commit/57d5a52125e8abb3240ce0d08b5ea13c448bf1fb

So after the second fix of changing SignedInteger to take a second param and moving abs onto SignedInteger and UnsignedInteger

trait _SignedInteger[A: _SignedInteger[A] box, B: _UnsignedInteger[B] box] val is Integer[A]
  fun abs(): B
...

trait _UnsignedInteger[A: _UnsignedInteger[A] box] val is Integer[A]
  fun abs(): A
...

I also removed abs from Real

I now get lots of errors, I seem to have broken the relationship between each of the Int types and the Real trait in a subtle way that I haven't worked out yet:

/home/chris/devel/ponyc/packages/builtin/int.pony:3:3: local method from is not compatible with provided version
  fun tag from[A: (Number & Real[A] box)](a: A): I8 => a.i8()
  ^
/home/chris/devel/ponyc/packages/builtin/arithmetic.pony:21:3: clashing method is here
  fun tag from[B: (Number & Real[B] box)](a: B): A
  ^

/home/chris/devel/ponyc/packages/builtin/int.pony:27:3: local method from is not compatible with provided version
  fun tag from[A: (Number & Real[A] box)](a: A): I16 => a.i16()
  ^
/home/chris/devel/ponyc/packages/builtin/arithmetic.pony:21:3: clashing method is here
  fun tag from[B: (Number & Real[B] box)](a: B): A
  ^

/home/chris/devel/ponyc/packages/builtin/int.pony:51:3: local method from is not compatible with provided version
  fun tag from[A: (Number & Real[A] box)](a: A): I32 => a.i32()
  ^
/home/chris/devel/ponyc/packages/builtin/arithmetic.pony:21:3: clashing method is here
  fun tag from[B: (Number & Real[B] box)](a: B): A
  ^

/home/chris/devel/ponyc/packages/builtin/int.pony:75:3: local method from is not compatible with provided version
  fun tag from[A: (Number & Real[A] box)](a: A): I64 => a.i64()
  ^
/home/chris/devel/ponyc/packages/builtin/arithmetic.pony:21:3: clashing method is here
  fun tag from[B: (Number & Real[B] box)](a: B): A
  ^

/home/chris/devel/ponyc/packages/builtin/int.pony:99:3: local method from is not compatible with provided version
  fun tag from[A: (Number & Real[A] box)](a: A): I128 => a.i128()
  ^
/home/chris/devel/ponyc/packages/builtin/arithmetic.pony:21:3: clashing method is here
  fun tag from[B: (Number & Real[B] box)](a: B): A
  ^
andymcn commented 8 years ago

Ah, there's one trivial error in your code, but you appear to have found a compiler bug! So, erm, well done?

The compiler bug is that the type parameter B of _SignedInteger is being conflated with the type parameter B of Real.from(). These should be completely separate from each other, which apparently they aren't. I'll look into this next week, but I suspect it might be a rather complex one. The good news is you can easily avoid it by using a different name for your new type parameter, eg C.

If you make that change then you'll hit your bug. Since you've added a second type parameter to _SignedInteger, whenever you use that type you must provide 2 type arguments to it. In the constraint of the existing type parameter A we reference _SignedInteger, but you haven't changed it to have 2 type arguments.

So everything should work if you change your _SignedInteger code to:

trait _SignedInteger[A: _SignedInteger[A, C] box, C: _UnsignedInteger[C] box] val is Integer[A]
  fun abs(): C

Unfortunately, while the compiler does spot and report your bug as an error, it also reports lots of other errors that arise from that which makes it rather hard to spot. Reducing reporting of secondary errors is an on-going task, I'll add that as a case to look at.

Two other minor points:

  1. We have a policy of a limit of 80 characters per line (to accommodate all reasonable editor / terminal choices). Please break your _SignedInteger line to conform to this. I'd probably break it just after the comma.
  2. I've realised that FloatingPoint should also get an abs(), just as _UnsignedInteger does. Could you add that please. No extra type parameter needed.
chrisosaurus commented 8 years ago

Awesome! Last night I was wondering if it was a compile bug conflating the two Bs

Good catch, I had grepped for every other reference to SignedInteger but had overlooked that one.

Thanks for pointing out FloatingPoint, I have checked and it looks like we have now caught everyplace that the Real.abs would have propogated except for Integer[A] itself, do we need to also add it there? I can't find anywhere that we use is Integer[A] directly (other than for SignedInteger and UnsignedInteger).

I have pushed a new commit addressing both 1 and 2, and all tests are passing.

I will open a PR, please let me know if there is anything else I should change for it to be accepted.

andymcn commented 8 years ago

Great that's merged, thanks.

I don't think we want to add abs to Integer for the same reason we moved it from Real, ie we want to keep the extra type parameter requirement as isolated as possible.

chrisosaurus commented 8 years ago

@andymcn thanks for your help!

fowles commented 8 years ago

Does modulus need similar treatment? Last I looked in C, -3 % 8 == -3, which always surprised me.

chrisosaurus commented 8 years ago

@fowles how would you express that?

UnsignedInteger % Integer => UnsignedInteger
Integer % Integer => Integer

???

these overlap, which I am unsure if that is allowed, if not then

UnsignedInteger % Integer => UnsignedInteger
SignedInteger % Integer => SignedInteger
fowles commented 8 years ago

I would always return Unsigned.

Signed % Integer => Unsigned Unsigned % Integer => Unsigned

andymcn commented 8 years ago

Modulus with negative numbers always surprised me as well. I looked into it recently to check that what Pony does is sensible and eventually came to the conclusion that it is.

The guiding rule we're using is the same one that C uses as of C99:

((a/b) * b) + (a%b) always equals a

This implies that:

So although getting a negative number back from modulus is intuitively wrong, it does actually make sense.

chrisosaurus commented 8 years ago

Currently mod is defined on the Real trait as fun mod(y: A): A => this % y

this is then implemented on all the base number types, this also means that a modulus is only defined between 2 instances of the same type, this follows from pony's philosophy on each number type being distinct.

I think this implementation is correct, both in that it conforms to c, and that it is well typed.

e.g. In pony I128 and U128 are distinct types, so I128 defines mod as (I128, I128) => I128 and U128 defines mod as (U128, U128) => U128

This means you cannot currently mix them, so my earlier comment above about mixing Signed and Unsigned is actually incorrect.