scala / slip

obsolete — archival use only

Unsigned integer data types #30

Closed sjrd closed 8 years ago

sjrd commented 8 years ago

By: Sébastien Doeraene @sjrd and Denys Shabalin @densh

tl;dr: We propose the addition of 4 "primitive" types to represent unsigned integers: UByte, UShort, UInt and ULong.

A prototype implementation of this proposal, with unit tests and benchmarks, can be found at https://github.com/scala/scala/compare/2.12.x...sjrd:uints

For the non-tl;dr, read the complete text.

Note that this is probably more a SIP than a SLIP, as we did have to touch the compiler one tiny bit (see the implementation), but we post it here anyway.

soc commented 8 years ago

Interesting!

What's the intention regarding existing code in the standard library? How large is the impact when signatures move from signed to unsigned?

sjrd commented 8 years ago

What's the intention regarding existing code in the standard library?

We have listed the potential interactions we could identify under the section Consequences on other parts of the library. Feel free to point out things we have overlooked!

How large is the impact when signatures move from signed to unsigned?

I don't think I understand the question. We have no intention of changing anything. The existing APIs in the standard library should be kept intact. Unless there were already places in the std lib where unsigned integers would have been much more appropriate in the first place, but we did not identify any so far. (Anyway, such changes would most likely go against the backward source compatibility "promise" of 2.12.)

non commented 8 years ago

I think the overloads of << dealing with Long are not correct.

In Scala, 33 << 33L produces 283467841536L (a Long value), whereas in this proposal UInt(33) << 33L would produce 66 (a UInt value). My sense is that any value shifted by a Long should produce a ULong.

sjrd commented 8 years ago

In Scala, 33 << 33L produces 283467841536L (a Long value)

That is a bug that has been fixed in 2.12.0-M1: https://issues.scala-lang.org/browse/SI-8462

sjrd commented 8 years ago

Speaking of which, there is still another related bug, which has not been fixed yet: https://issues.scala-lang.org/browse/SI-9516

non commented 8 years ago

Aha! Are you working on a fix for SI-9516? Would it just target 2.12.x or 2.11.x as well?

sjrd commented 8 years ago

I am not working on a fix right now, but I will if the ticket does not attract more attention from the scalac team. The target would depend on Typesafe's policy for the 2.11 branch; I suspect it will target 2.12. Anyway, this is off-topic here. Any follow-up discussion should happen on the relevant JIRA ticket.

mpilquist commented 8 years ago

:+1:

non commented 8 years ago

yeah :+1: on unsigned types from me too

He-Pin commented 8 years ago

:+1: can't wait.

simonorono commented 8 years ago

:+1:

Ichoran commented 8 years ago

Sorry to be the grump on this one, but this proposal still leaves a lot to be desired. First, and most importantly, the two motivating examples are examples with arrays. But this proposal doesn't deal with arrays, it just adopts the typical box-everything AnyVal strategy. It is true that buffer(i).toChar won't do the right thing, but if you're futzing with individual bytes in an array, you probably have some good reason that includes it not being 20x slower than you expect. Thus, claiming the types are "primitive" in any important sense (at least in the motivating examples) is incorrect. You can get primitive behavior if you keep firmly in mind that these are standard AnyVals and all the normal rules apply, but that's different.

Secondly, the integration into the standard library is poor. For example, collections cannot meaningfully have negative size, yet if you happen to have a UInt around, you have to convert to a signed int in order to index in (?!). It isn't Numeric (yet--I think this is an error, explained below). If you do happen to use them in immutable collections, you'd better remember to .toUWhatever on every single constant or you'll widen to AnyVal. Specialization? Nope, so your functions will all be slow. Is there a UByteBuffer? What about uintValue methods on BigInt? Overall, it's clunky and weird--feels very much like a poorly bolted-on addition, not something with first-class support. If you don't promise that it's a "new primitive type" this isn't an issue, but you try to and fall short.

Thirdly, you've inserted code into some very performance-critical parts of the library (critical enough to explain why Scala's HashMaps are 2x worse than Java's, for example), and simply stated that the JVM can take care of it. In principle, yes. In practice? Something as innocent as a change in the size of bytecode can have an impact on performance. You need to test this thoroughly (e.g. by doing JMH benchmarks +- just the part of the patch that changes common code in equality and hashing). And if someone does actually use this, it's probably good for them to know how big of a hit they will take.

Fourthly, if you allow 0.toUInt - x, forbidding -x at the cost of losing Numeric compatibility seems rather steep (for a "new primitive type"). And the choice to not have it be ScalaNumeric is also a bit strange--that mechanism is already there; why not use it?

Now, I'm happy to have a library module that provides support for unsigned math, and it's perfectly fine for that to come in the form of value classes. But I don't think this proposal is a good thing for Scala, as it promises more than it can deliver, and makes one of the less elegant parts of Scala (i.e. friction between primitive and object types) considerably less elegant.

sjrd commented 8 years ago

First, and most importantly, the two motivating examples are examples with arrays.

That is a) false for the first example and b) irrelevant for the second one. The first example deals with JavaScript's Uint32Array, which is not a scala.Array. Uint32Arrays won't have their UInts boxed; they will be unboxed (in Scala.js primitive types are not boxed; unsigned integers would not either). The second example illustrates manipulations of bytes. The use of Array here is completely irrelevant to the example; we could have used Vector or Seq or what have you instead.

For example, collections cannot meaningfully have negative size, yet if you happen to have a UInt around, you have to convert to a signed int in order to index in (?!).

Would you then consider adding an overload of apply in Seq? That seems pretty risky to me, but we sure can debate this. How often do you actually index into a Scala collection? Except when iterating over the indices of an Array, which you will get through a Range anyway, and hence they will be signed? I think the risks of adding new overloads to Seq.apply outweigh the benefits for the rare cases where you actually want to index with an unsigned integer.

If you do happen to use them in immutable collections, you'd better remember to .toUWhatever on every single constant or you'll widen to AnyVal.

As we explain in User-space vs compiler-space, we believe the lack of weak conformance treatment for unsigned integers will not be a problem. That said, nothing prevents us from adding weak conformance for unsigned integers in the compiler, if that is deemed desirable.
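For illustration, here is the widening in question (assuming this proposal's UInt and toUInt, which are not in the standard library today):

val ok   = List(1.toUInt, 2.toUInt) // inferred as List[UInt]
val oops = List(1.toUInt, 2)        // inferred as List[AnyVal], since 2 does not adapt to UInt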

Specialization? Nope, so your functions will all be slow.

Indeed. But this could be improved in the future. Again, we discuss this in User-space vs compiler-space. We believe this is out of scope ATM, though.

If you don't promise that it's a "new primitive type" this isn't an issue, but you try to and fall short.

We can remove the primitive type "brand" if you want. It doesn't really matter.

You need to test this thoroughly (e.g. by doing JMH benchmarks +- just the part of the patch that changes common code in equality and hashing).

You're right. We'll do that.

Fourthly, if you allow 0.toUInt - x, forbidding -x at the cost of losing Numeric compatibility seems rather steep (for a "new primitive type").

Right. We first took the "no unary_-" rule as an obvious decision, and from that it followed that the types should not be Numeric. We also thought it did not make sense for them to be Numeric, assuming Numeric represents values that have opposites. But that may not be the appropriate assumption, and we can reconsider this.

And the choice to not have it be ScalaNumeric is also a bit strange--that mechanism is already there; why not use it?

Are you talking about scala.math.ScalaNumber? That is a class. It cannot be extended by value classes. Want to make it a universal trait? Still does not help, because value classes cannot redefine equals and hashCode.

Now, I'm happy to have a library module that provides support for unsigned math, and it's perfectly fine for that to come in the form of value classes.

Unfortunately, that does not solve the interoperability problem. If it did, I'd have done so months ago, and not bothered writing a SLIP in the first place.

lutzh commented 8 years ago

:+1:

sjrd commented 8 years ago

Another issue with Numeric is what to do about ULongIsNumeric.fromInt(x). In Explicit conversions between signed and unsigned we argue that signedInt.toULong should not exist because it can be interpreted in two different ways. ULongIsNumeric.fromInt(x) has precisely the same issue.
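To make the ambiguity concrete with the Java 8 helpers (which already exist today), take -1 as the input:

// Interpretation 1: reinterpret the 32 bits as unsigned, then zero-extend.
java.lang.Integer.toUnsignedLong(-1)    // 4294967295L
// Interpretation 2: sign-extend to 64 bits first, then reinterpret as unsigned.
java.lang.Long.toUnsignedString(-1L)    // "18446744073709551615"

Both are defensible readings of "convert this signed value to an unsigned 64-bit value", which is why neither deserves the name toULong (or fromInt).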

non commented 8 years ago

@sjrd Regarding unary_-, I agree with @Ichoran.

The current implementation defines UInt(0) - UInt(9) as the two's-complement value for -9, i.e. 4294967287. If 0 - x is defined for all x, then -x can be defined to produce the same value as 0 - x. This definition is consistent and will not surprise anyone who understands how subtraction works with unsigned values.

(It will surprise someone who expects 0 - 3 to throw an error, but those folks will already be surprised by the current definition of subtraction.)
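A sketch of what I mean, on a stripped-down value class (not the proposal's actual implementation):

final class UInt(val toInt: Int) extends AnyVal {
  def -(that: UInt): UInt = new UInt(this.toInt - that.toInt) // wraps modulo 2^32
  def unary_- : UInt = new UInt(0) - this                     // same bit pattern as 0 - x
  override def toString: String = java.lang.Integer.toUnsignedString(toInt)
}

With this definition, both new UInt(0) - new UInt(9) and -new UInt(9) print 4294967287, so the two forms cannot disagree.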

Ichoran commented 8 years ago

@sjrd - I had thought that, because you wanted interop, you'd want to support something equivalent to Uint32Array on the JVM. If you don't care to do that (so that you can have the same code on both systems), then I agree that it's not an example of something that'd be boxed, but it's also not motivating for anything on the JVM. That suggests only, "do something that works on Scala.js", and you haven't motivated why that needs any JVM library changes. If ScalaNative materializes that'd be very exciting, but I am deeply skeptical that this proposal makes an unsigned type workable there where otherwise it wouldn't be. I'm happy to be convinced otherwise.

The problem with the second array example is that it's a good example precisely because getting signed bytes out of an array of bytes and converting them to Char is something that you very plausibly want to do. Why would you want to put bytes (signed or unsigned) into a vector? Yes, you could, and if your code just needed to be right, not fast, it'd be fine. But I don't see many applications where people need to deal with bytes but don't care how slow it is. And anyway, you've done a bunch of benchmarking to show that these things are fast on the JVM! That's great! Except it's not true if someone puts them into an array.

You could very well implicitly add an apply(index: UInt) to collections, but you can't so easily fix the return value of length and size and count. And drop and take and so on also suggest unsigned arguments, but catching everything is a huge amount of work. I don't think you really need to. My complaint is that you're trying to integrate unsigned types too much, and it's not working. If you hold them more at arm's length, then few people would expect to be able to do those things anyway. (To answer the specific question: I index into collections that support fast indexing whenever it's necessary to support a more advanced operation than the collections libraries easily support. For instance, traversing a pair of collections with occasional one-element lookbehind is way easier with indexing than with zip + sliding.)
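Such an implicit apply could look like this (a sketch assuming the proposal's UInt, not something I'm endorsing wholesale):

implicit class UnsignedIndexing[A](private val xs: Seq[A]) extends AnyVal {
  def apply(i: UInt): A = xs(i.toInt) // only meaningful while i fits in the signed Int range
}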

I agree that no unary - is an obvious decision, and that fromInt methods have two sensible interpretations, but the more you integrate unsigned numbers, the more glaring of an omission it is to not be able to do math generically on unsigned numbers. Again, the problem is trying to make the coupling tight when the library can't really support it. I think Numeric needs to be thrown out and redone a la Spire anyway, so this is less of an objection if we get a SLIP for that in too. As is, it's awkward.

You're right, of course, about ScalaNumber. But doesn't this just point out that value classes are not actually up to the job of defining a new number type? Having to bake this in to the library and possibly partially into the compiler seems to break the standard methodology of favoring generic solutions that enables anyone to do something over special cases hard-coded into the language/compiler/whatever.

If a looser coupling does not solve the interoperability problem, maybe the SLIP ought to explain in more detail exactly what is required. My instinct would be that things that act like UByteOps can be fully handled by the value class system, and if you happened to put them into an Array, it wouldn't be terribly surprising that they were boxed and didn't compare properly with other numbers (because they're conceptually the capacity to do operations on a value, not just the value itself). I don't know precisely what mechanics you use on the .js side, but at least conceptually having a Uint32Array return UIntOps doesn't seem problematic. The normal value class mechanism will make sure that wrapped values of the same type will equal each other and have the same hash codes. You won't get equality across types, but you already don't get assignment across types, so it doesn't seem a big deal to me.

If ScalaNative is a big driver of this, I'd need a lot more information about what it's going to be to be comfortable that this is even a step in the right direction.

sjrd commented 8 years ago

@non Indeed, that makes sense.

paulp commented 8 years ago

These conversations quickly overwhelm the dilettante's willingness to wade through them, but in case it hasn't been mentioned, you can only value-class wrap once. The closest thing to newtype offered by scala is a value class. Where you want to use newtype is to turn a data representation into a data type so that it is not interchangeable with every other type which uses the same representation.

If you implement unsigned numerics as value class wrappers, you consume the entire budget for newtype. Now every unsigned int is interchangeable with every other unsigned int. "Unsigned int" is a representation, not a type - even if it is implemented on top of another representation - and should be presented as such.

paulp commented 8 years ago

I can't find the string "lub" in either the proposal or the comments. What's the type of this?

def f(x: Int, y: UInt) = if (condition) x else y

I'm guessing it's AnyVal? I hope nobody seriously entertains the idea that AnyVal is an acceptable type to entrench yet further into scala core.

sjrd commented 8 years ago

I had thought that, because you wanted interop, you'd want to support something equivalent to Uint32Array on the JVM.

You are confusing interoperability and platform independence. Platform independence (or being cross-platform) is the ability to write the same code and have it compile and run the same way on several platforms (e.g., JVM and JS). Interoperability is the ability to communicate with libraries written in a different language, typically a host language (e.g., JavaScript).

This proposal is motivated by interoperability with libraries written in the host language that demand or output unsigned integers. The example illustrates that with Uint32Array, an ECMAScript class for manipulating off-heap arrays of unsigned 32-bit integers (e.g., to communicate with WebGL).

One could of course port (part of) the API of Uint32Array on the JVM to make it cross-platform; and it could do so without boxing by internally using an Array[Int] or a (direct) java.nio.IntBuffer. Rough sketch:

final class Uint32Array private (private val underlying: java.nio.IntBuffer) {
  // IntBuffer has no allocateDirect; direct buffers come from ByteBuffer.
  def this(length: Int) =
    this(java.nio.ByteBuffer.allocateDirect(length * 4).asIntBuffer())
  def length: Int = underlying.capacity()
  def apply(i: Int): UInt = underlying.get(i).toUInt
  def update(i: Int, v: UInt): Unit = underlying.put(i, v.toInt)
  def set(source: Uint32Array, offset: Int): Unit = {
    underlying.position(offset)
    source.underlying.position(0)
    underlying.put(source.underlying)
  }
}

That would also provide an efficient array of UInts on the JVM. But that is mostly orthogonal to the present proposal.

That suggests only, "do something that works on Scala.js", and you haven't motivated why that needs any JVM library changes. [...] I don't know precisely what mechanics you use on the .js side, but at least conceptually having a Uint32Array return UIntOps doesn't seem problematic.

On the contrary. Mechanically we certainly can have the Scala.js compiler twist the meaning of some UIntOps so that it "works", but conceptually it's all broken. To be able to interoperate with host language libraries, we need to be able to say in the spec: a UInt is a JavaScript number that happens to be an integer in the range [0; 2^32). In parallel, we have that an Int is a JavaScript number that happens to be an integer in the range [-2^31; 2^31). Now of course, 5 needs to be == to 5.toUInt, since they both are a JavaScript number with the value 5.0.

If 5 != 5.toUInt, then 5.toUInt simply cannot be a JS number, and therefore there is no possible interoperability.
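The same problem is easy to reproduce today with an ordinary value class (a sketch using a made-up Wrapped type rather than the proposal's UInt):

case class Wrapped(toInt: Int) extends AnyVal
val m = Map[Any, String](5 -> "signed five")
m.get(Wrapped(5)) // None: once boxed, Wrapped(5) != 5 without cooperative equality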

I don't think you really need to. My complaint is that you're trying to integrate unsigned types too much, and it's not working. If you hold them more at arm's length, then few people would expect to be able to do those things anyway.

I do not see how I can integrate them less without losing universal equality.

But doesn't this just point out that value classes are not actually up to the job of defining a new number type?

Maybe ... Should we enhance value classes to allow redefining equals/hashCode and turn ScalaNumber into a universal trait instead?

sjrd commented 8 years ago

you can only value-class wrap once

This need not always be the case. It is an implementation restriction. Dotty demonstrates that you can have several nestings of value classes, IIRC.

I can't find the string "lub" in either the proposal or the comments.

Nope, but we've mentioned "weak conformance", both in the SLIP and in the comments.

What's the type of this? def f(x: Int, y: UInt) = if (condition) x else y I'm guessing it's AnyVal?

Yes, it is.

paulp commented 8 years ago

"Value classes crippled by nesting restriction" but ok, thanks dotty. Be that as it may, in scala you can only wrap once. I'd think the proposal has to be in terms of scala as it is, not scala as it may someday theoretically be - and there isn't any evidence that anyone has a plan to lift that implementation restriction in scala.

As to "so what?" I'm not going to reiterate everything I've ever written about the uselessness of AnyVal. It's all out there if someone else wants to carry the ball.

[edit: The comment to which I was replying was edited and rendered my reply insensible. It used to say "Yes, so what?" where it now says "Yes, it is."]

bvenners commented 8 years ago

The current implementation defines UInt(0) - UInt(9) as the two's-complement value for -9, i.e. 4294967287. If 0 - x is defined for all x, then -x can be defined to produce the same value as 0 - x. This definition is consistent and will not surprise anyone who understands how subtraction works with unsigned values.

I think this behavior will be very error prone. People don't usually work with Ints near Int.MaxValue or Int.MinValue, but they do usually work with them near 0. What I did in PosZInt, which, like UInt, cannot be negative, is pop back to Int if you subtract. You can see that in the signatures of the minus (-) methods here:

http://www.artima.com/docs-scalactic-3.0.0-M11/#org.scalactic.anyvals.PosZInt

Here's an example:

scala> PosZInt(0) - PosZInt(3)
res2: Int = -3

Anytime I couldn't prove at compile time that the result would be another PosZInt or PosInt, I pop back to Int. This turns out to be most of the time, so these types are not designed to facilitate unsigned arithmetic so much as to guarantee in the type that a value adheres to some constraint.

If you implement unsigned numerics as value class wrappers, you consume the entire budget for newtype.

The way I deal with that problem is offer a helper trait, CompileTimeAssertions, so people can more easily make their own "newtypes" for whatever else they need:

http://www.artima.com/docs-scalactic-3.0.0-M11/#org.scalactic.anyvals.CompileTimeAssertions

The main idea is to facilitate the removal of requires on input where it is practical, transforming the potential for run-time errors into compile-time errors.

I'm curious, what exactly is the problem that this SLIP is trying to address? I myself did think it was odd that Java didn't have unsigned primitives given I was coming from C and C++, but it turned out I didn't really miss them in all my Java days. Why they were left out did come up at one point when I was interviewing Gosling. He said people didn't understand how unsigned arithmetic worked in C. This is in the second to last paragraph on this page:

http://www.artima.com/intv/gosling33.html

The problem that Scalactic AnyVals is trying to address is to get rid of requires, especially in situations where people express the values as literals much of the time.

sjrd commented 8 years ago

I'm curious, what exactly is the problem that this SLIP is trying to address?

Interoperability, first and foremost.

bvenners commented 8 years ago

Interoperability, first and foremost.

Do you mean between Scala on the JVM and JS?

sjrd commented 8 years ago

No, that's cross-platform (see my earlier comment about the distinction). See also the first of the two Motivating Examples.

Ichoran commented 8 years ago

@sjrd - Okay, now that I am clear on what you mean by "interoperability" vs. "cross-platform", I ask: why are you doing anything to the JVM version of Scala? Just add those types to the Scala.js version, apply the patch here to RichInt and equalsNumNum and so on, and problem solved for Uint32Array on Scala.js.

bvenners commented 8 years ago

Yes, I also think I see the interoperability need now for unsigned types in Scala.js. I think the benefit of having this be part of the Scala language and/or the standard library is that you can compile and run the same Scala code that takes advantage of unsigned types on the JVM and JS. If it is just part of Scala.js, then you can use it under JavaScript, but can't take advantage of it when writing code you want to use on both the JVM and JS.

I'm curious if the behavior of unsigned in JavaScript is different in any way from the behavior in C, because I would think the old C behavior would be the starting point for designing unsigned type behavior on the JVM. There are already differences between JS and JVM with regard to numerics. For example, we discovered when testing these PosInt types that if you divide an Int 1 by an Int 0, you don't get an ArithmeticException as on the JVM; you get NaN, which is a Double.

WHAAAT!?!?!?!

So we had to do things like this in our tests, which from an old Java guy's perspective seemed crazy:

https://github.com/scalatest/scalatest/blob/master/scalactic-test/src/test/scala/org/scalactic/anyvals/PosZIntSpec.scala#L42

It would be easier to decide on unsigned semantics if this were just available on Scala.js. The semantics should be the JavaScript semantics. But then cross-platform code couldn't use those types. To enable cross-platform code to use unsigned, which is what this SIP/SLIP is proposing I think, I would hope the behavior (and performance) would make sense on both platforms.
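For reference, this is the JVM side our tests had to account for (a sketch; the Scala.js behavior was undefined at the time):

def safeDiv(a: Int, b: Int): Option[Int] =
  try Some(a / b) catch { case _: ArithmeticException => None }

safeDiv(1, 0) // None on the JVM: Int division by zero throws
safeDiv(6, 3) // Some(2)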

sjrd commented 8 years ago

@Ichoran - Ahah! Now, that is an excellent question, indeed. And writing up the answer makes me realize: it's for platform independence ^^ (ironically)

Initially, that is what I wanted to do: add new types in the scala.scalajs.js namespace to represent interoperable unsigned integers. Spec in Scala.js that they are == to their signed equivalent, patch BoxesRunTime.scala (which is already heavily patched anyway), and call it a day (well, a week, most like).

But then, I realized those data types can be useful in and of themselves (not just for interop). I also talked to @densh, who agreed that unsigned integer types would be useful for interop in ScalaNative. So we set ourselves to the task of polishing this idea and making it work for all Scala platforms, consistently and efficiently. And that's how this SLIP was born.

So yes, in fact this SLIP is all about cross-platform, although it was born from a desire for interoperability. In a sense, it addresses both.

If the SLIP is deemed undesirable on Scala/JVM and rejected, then I will go back to my old idea. It would be unsatisfactory, because Scala.js would suddenly have more "primitive" types than Scala/JVM, but it would work.

sjrd commented 8 years ago

@bvenners - I think you got 0, not NaN, but it's still WHAT (well, at least it's an Int). But that's because it's undefined behavior in Scala.js. And every passing month I'm reconsidering this, so maybe eventually we'll specify it anyway.

Regarding unsigned integer semantics, the JS, JVM and C semantics are strictly equivalent (modulo the undefinedness of / 0). The only weird case is >> (it does not do the same thing in C), which we excluded from the UInt API precisely for that reason.
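Concretely, here is why >> has no sensible unsigned meaning (shown on plain Ints, looking at the bit patterns):

val x = 0x80000000 // as unsigned: 2^31
x >> 1             // 0xC0000000: the sign bit is propagated, wrong for an unsigned value
x >>> 1            // 0x40000000: logical shift, the correct unsigned halving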

bvenners commented 8 years ago

@sjrd I know we got NaN somehow, with Ints magically transformed into Doubles at runtime! It was a property-based test failure, so I'll see if I can confirm which values got us that outcome.

Good to hear that the semantics other than divide by 0 are identical between JS and C. JVM doesn't have any semantics, does it? Or are you referring to the Java 8 support you mentioned? Anyway, that sounds good.

My reason for bringing up divide by zero is that Scala is not write-once-run-on-JVM-or-JS. That's why it took us so long to port ScalaTest to Scala.js. But I would hope that new additions to both platforms, such as unsigned types, would work the same on both (other than perhaps divide by 0).

sjrd commented 8 years ago

JVM doesn't have any semantics, does it? Or are you referring to the Java 8 support you mentioned?

Yes, I am referring to the Java 8 support.

Ichoran commented 8 years ago

It sounds like a big missing piece in this is "interop in ScalaNative". That was apparently a key motivating factor, but most of us (e.g. me) don't know enough about it to reason about anything on the basis of it. Some important things to consider include:

  • Will it surely exist, or only maybe?

  • Does it need first-class or only second-class support for unsigned types?

  • What is it interoperating with? C? What about much more terrifying things like pointers and memory ownership?

As I said before, I'd like to have an easy way to do unsigned math on the JVM. This proposal implements that, and that part of it is great. It's the attempt at integration that I don't like because it digs deep into the internals underlying the standard library, and still fails to feel built-in due to all the bits not supported (literals, weak conformance, efficient arrays, specialization, ability to value-class it, etc.).

sjrd commented 8 years ago

I will let @densh speak for ScalaNative.

and still fails to feel built-in due to all the bits not supported (literals, weak conformance, efficient arrays, specialization, ability to value-class it, etc.).

Except for literals (which are not a requirement to be primitive in Scala, see Byte and Short), all of this is supposed to be addressed in the "long term" compiler-space implementation. The short-term implementation lays down the groundwork in terms of semantics. (Well, weak conformance is semantics. We could implement it short-term too.)

I understand your point, though. Let me ask you something differently, then. Let's suppose we change the role of the current implementation to that of a "prototype", and change the goal of this SLIP/SIP to "it must reach the long-term vision now" (or before anything gets in, at any rate), in which the types truly are primitives all the way down. In that context, would you change the way you look at this proposal?

bvenners commented 8 years ago

Yes, I am referring to the Java 8 support.

I had to look this up. That addition to Java 8 had passed under my radar:

https://blogs.oracle.com/darcy/entry/unsigned_api

Given that API in Java 8, it does seem useful to me to add UInt, etc., types to Scala proper in 2.12 or later. I think they would still be useful even if they are just plain old AnyVals with cooperative equality support added between the built-in AnyVals Int, Long, etc. Array[UInt] can just be boxed, and if that's a performance problem, we could switch to Array[Int]. And if you wanted to newtype on top of UInt, you could just make a new AnyVal on Int and call the same unsigned Int methods on some singleton object somewhere. So it would be nice if these unsigned Int arithmetic methods that take Int were exposed on an object. Of course it would be better if the unsigned types didn't box when put in specialized collections like Array, but I think they would be useful even as plain old AnyVals.
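Something like this is what I have in mind (the object name is made up; the static methods are the real Java 8 ones):

object UnsignedIntMath {
  def divide(x: Int, y: Int): Int    = java.lang.Integer.divideUnsigned(x, y)
  def remainder(x: Int, y: Int): Int = java.lang.Integer.remainderUnsigned(x, y)
  def compare(x: Int, y: Int): Int   = java.lang.Integer.compareUnsigned(x, y)
  def toLong(x: Int): Long           = java.lang.Integer.toUnsignedLong(x)
  def show(x: Int): String           = java.lang.Integer.toUnsignedString(x)
}

UnsignedIntMath.show(-1) // "4294967295"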

Ichoran commented 8 years ago

@sjrd - In my case, yes, absolutely. My primary complaint is that it feels hacked on, not like a core part of the language. If the Java library doesn't support it--well, that's unfortunate. But if it's genuinely first-class in Scala, that'd be great. Not so great, though, if there's a big performance hit associated with it. We'd have to think that one through carefully. Assuming it can be avoided, it'd be good.

I honestly don't know how to do that, though, given how Array works on the JVM, and given the current design of value classes. (Well, I kind of do, but changing value classes to box arrays as well is probably a bigger change than this is.)

DarkDimius commented 8 years ago

@paulp

you can only value-class wrap once

AFAIK this restriction is removed in Dotty

DarkDimius commented 8 years ago

@sjrd

Maybe ... Should we enhance value classes to allow redefining equals/hashCode

https://github.com/lampepfl/dotty/pull/717

DarkDimius commented 8 years ago

@bvenners

I had to look this up. That addition to Java 8 had passed under my radar:

https://blogs.oracle.com/darcy/entry/unsigned_api

It's not intrinsified by HotSpot and I would not expect it to be intrinsified any time soon.

bvenners commented 8 years ago

It's not intrinsified by HotSpot and I would not expect it to be intrinsified any time soon.

Now I had to look up intrinsified. Good word.

Anyway, that's fine. Sounds like this may not be so much about performance as about giving people a more user-friendly way to access unsigned integer arithmetic in Scala. Given those methods in the Java 8 API, and the unsigned types in JavaScript, having a few AnyVals for unsigned math in Scala seems natural. I wonder how often that will be useful on the JVM if the performance is not as good as the corresponding-to-primitive value types, though. If they are going to be second-class performers, perhaps they should not enjoy cooperative equality with the corresponding-to-primitive value types, like PosZInt:

scala> PosZInt(0) == 0
<console>:17: warning: org.scalactic.anyvals.PosZInt and Int are unrelated: they will never compare equal
       PosZInt(0) == 0
                  ^
res4: Boolean = false

scala> 0 == PosZInt(0)
<console>:17: warning: comparing values of types Int and org.scalactic.anyvals.PosZInt using `==' will always yield false
       0 == PosZInt(0)
         ^
res5: Boolean = false

Especially if adding support for them will slow down the performance of equality comparisons on the corresponding-to-primitive types.

paulp commented 8 years ago

AFAIK this restriction is removed in Dotty

@DarkDimius Yes @sjrd also pointed this out, and also included a hedging acronym. The current tally is one AFAIK and one IIRC. Assuming that you do KF/RC, what significance should we find there? Can anyone elaborate on what role we should expect dotty to have in S[cala]LIPs, and/or point me to a document where it is articulated? AFAIK/IIRC news items regarding rust, julia, haskell, or php are equally relevant to this process as those regarding dotty.

densh commented 8 years ago

It sounds like a big missing piece in this is "interop in ScalaNative" ...

  • Will it surely exist, or only maybe?

That's what I'm currently working on. Unfortunately, I can't really share any concrete information about the project until the public announcement time.

  • Does it need first-class or only second-class support for unsigned types?

Second-class types are going to be enough for a start. All operations that are implemented through static methods of java.lang.{Integer, Long} can be intrinsified to remove overhead in problematic areas like division.
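To illustrate the second-class approach: the cheap operations keep the signed bit-level semantics, and only the operations that actually differ delegate to the JDK 8 static helpers (the calls a backend could intrinsify). A minimal Java sketch, with hypothetical names — the proposal's real UInt is a Scala value class, not this:

```java
// Hypothetical illustration of a second-class unsigned int wrapper.
public final class UIntSketch {
    private final int bits; // underlying two's-complement bit pattern

    public UIntSketch(int bits) { this.bits = bits; }

    // Addition and multiplication are bit-identical to the signed operations.
    public UIntSketch plus(UIntSketch that)  { return new UIntSketch(bits + that.bits); }
    public UIntSketch times(UIntSketch that) { return new UIntSketch(bits * that.bits); }

    // Division and comparison differ from the signed versions, and
    // delegate to the JDK 8 static helpers discussed above.
    public UIntSketch div(UIntSketch that) {
        return new UIntSketch(Integer.divideUnsigned(bits, that.bits));
    }
    public int compareTo(UIntSketch that) {
        return Integer.compareUnsigned(bits, that.bits);
    }

    public long toLong() { return Integer.toUnsignedLong(bits); }

    @Override public String toString() { return Integer.toUnsignedString(bits); }

    public static void main(String[] args) {
        UIntSketch max = new UIntSketch(-1);          // 4294967295 as unsigned
        UIntSketch three = new UIntSketch(3);
        System.out.println(max.div(three));           // 1431655765
        System.out.println(max.compareTo(three) > 0); // true
    }
}
```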

The idea of the SLIP is to do things in a two-phase manner: first introduce basic support, and only then see whether more advanced integration is even worth it. The problem with the integrated version is that much more would have to change in the compiler to accommodate the new functionality, which makes it quite a high-risk job.

Given enough interest, this proposal can become an excellent benchmark for fixing the current issues with value classes that were mentioned in this thread:

In my opinion fixing all of these problems might be a better thing for Scala than having a few more truly primitive integer types.

  • What is it interoperating with? C? What about much more terrifying things like pointers and memory ownership?

Yes, a foreign function interface to C. Although the task might seem daunting at first, most of the things can be quite nicely integrated. Even on the JVM, as long as we still have sun.misc.Unsafe, we can do a much better job than JNI (as my experience with scala-offheap shows).
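As a point of reference, here is roughly what off-heap access through sun.misc.Unsafe looks like today — a toy sketch, not scala-offheap's actual API. The reflection dance is the usual workaround, since Unsafe.getUnsafe() rejects application classes:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapSketch {
    public static void main(String[] args) throws Exception {
        // Obtain the Unsafe singleton via reflection on its private field.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Allocate 8 bytes outside the GC heap, write and read a long, then free.
        long addr = unsafe.allocateMemory(8);
        try {
            unsafe.putLong(addr, 42L);
            System.out.println(unsafe.getLong(addr)); // 42
        } finally {
            unsafe.freeMemory(addr);
        }
    }
}
```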

DarkDimius commented 8 years ago

All operations that are implemented though static methods of java.lang.{Integer, Long} can be intrinsified to remove overhead in problematic areas like division.

Can be intrinsified by HotSpot developers, not by us. I do not think that this is high on their priority list. On our side I do not see much that we can do.

densh commented 8 years ago

Can be intrinsified by HotSpot developers, not by us. I do not think that this is high on their priority list. On our side I do not see much that we can do.

I was talking about ScalaNative, not JVM.

Ichoran commented 8 years ago

@densh - If you can't really talk about ScalaNative, people can't really be expected to look upon this SLIP with more favor because of it. Unsigned types are high on the list of things I'd expect for native support, but these days so are vectorized instructions. The JVM does the non-vectorized stuff pretty well, but it's routinely clobbered by vectorized code. And it's hard to see this proposal getting us any closer to that, though maybe there's a way to get tuples to do it. So details really do matter.

I don't mind a multi-step process; I'd just make two different steps. The first one would go less far, so that we gain access to unsigned operations without having bolted-on quasi-primitive types. Then, assuming that works out and more is desirable, go for the whole thing and add first-class primitive types with everything that entails.

I fully agree that fixing all the problems raised may well be better for Scala than having unsigned types. So let's fix some!

bvenners commented 8 years ago

I don't mind a multi-step process; I'd just make two different steps. The first one would go less far, so that we gain access to unsigned operations without having bolted-on quasi-primitive types. Then, assuming that works out and more is desirable, go for the whole thing and add first-class primitive types with everything that entails.

If this is going to be a two-step process, then I think you'd want the cooperative equality in step one, because you couldn't add it in step two without breaking everyone's code. And it wouldn't feel like a first-class value type without it.

Ichoran commented 8 years ago

@bvenners - You would only break code that assumes that UIntOps doesn't correctly compare with Int when boxed.

Anyway, let's see the benchmarks. I'm pretty sure that there will be a significant hit (I'd guess 20% even in code that never uses UInt, in some but not all cases, and closer to 50% when you actually need to support UInt). You arguably shouldn't be having boxed numbers when you care about performance, but if you get a hit without even using UInt, that's a problem. We already suffer from the elaborateness of our equality method, and "be really careful you never, in your entire code base, do this natural thing that we went out of our way to support" is not a very friendly story for optimization. To get it right/fast on the JVM we'll probably need something like miniboxing. Not sure what the JS engines will benefit from.
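To make the cost concrete, here is a toy Java illustration (not Scala's actual BoxesRunTime code) of the kind of dispatch cooperative equality forces on every `==` whose operands are statically unknown. Every extra cooperating type — such as a new UInt — means more branches on this hot path:

```java
// Toy sketch of cooperative equality over boxed values. Real implementations
// must also handle Float/Double widening, Character, BigInt, etc.
public class CoopEqSketch {
    static boolean coopEquals(Object a, Object b) {
        // Fast path: both sides are the same box type.
        if (a instanceof Integer && b instanceof Integer)
            return ((Integer) a).intValue() == ((Integer) b).intValue();
        // Cross-type numeric case: widen both sides before comparing.
        if (a instanceof Number && b instanceof Number)
            return ((Number) a).longValue() == ((Number) b).longValue();
        // Fall back to ordinary equals.
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        // Cooperative: an Integer 3 and a Long 3 compare equal.
        System.out.println(coopEquals(3, 3L));                          // true
        // Plain Java boxed equality is not cooperative.
        System.out.println(Integer.valueOf(3).equals(Long.valueOf(3))); // false
    }
}
```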

bvenners commented 8 years ago

@Ichoran I just meant that if 3.toUInt == 3 is false in 2.12, we would not be able to change that result to true in 2.13. And I agree that the performance cost of truth should guide that decision. It may turn out that we can't handle the truth!

Ichoran commented 8 years ago

@bvenners - If you have one typed as UInt and the other typed as Int, the compiler will warn you that your comparison will never succeed. It's only when they're typed as AnyVal or Any that you run into trouble.