twosigma / beakerx

Beaker Extensions for Jupyter Notebook
http://BeakerX.com
Apache License 2.0
2.8k stars 382 forks source link

Make Scala interfaces more idiomatic #5798

Open jpallas opened 7 years ago

jpallas commented 7 years ago

This is an umbrella for a couple of things I'd like to do for the Scala-specific interfaces in the chart, table, easyform, and fileloader packages.

  1. Change all List interfaces to use Seq. Seq is the generic type for an ordered collection, and will accept Scala Array, Vector, and Range (as well as List).
  2. Use Any instead of AnyRef where possible on input parameter types to avoid problems with using the Scala native types. Scala should handle auto-boxing of native numeric types.
  3. Add companion object interfaces for construction. It's more common in Scala to overload the companion apply than to overload the constructor, and even without overloading, the advantage of not typing new all over the place in a notebook is clear. Drawback: It will be awkward if some classes don't have companion-based constructors, so new Scala code may be needed even if the Java classes involved have Scala-friendly types. Impact: some maintenance load for Scala compatibility.
  4. Compile-time type safety. This is tricky. The base classes are implemented in Java, which doesn't have any way to express constraints like "Date or Number", so they fall back to type Object and rely on run-time checking with exceptions. Scala can use type constraints to be more specific, and Scala programmers are used to having compile-time type checking be as strict as possible. Major concerns:
    • Test code would have to written in Scala instead of Java
    • The constraint handling code would need to change if the underlying constraints changed (e. g., support for java.time.LocalDateTime in XYGraphics.setX)
    • This part of the code, while relatively isolated, would probably be opaque to anyone not fluent in Scala

So, I'm very much open to feedback on this. I think the first two items are low-cost/moderate-benefit. The others may be debatable, especially in terms of their impact on maintaining the Scala code when changes are made to the base libraries.

scottdraves commented 7 years ago

Hi Joe, thanks for the suggestions. We have scala wrappers (https://github.com/twosigma/beakerx/blob/master/kernel/scala/src/main/java/com/twosigma/beakerx/scala/chart/xychart/plotitem/Points.scala) and our goal is to make it as idiomatic as possible. We just took a guess to get started. We are looking into this further....

icexelloss commented 7 years ago

@scottdraves asked me to take a look at the issue. My understanding is that we want to provide Scala wrappers for java packages chart, table, easyform.

First of all, I am new to this problem so apologies if I say something stupid.

@jpallas For you changes you proposed, 1-3 seems fine. 4 is not clear to me, for instance, I don't know what do you mean by:

The base classes are implemented in Java, which doesn't have any way to express constraints like "Date or Number"

and

The constraint handling code would need to change if the underlying constraints changed (e. g., support for java.time.LocalDateTime in XYGraphics.setX)

Can you give a bit more details?

jpallas commented 7 years ago

@icexelloss It's certainly reasonable to ask why Scala merits a custom API for these classes. The answer to your question "how does it look" using the Java interfaces is that it would look pretty ugly. Explicitly converting between Java collections and Scala collections would clutter up Scala cells in notebooks quite a bit. Some Scala users, especially ones who are primarily using Spark, may not be familiar with the Scala/Java conversion methods if they haven't worked with Java libraries from Scala. The error messages when conversions are omitted would, in some cases, be confusing or even misleading, since some of them would appear as run-time exceptions thrown by the implementation rather than compiler messages. (An exception message complaining that a "list of numbers" was expected will be very confusing if the user did, in fact, pass a Scala List[Int], for example.)

I'll post a separate comment on the questions about type-safe interfaces.

icexelloss commented 7 years ago

@jpallas, thanks for the explanation. I agree calling explicit conversion asJava in a notebook is ugly.

How about using Scala implicit conversions such as:

http://www.scala-lang.org/api/2.12.1/scala/collection/convert/ImplicitConversionsToJava$.html ?

Can you post maybe an example of using the Java API+ Implicit Conversion vs Native Scala API?

scottdraves commented 7 years ago

We have the resources to make a good native API, no need to take a shortcut unless the result is just as good. Of course if there is a faster/simpler way with the same results, then by all means let's do it.

jpallas commented 7 years ago

As noted in the docs,

It is recommended to use explicit conversions provided by collection.JavaConverters instead. Implicit conversions may cause unexpected issues.

Implicit conversions won't handle cases where conversions need to be nested. A good example is the TableDisplay constructor that takes a Collection<Map<String, Object>>. Implicit conversions don't chain, so this case (or List<List<?>> can't be handled implicitly.

icexelloss commented 7 years ago

Yeah this makes sense. I think we should have a Scala native API then.

scottdraves commented 7 years ago

Thanks for your input @icexelloss

@michalgce please get started making a PR for each of these points in order. ask questions here as needed.

jpallas commented 7 years ago

Some more details on the issue of static types for the Scala interfaces. (Earlier I said "type safety", but I meant specifically static types—the JVM will throw runtime exceptions for type errors.)

Consider XYGraphics, which is the parent for Line and Points (among others), and declares setX and setY. setY takes a List<Number>. This is a reasonable Java interface. It's inconvenient to use directly from Scala because the Scala native types are not derived from Java's Number, so even after using the asJava conversion on the collection, a Scala List[Int] won't conform to Java's List<Number>. (Note: although Scala's Int is not java.lang.Integer, the boxed representation is. That means casting a List[Int] to a List[Number] for a call to Java will work, although it is technically not safe, relying on an implementation detail that is not expressed in the type system.) The type-safe conversion from Scala List[Int] to Java List<Number> uses the implicit conversion (or view) from Int to java.lang.Integer. This can be done generically by defining a method with a view bound:

import scala.collection.JavaConverters._
…
  def setY[T <% Number](ys: Seq[T]): Unit = {
    val javaList = ys.map(y => y: Number).asJava
    setY(javaList)
  } 

(View bounds syntax is deprecated, however, so the above should actually be written as def setY[T](ys: Seq[T])(implicit conversion: T => Number)).

The story for setX is more complicated. Java can't express a type of List<Number | Date>, so the interface uses List<Object>. The implementation does runtime type checking to see whether each item is a Number or a Date. Scala context bounds can be used to constrain a generic type, so it can express "List of things of some type that satisfies a date-or-number constraint". Technically, this is not exactly what the implementation supports: the Java code will accept a list that contains a mix of numbers and dates, while the Scala constraint requires that the list be of a uniform type (that is, all numbers or all dates). We're unlikely to generate a heterogeneous collection in practice, though.

We can use the "type class" pattern to define a constraint such as JavaDateOrNumber. Then a type-safe implementation of setX might look like this:

  def setX[T : JavaDateOrNumber](xs: Seq[T]): Unit = {
    import JavaDateOrNumber.converter._
    val javaList = xs.map(y => y.asDateOrNumber).asJava
    setX(javaList)
  }
jpallas commented 7 years ago

Time for an update on this. (Caution: some ranting ahead, with possible whining about dynamically typed languages like Groovy.) The first pass was naive, because I had no idea how thoroughly unhelpful some of the interfaces are. To be specific, I'm talking about things like this:

  public void setColor(Object color) {
    if (color instanceof Color) {
      this.baseColor = (Color) color;
    } else if (color instanceof java.awt.Color) {
      this.baseColor = new Color((java.awt.Color) color);
    } else if (color instanceof List) {
      @SuppressWarnings("unchecked")
      List<Object> cs = (List<Object>) color;
      setColors(cs);
    } else {
      throw new IllegalArgumentException(
              "setColor takes Color or List of Color");
    }
  }

Seriously, WTF? I don't really understand why this was done the way it was. I assume it was written for Groovy. I thought that Groovy is supposed to interoperate with Java sensibly around overloaded methods, and I think this is meant to be equivalent to a method with four overloads:

public void setColor(Color color) { ... }
public void setColor(java.awt.Color color) { ... }
public void setColor(List<Color> colorList) { ... }
public void setColor(List<java.awt.Color> colorList) { ... }

[EDIT: but the last two can't both be present due to type erasure]

Obviously, the first problem is that you have to read the implementation carefully to learn what the interface type is. The second problem is that the declared method takes Object, so type errors are impossible. That makes one approach for wrapping the interface in Scala, using implicit classes for extensions, unworkable because extension methods will only be considered if the call would otherwise fail to type-check. (Not sure if that problem also affects Kotlin, since I'm not familiar with its Java interop behavior.)

So, short of splitting a bunch of existing methods that do runtime type-tests into overloads, I think the easiest path forward for Scala is to create those overloads in Scala classes that inherit from the Java classes. That will still leave the untyped methods visible to Scala callers, but there's no way to hide them and preserve the existing type hierarchy. Since the added methods will need to be carried through the hierarchy, I think they should be done as traits that can be mixed in at each level. A bit more complex than I would like, but I don't see a simpler approach that works.

(Another option: pretend the Java bean style getters and setters don't exist at all, and use extension methods to add Scala style getters and setters. For consistency, that would mean covering all of the settable properties in each class. Yuck.)

scottdraves commented 7 years ago

Yea it was written to optimize the Groovy API but there might be a better way to accomplish the same result. We will look into it in another issue.

scottdraves commented 6 years ago

setColor should be fixed, please LMK if there are other similar problems (I expect them).

jpallas commented 6 years ago

Since you asked .... Here's what a quick grep turns up for setters taking Object:

setValue in CategoryGraphics takes an array of Objects, expecting it to be an array of either array of Number or List of Number, if I'm following the code correctly. Since arrays are not generic, those could be successfully overloaded without hitting type erasure problems.

Setters taking a List of Objects, however, which includes some of the color-related ones that result from adding overloads and setX in XYGraphics and ConstantBand, can't be turned into overloads because of type erasure.

I'm not sure what to make of setItemLabel in CategoryGraphics or setToolTip in XYGraphics. I'm happy to ignore them for now.

jpallas commented 6 years ago

Keeping this up to date on status. I've concluded that the only sensible way to make using the API not be horribly verbose is to emulate the ScalaFX syntax style. (The alternatives would be using named parameters with a huge number of parameters or a dynamic scheme that would depend on runtime checks.) Coupled with Scala-style accessors, this allows terse, reasonably idiomatic code like

val points = new Points {
  x = xValues
  y = yValues
  color = Color.RED
  shape = ShapeType.TRIANGLE
}

(As an aside, the "extend my class" pattern doesn't support this style, which is why I decided not to pursue it.)

For this to work, #6344 is needed so that the anonymous subclass created in this approach is serialized properly.

This still allows post-creation modification, such as

points.outlineColor = Color.BLUE
points.size = 20

It's possible to call the Java property-style accessors from Scala, but there's no way to overload a simple getter, so getFoo will return a Java type while foo will return a Scala type, if there's a difference. (The Scala accessors must be defined: Scala won't recognize property = value as a call to the mutator if there is only a mutator and no accessor.) If the Java type is nullable, the Scala accessor will return an Option and if the Java type is a List, the Scala accessor will return a Seq.

It's not clear whether companion-object construction interfaces are still useful with this style. On the one hand, it's slightly more convenient to say

Line(xValues, yValues)

than it is to say

new Line { x = xValues; y = yValues }

and the parameter names are visible in autocomplete (although completion doesn't seem to be working very well for Scala right now, not sure why, hopefully not just because of the move back to 2.11). On the other hand, you can't mix the two:

// Can't do this
Line(xValues, yValues) { color = Color.BLUE }
// Instead do this
new Line { x = xValues; y = yValues; color = Color.BLUE }
// or this
val line = Line(xValues, yValues)
line.color = Color.BLUE

So, I think the companion object construction interfaces should probably be limited to the most essential arguments, if they are retained at all.

Sorry to be so long-winded. Hopefully some of this can be recycled in the doc.

scottdraves commented 6 years ago

👍 merged 6344