It turned out that Rheem's Scala API is much easier to handle, less error-prone, more efficient, easier to read, and so on, while being just as powerful as the graph-based base API. Since Scala is not popular with everybody, we should offer the same capabilities in Java. At first glance, one might argue that Java applications could simply call the Scala API as well. However, this does not hold because of (i) Scala-specific datatypes, (ii) optional parameters, and (iii) implicit parameters, including ClassTags. Consequently, a dedicated Java API is needed.
The following ideas might help to accomplish that goal:
The Java API could be a wrapper around the Scala API.
The lack of optional parameters could be compensated by subsequent method calls, e.g., instead of lines.map(_.toLowerCase, udfCpuLoad = ...) in the Scala API, we might want to have lines.map(String::toLowerCase).withUdfCpuLoad(...). Since Operators are mostly immutable, this entails a lazy creation of those, i.e., only after we know that no such chained calls will appear any more.
Not all situations require us to know the exact datatype being handled by the execution platforms. This is important because of Java's type erasure. Therefore, we could make a best effort to infer the datatype, e.g., from UDFs or from Operator connections, and allow it to be specified optionally, e.g., with .withInputType(...). If we do not have a datatype, we use a surrogate one, e.g., DataSetType.unknown().
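To make the chained-call and surrogate-type ideas more concrete, here is a minimal sketch in plain Java. It does not use Rheem's actual classes: FluentApiSketch, MapOperator, and MapBuilder are hypothetical stand-ins, build() is an assumed trigger for the lazy creation, and the string "unknown" merely plays the role of DataSetType.unknown().

```java
import java.util.function.Function;

public class FluentApiSketch {

    /** Hypothetical immutable operator; a stand-in, not Rheem's real Operator class. */
    static final class MapOperator<I, O> {
        final Function<I, O> udf;
        final Double udfCpuLoad;     // null means "no load estimate given"
        final String inputTypeName;  // "unknown" acts as the surrogate type

        MapOperator(Function<I, O> udf, Double udfCpuLoad, String inputTypeName) {
            this.udf = udf;
            this.udfCpuLoad = udfCpuLoad;
            this.inputTypeName = inputTypeName;
        }
    }

    /** Hypothetical builder: collects optional settings, creates the operator lazily. */
    static final class MapBuilder<I, O> {
        private final Function<I, O> udf;
        private Double udfCpuLoad;   // optional, replaces Scala's named parameter
        private Class<I> inputType;  // optional, lost to type erasure unless supplied

        MapBuilder(Function<I, O> udf) {
            this.udf = udf;
        }

        MapBuilder<I, O> withUdfCpuLoad(double load) {
            this.udfCpuLoad = load;
            return this;
        }

        MapBuilder<I, O> withInputType(Class<I> type) {
            this.inputType = type;
            return this;
        }

        /** The immutable operator is only created here, after all chained calls. */
        MapOperator<I, O> build() {
            String typeName = (inputType == null) ? "unknown" : inputType.getSimpleName();
            return new MapOperator<>(udf, udfCpuLoad, typeName);
        }
    }

    public static void main(String[] args) {
        // No type given: the sketch falls back to the surrogate type.
        MapOperator<String, String> untyped =
                new MapBuilder<String, String>(String::toLowerCase)
                        .withUdfCpuLoad(0.5)
                        .build();
        System.out.println(untyped.inputTypeName); // prints "unknown"

        // Type given explicitly via the optional chained call.
        MapOperator<String, String> typed =
                new MapBuilder<String, String>(String::toLowerCase)
                        .withInputType(String.class)
                        .build();
        System.out.println(typed.inputTypeName); // prints "String"
    }
}
```

In a real API the explicit build() call would probably be hidden, e.g., by creating the operator when the next operator is chained or when the plan is executed. How best to trigger that is an open design question for this issue.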
From @sekruse on August 11, 2016 9:6
Copied from original issue: daqcri/rheem#14