vavr-io / vavr

vʌvr (formerly called Javaslang) is a non-commercial, non-profit object-functional library that runs with Java 8+. It aims to reduce the lines of code and increase code quality.
https://vavr.io
Other
5.67k stars 628 forks source link

Fix numerical operations of Traversables and make functions total #207

Closed danieldietrich closed 9 years ago

danieldietrich commented 9 years ago

Currently we have the following numerical operations:

Computational

Current approach: The result has the same component type as the elements

This is generally ok but has several disadvantages. By looking at the first element, we are able to get runtime information about the component type. The empty traversable has no elements. Because of type erasure there is no way to get the component type. We're stuck here, the methods just throw in this case.

Another disadvantage is the handling of overflows. For example Java's IntStream sums ints with long precision to get around overflows. Returning the same component type is inaccurate.

Suggested approach: The result is of type double.

Every numeric type can be represented as double. We can return 0.0 in the case the traversable is empty. And we get around throwing exceptions by returning NaN if no computation is possible. average() should return an Option because the average of no elements is not defined. This looks like a viable solution.

The precision of Java double values is 64-bit IEEE 754 floating point. More information about floating point numbers: http://introcs.cs.princeton.edu/java/91float

Comparision

Currently these methods return a value or throw if there is no element.

Instead these methods should return an Option, i.e. Some(element) or None, if a computation in not possible.

danieldietrich commented 9 years ago

Technical realization: Perform computation with JavaScript

Java 8 has Nashorn, why not use it? JavaScript is an untyped language, operations are always defined/do not throw and the JavaScript engine is fast. JavaScript has also 64 bit doubles, so there should be no loss regarding precision.

Example: Traversable.sum()

// this is Java
default double sum() {
    return Try.of(() -> (double) JS.get().invokeFunction("sum", this)).get();
}
// this is JavaScript
var sum = function(iterable) {
    var acc = 0.0
    for each (elem in iterable) {
        acc += parseFloat(elem)
    }
    return acc
}

The JavaScript engine JS is set up as follows:

// the engine is set up lazily within the Traversable interface
Lazy<Invocable> JS = Lazy.of(() -> Try.of(() -> {
    final ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
    engine.eval(new InputStreamReader(Traversable.class.getResourceAsStream("traversable.js")));
    return (Invocable) engine;
}).get());

Note: With this approach it is even possible to sum strings. E.g. the solution described above will do the following:

List.of("1 kg", "2 kg", "3 kg").sum(); // = 6

This solution should also be thread-safe. More on Nashorn JavaScript thread-safety here: https://blogs.oracle.com/nashorn/entry/nashorn_multi_threading_and_mt

danieldietrich commented 9 years ago

A more optimized script would look like this:

/*
 * Computes the average of iterable elements as double.
 */
var avg = function(iterable) {
    // division by zero is NaN
    return compute(iterable, 0.0, function(acc, value) { return acc + value }, function(acc, count) { return acc / count })
}

/*
 * Computes the sum of iterable elements as double.
 */
var sum = function(iterable) {
    return compute(iterable, 0.0, function(acc, value) { return acc + value }, function(acc, count) { return acc })
}

/*
 * Computes the product of iterable elements as double.
 */
var product = function(iterable) {
    return compute(iterable, 1.0, function(acc, value) { return acc * value }, function(acc, count) { return acc })
}

/*
 * Represents a computation given iterable elements.
 *
 * @param iterable A java.lang.Iterable of arbitrary objects.
 * @param zero The neutral element regarding the accumulate operation.
 * @param accumulate(acc, value) Accumulates the current accumulator with the given value.
 * @param result(acc, count) Computes the result given the accumulator and the element count.
 * @return a double value or NaN if an element has no numerical representation using parseFloat()
 */
var compute = function(iterable, zero, accumulate, result) {
    var acc = zero
    var count = 0
    for each (elem in iterable) {
        var float = parseFloat(elem)
        if (isNaN(float)) {
            return float
        }
        acc = accumulate(acc, float)
        count += 1
    }
    return result(acc, count)
}
danieldietrich commented 9 years ago

I'm convinced that this goes in the right direction. However, Java is a statically typed backend language without room for 'elastic' behavior. Most probably there will be 'fear' of undefined values, e.g. when summing Strings or arbitrary objects without a numeric representation.

Maybe the correct 'Java-way' would be to throw a NumberFormatException instead of returning NaN or Some(NaN). On the other hand NaN is expressing exactly that the result of a computation is not a number. We will see. I'm waiting for some feedback.

Throwing an Exception would be a viable option here. But hey, we are functional now! Java folks should start to think in values. Exceptions are unnecessary ballast.

Update: Before this, I thought of an approach using an interface to describe how an object is transformed to a number/double. Unfortunately the Java type Number is an abstract class, not an interface. Scala can bring implicit helpers into scope which take care of such things as conversion. Java can't do that. We need interfaces for that. But we should not underestimate the power of Traversable. Instead of having an interface to transform objects to numbers we just use Traversable.map to get them in the right shape.

danieldietrich commented 9 years ago

I've slept one night over this topic and I think it would be great to return an instance of Number instead of a double.

The caller of sum could decide, if a double or long is wanted. The underlying implementation could compute both - a long value and a double value. E.g. there would be no loss of accuracy when summing up int values.

Also I think, that we should throw a NumberFormatException or even an UnsupportedOperationException, if the elements of Traversable are not numeric / have no number representation.

The same applies for min, max, etc - if the elements are not comparable, an UnsupportedOperationException should be thrown.

It is unsatisfactory that the API for numerical changes will again change slightly. But the goal is to provide the best solution possible.

ggalmazor commented 9 years ago

:+1: