kyonifer / koma

A scientific computing library for Kotlin. https://kyonifer.github.io/koma
Other
270 stars 23 forks source link

Easier creation of NDArrays with specified contents #72

Closed peastman closed 5 years ago

peastman commented 5 years ago

Matrix has a convenient builder syntax that makes it easy to create matrices containing arbitrary values, but there's nothing comparable for NDArray (or if there is, I haven't found it). This would be a useful addition. Possibly the same builder syntax could be extended to arrays, though it's not obvious to me how to handle arbitrary data types and more than two dimensions. Other approaches are also possible. For example, there could be a method that takes ordinary arrays as input. That's the method used by NumPy, although the syntax is less convenient in Kotlin.

val array = NDArray.create(arrayOf(doubleArrayOf(10.0, 5.0), doubleArrayOf(-5.0, 10.0)))

Another possibility is to pass the elements directly to the creator function.

val array = NDArray.create(1, 2, -3, -4)

That would create a 1D array. For more dimensions, you would list the elements in flattened order, then give the shape as an extra argument.

val array = NDArray.create(10.0, 5.0, -5.0, 10.0, shape=intArrayOf(2, 2))

This is similar to the approach used by ND4J.

drmoose commented 5 years ago

66 added matrixOf(). Perhaps an ndArrayOf()? Personally I like NDArary.create better but collectionNameOf() seems to be the kotlin way of doing things.

I also intended to add .reshape() overloads on built-in receiver types like List and Array, to allow going directly to matrix and nd from the built-in 1-d containers.

peastman commented 5 years ago

ndarrayOf() would work nicely. Note that reshape() creates a completely new array, so that would be less efficient than using a shape argument to create it with the right shape from the start.

peastman commented 5 years ago

Of course, assembling the varargs for the call to ndarrayOf() already involves creating a regular array, so I suppose there isn't really a difference in efficiency one way or the other.

drmoose commented 5 years ago

It's even worse than it looks. Behind the scenes, vararg is very fond of using Arrays.copy for no discernible reason. Whenever one of us does get around to implementing this, it we may want to benchmark to see whether codegen-ing overloads with fixed parameters (with a vararg in the mix as a fallback), saves any meaningful time. My intuition is "probably not, since Arrays.copy is O(n) and we'd only be avoiding it for very small n," but it seems like it's probably worth testing out.

peastman commented 5 years ago

Here's a very simple implementation.

inline fun <reified T> ndArrayOf(vararg elements: T, shape: IntArray? = null): NDArray<T> {
    if (shape == null)
        return NDArray.createLinear(elements.size, filler = { elements[it] })
    else
        return NDArray.createLinear(*shape, filler = { elements[it] })
}

If this looks ok, I can create a PR.

kyonifer commented 5 years ago

I like the idea of ndArrayOf existing. As you guys noted already, the simple approaches won't yield an optimized solution-- in addition to whats already been mentioned, the array that is allocated for the vararg elements will be boxed (e.g. java.lang.Double[]). However, this is no worse than mat[] and matrixOf which cast everything to an object, so I think we can deal with that as a separate issue from this one.

Your implementation looks reasonable overall. Couple of comments:

  1. The current mat[] and matrixOf implementations accept Any and always write out doubles. The idea behind this is that scientists often like to write mat[1, 2, 3.3] and would like that to "just work" by returning a Matrix<Double> even though in static languages like Kotlin the above is mixing integers and doubles and is therefore ambiguous. However, ndArrayOf as stated above behaves differently; for example, ndArrayOf(1,2,3) returns a NDArray<Int>. One way or another I think that ndArrayOf should be consistent with the behavior of matrixOf, as having them be inconsistent would probably cause confusion. There's a couple of ways we could go here: 1) Give up on auto-casting types for the user entirely, such that mat[1, 2, 3] returns a Matrix<Int> and make mat[1, 2, 3.3] an error (unless we can find some overloads that would default to double for mixed primitive types) 2) Change the behavior of matrixOf to match ndArrayOf but leave the mat[..] DSL to do the auto-casting to double as it does now. This way, matrixOf and ndArrayOf are consistent but mat[...] is a special case (which it already is by virtue of it being a DSL).

    Thoughts?

  2. A small nitpick: I'd suggest making the null handling a bit more kotlin idomatic, e.g.

    NDArray.createLinear(dims= *shape ?: intArrayOf(elements.size), filler = { elements[it] })
kyonifer commented 5 years ago

unless we can find some overloads that would default to double for mixed primitive types

To clarify, the issue with

fun matrixOf(vararg elements: Int): Matrix<Int> 
fun matrixOf(vararg elements: Double): Matrix<Double> 
fun matrixOf(vararg elements: Any): Matrix<Double> 

Is deciding how to encode end so that it can show up in all of them, and not overwrite a useful value like NaN.

peastman commented 5 years ago

That's a good question. I guess I would lean toward not casting and taking the values as whatever type they're given. So you would have to write 1.0, 2.0, 3.3 instead of 1, 2, 3.3. That seems more consistent with how Kotlin works, and it also means you can control the data type. Right now it isn't possible to create a matrix of Ints or Floats with this mechanism.

In practice, I suspect speed will usually be irrelevant for these functions. If you're explicitly putting the matrix elements into your source code, it's probably a pretty small matrix. 20 elements yes, a million elements no.

kyonifer commented 5 years ago

Closed by #73.