More features for NDArray

peastman commented 5 years ago

Matrix has a lot of features that would be equally appropriate for NDArray. This includes methods like min(), max(), mean(), toString(), etc. Also a lot of the math functions defined in matrixfuncs.kt. If you agree, I'd like to start implementing some of them.

kyonifer commented 5 years ago

Yes this makes sense. Those functions being limited to Matrix are historical as NDArray is a relatively new addition. Feel free to make PRs and we can sort through any issues that come up.

Note that many of the top-level functions such as those in matrixfuncs.kt are Double only currently. It may be worth considering if those should be code-generated for each primitive and/or generically. If so, then supporting NDArray might be a simple matter of a few more generation targets being added which output NDArray extensions functions.

kyonifer commented 5 years ago

Opened #75 to capture my above note as a separate ticket. It may be possible to close both this issue and that one at once.

peastman commented 5 years ago

I think I've come up with a strategy that will allow all the different versions to be defined without needing code generation. I first create a function

inline fun <reified T> NDArray<T>.mapIfNumeric(crossinline f: (Double) -> Double): NDArray<Double> {
    when (T::class) {
        Double::class -> return (this as NDArray<Double>).map { f(it) }
        Float::class -> return (this as NDArray<Float>).map { f(it.toDouble()) }
        Long::class -> return (this as NDArray<Long>).map { f(it.toDouble()) }
        Int::class -> return (this as NDArray<Int>).map { f(it.toDouble()) }
        Short::class -> return (this as NDArray<Short>).map { f(it.toDouble()) }
        Byte::class -> return (this as NDArray<Byte>).map { f(it.toDouble()) }
        else -> throw IllegalArgumentException("This NDArray does not contain numeric data.")
    }
}

Lots of functions can easily be defined by using that. For example,

inline fun <reified T> acos(arr: NDArray<T>): NDArray<Double> {
    return arr.mapIfNumeric { kotlin.math.acos(it) }
}

I decompiled it and verified it wasn't doing any boxing.

kyonifer commented 5 years ago

I like that idea. Looks good to me for functions that map through to an underlying function which expects a scalar Double (which is a lot of them).

peastman commented 5 years ago

The one issue with that implementation is that the return type is always Double. If the input array has type Float, I would expect it to return an array of Floats. I need to think about possibilities.

Some functions will just need specialized implementations, due to their quirks. floor(), ceil(), and round() are only meaningful for Float and Double. Then there's abs(), which in kotlin.math is defined for Int, Long, Float, and Double. But the operation is equally applicable to Byte and Short, though the asymmetry at the lower end of the range is more likely to be a problem for them. (abs(Int.MIN_VALUE) is negative!) Perhaps it should promote those types to Int?

peastman commented 5 years ago

You know, I think I'm trying to hard to be clever. It's simpler to just write two functions,

fun acos(arr: NDArray<Float>): NDArray<Float> {
    return arr.map { kotlin.math.acos(it) }
}

fun acos(arr: NDArray<Double>): NDArray<Double> {
    return arr.map { kotlin.math.acos(it) }
}

The amount of code isn't that much more, it's easier to understand, and it avoids complications about return types.

peastman commented 5 years ago

I have a question about code generation. Some of the extension functions I'll be adding only apply to certain types. For example, mean() and sum() are only meaningful for numeric types. They shouldn't be defined for generic arrays. Given how codegen is implemented, it isn't obvious how to handle that. What do you suggest?

kyonifer commented 5 years ago

I'm traveling today but I'll look into this soon. Given the apparent recent changes to how inlining works, I'd like to explore the available options and see if we can find a way to clean up the codegen as you were attempting to do here.

kyonifer commented 5 years ago

I played with this a bit and regrettably couldn't come up with something I hated less than codegen, so we might need to stick with that for now.

Some of the extension functions I'll be adding only apply to certain types. For example, mean() and sum() are only meaningful for numeric types. They shouldn't be defined for generic arrays. Given how codegen is implemented, it isn't obvious how to handle that. What do you suggest?

Take a look at the convertGetter and floatersOnly. The former inserts different strings depending on whether or not dtype is the generic type or one of the primitive types. The latter adds allClose to only the floating point types, by checking dtype and returning an empty string unless dtype is Float or Double. Something similar could be done for mean and sum on NDArray.

Alternatively, you could also just add another copy { } block to genCode and generate brand new files by iterating over just the dtypes you need. There's no need for everything to go into the same set of files.

Also, feel free to just PR the extension functions directly if you'd rather not deal with it, and I can update the codegen afterwards.

peastman commented 5 years ago

One option to consider is switching to a more powerful templating engine. The one built into Gradle is pretty limited. A lot of other ones can handle conditional blocks, loops, etc., which would make cases like this easy. It also would let you put all the logic into the template, rather than splitting it between the template and the build script.

kyonifer commented 5 years ago

Fair enough. The only reason for using gradle's is to avoid another dependency. However, we could probably use something on the jvm like pebble and grab it directly from maven in the buildscript block or in a buildSrc project.

kyonifer commented 5 years ago

Opened #76 to track that idea.

kyonifer commented 5 years ago

Closed by #78.

kyonifer / koma

More features for NDArray #74