Add reshape and conversion methods between matrix/list/array/ndarray

thomasnield commented 6 years ago

I'll at least start this and put in a PR.

thomasnield commented 6 years ago

Here are some proposed extensions we can start with:

fun <T> Iterable<T>.toMatrix(vararg valueSelectors: (T) -> Double): Matrix<Double> {
    val list = toList()

    val out = zeros(rows=list.size, cols= valueSelectors.size)

    for ((m,t) in list.withIndex()) {
        for ((n,v) in valueSelectors.withIndex()) {
            out[m,n] = v(t)
        }
    }
    return out
}

fun <T> Sequence<T>.toMatrix(vararg valueSelectors: (T) -> Double) = asIterable().toMatrix(*valueSelectors)

Usage Example:

import java.time.LocalDate

fun main(args: Array<String>) {

    data class Sale(val accountId: Int, val saleDate: LocalDate, val billingAmount: Double)

    val sales = listOf(
            Sale(1, LocalDate.of(2016, 12, 3), 180.0),
            Sale(2, LocalDate.of(2016, 7, 4), 140.2),
            Sale(3, LocalDate.of(2016, 6, 3), 111.4),
            Sale(4, LocalDate.of(2016, 1, 5), 192.7),
            Sale(5, LocalDate.of(2016, 5, 4), 137.9),
            Sale(6, LocalDate.of(2016, 3, 6), 125.6),
            Sale(7, LocalDate.of(2016, 12, 4), 164.3),
            Sale(8, LocalDate.of(2016, 7, 11), 144.2)
    )

    val saleMatrix = sales.toMatrix(
            {it.saleDate.year.toDouble()},
            {it.saleDate.monthValue.toDouble()},
            {it.saleDate.dayOfMonth.toDouble()},
            {it.billingAmount }
    )
    println(saleMatrix)
}

thomasnield commented 6 years ago

I think you mentioned we should support different matrix types beyond Double since those can be supported in the backend? We might name them toDoubleMatrix() as well as toMatrix() for any generic T types...

kyonifer commented 6 years ago

Thanks for the contribution! These look like a good start.

We probably also want similar methods to yours on the NDArray, and an Iterable<Double>.toMatrix() that dumps all the elements in the iterable into the matrix. I'm not sure whether one would expect this to produce a 1D column vector of all the elements, or whether we should add an explicit shape parameter to the toMatrix call that wraps them into 2D.
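To make the two options concrete, here is a minimal sketch in plain Kotlin. It uses a nested Array<DoubleArray> as a stand-in for a matrix rather than koma's actual Matrix type, the names toColumn and toGrid are hypothetical, and filling row by row is an assumption rather than a decision about the real API.

```kotlin
// Hypothetical stand-in for a matrix: an array of rows, not koma's Matrix.
typealias Grid = Array<DoubleArray>

// Option 1: dump every element into an N x 1 column vector.
fun Iterable<Double>.toColumn(): Grid =
    map { doubleArrayOf(it) }.toTypedArray()

// Option 2: wrap the elements into an explicit rows x cols shape,
// filling row by row (assumed ordering).
fun Iterable<Double>.toGrid(rows: Int, cols: Int): Grid {
    val values = toList()
    // Reject shapes that do not account for every element exactly once.
    require(values.size == rows * cols) {
        "Cannot shape ${values.size} elements into $rows x $cols"
    }
    return Array(rows) { r -> DoubleArray(cols) { c -> values[r * cols + c] } }
}
```

With six elements 1.0 through 6.0, toColumn() gives six rows of one element each, while toGrid(2, 3) gives the rows [1.0, 2.0, 3.0] and [4.0, 5.0, 6.0]. The explicit-shape variant has to decide what happens when the element count does not match the shape; this sketch simply throws.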

I'll start a laundry list of some more methods that'd be useful to have:


// Maybe these for sequences too?
Iterable<T>.toNDArray(/*some shape info */): NDArray<T>
Iterable<$primitive>.toMatrix(/* some shape info */): Matrix<$primitive>

// Makes 2D NDArrays from a matrix, should always succeed
Matrix<T>.toNDArray(): NDArray<T>
// Should always succeed
Matrix<$primitive>.toNumericalNDArray(): NumericalNDArray<$primitive>

// Throws an error if the NDArray isn't 2D
NDArray<$primitive>.toMatrix(): Matrix<$primitive>
// Should always succeed, since $primitive guarantees it can be treated as a NumericalNDArray
NDArray<$primitive>.toNumerical(): NumericalNDArray<$primitive>

where the ones with $primitive are extensions that are only defined for certain types (not T in general). The only way I can think of to implement this at the moment is with codegen for all primitive types, which is unfortunate. Things get simpler with the NDArray since it's boxed and we aren't worrying about the performance of primitives.
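As a rough illustration of what the codegen would have to emit, here is a sketch with two primitives written out by hand. DoubleDense, IntDense, and toDense are hypothetical stand-ins, not koma types or proposed names; the point is only that each primitive needs its own near-identical extension.

```kotlin
// Hypothetical primitive-backed containers (not koma's Matrix).
class DoubleDense(val rows: Int, val cols: Int, val data: DoubleArray)
class IntDense(val rows: Int, val cols: Int, val data: IntArray)

// One extension per primitive. The bodies are identical apart from the
// element type and the backing array, which is what codegen would template.
// @JvmName gives each overload a distinct JVM name, since the generic
// parameter of Iterable is erased.
@JvmName("doubleIterableToDense")
fun Iterable<Double>.toDense(rows: Int, cols: Int): DoubleDense {
    val values = toList()
    require(values.size == rows * cols) { "Expected ${rows * cols} elements, got ${values.size}" }
    return DoubleDense(rows, cols, values.toDoubleArray())
}

@JvmName("intIterableToDense")
fun Iterable<Int>.toDense(rows: Int, cols: Int): IntDense {
    val values = toList()
    require(values.size == rows * cols) { "Expected ${rows * cols} elements, got ${values.size}" }
    return IntDense(rows, cols, values.toIntArray())
}
```

The upside is that an unsupported element type, say Iterable<String>, fails to compile because no matching extension exists. The cost is one copy of every conversion per primitive.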

For Matrix we could always do something like

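@Suppress("UNCHECKED_CAST")
inline fun <reified T> findCorrectDType(): MatrixType<T> =
    when (T::class) {
        Double::class -> MatrixTypes.DoubleType as MatrixType<T>
        Int::class -> MatrixTypes.IntType as MatrixType<T>
        Float::class -> MatrixTypes.FloatType as MatrixType<T>
        // Any other T can only be rejected at runtime.
        else -> throw IllegalArgumentException("Unsupported matrix element type: ${T::class}")
    }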

But now it's a runtime failure if someone makes a Matrix implementation whose element type isn't one of the supported primitives.
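The runtime-failure concern can be reproduced without koma at all. The sketch below uses a hypothetical DType registry and lookupDType function in place of MatrixTypes and findCorrectDType; only the dispatch pattern is the same.

```kotlin
// Hypothetical stand-in for a dtype registry (not koma's MatrixTypes).
sealed class DType<T>(val name: String) {
    object DoubleType : DType<Double>("double")
    object IntType : DType<Int>("int")
    object FloatType : DType<Float>("float")
}

// Dispatch on the reified type parameter. The casts are unchecked because
// the compiler cannot relate the matched branch back to T.
@Suppress("UNCHECKED_CAST")
inline fun <reified T> lookupDType(): DType<T> = when (T::class) {
    Double::class -> DType.DoubleType as DType<T>
    Int::class -> DType.IntType as DType<T>
    Float::class -> DType.FloatType as DType<T>
    // Nothing prevents a caller from requesting an unsupported T,
    // so this branch is only reached when the program runs.
    else -> throw IllegalArgumentException("No dtype registered for ${T::class.simpleName}")
}
```

Here lookupDType<Double>() returns DType.DoubleType, while lookupDType<String>() compiles cleanly and then throws IllegalArgumentException when called. That is the trade-off against per-primitive extensions: one function instead of many, but the type check moves from compile time to runtime.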

kyonifer / koma

Add reshape and conversion methods between matrix/list/array/ndarray #30