holgerbrandl / krangl

krangl is a {K}otlin DSL for data w{rangl}ing
MIT License

Incorrect column type when Long values are present #138

Closed: sorokod closed this issue 2 years ago

sorokod commented 2 years ago

The issue can be observed with the following examples:

dataFrameOf("data")(23, 11.2, Long.MAX_VALUE).apply { schema() }
// data  [Integer]  23, 11,2, 9223372036854775807

dataFrameOf("data")(11.2, 23,  Long.MAX_VALUE).apply { schema() }
// data  [Any:Double]  11,2, 23, 9223372036854775807

dataFrameOf("data")(Long.MAX_VALUE, 11.2, 23).apply { schema() }
// data  [Any:Long]  9223372036854775807, 11,2, 23

In all three cases, the expected output is data [Dbl] ...

The problem is that krangl falls back to guessing the column type, and it does so because isMixedNumeric ignores Long values.

How about changing isMixedNumeric to:

internal fun isMixedNumeric(mutation: List<*>): Boolean {
    for (item in mutation) {
        if (item !is Number?) return false
    }
    return true
}

If so, would you like a PR?
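
For illustration, here is a minimal, self-contained sketch of how the proposed check behaves on the three inputs above. It does not use krangl itself: widenToDouble is a hypothetical helper standing in for whatever conversion krangl applies once a column is recognised as mixed numeric, and the way krangl actually calls isMixedNumeric is not shown here.

```kotlin
// Proposed check from this issue: true when every element is a Number or null.
internal fun isMixedNumeric(mutation: List<*>): Boolean {
    for (item in mutation) {
        if (item !is Number?) return false
    }
    return true
}

// Hypothetical helper (not part of krangl): widen each numeric value to Double,
// keeping nulls as nulls. Callers are assumed to have checked isMixedNumeric first.
internal fun widenToDouble(values: List<*>): List<Double?> =
    values.map { (it as Number?)?.toDouble() }

fun main() {
    // The three orderings from the examples above.
    val examples = listOf(
        listOf(23, 11.2, Long.MAX_VALUE),
        listOf(11.2, 23, Long.MAX_VALUE),
        listOf(Long.MAX_VALUE, 11.2, 23)
    )
    for (values in examples) {
        // With Long accepted as a Number, the result no longer depends on element order.
        println("${isMixedNumeric(values)} -> ${widenToDouble(values)}")
    }

    // A non-numeric element still makes the check fail.
    println(isMixedNumeric(listOf(23, "x")))
}
```

One thing to keep in mind: converting a Long to a Double can lose precision for large magnitudes, so Long.MAX_VALUE will not round-trip exactly once the column is treated as Dbl.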

holgerbrandl commented 2 years ago

Well spotted. Let's do so.