holgerbrandl / krangl

krangl is a {K}otlin DSL for data w{rangl}ing
MIT License

Concatenate multiple DataFrames #80

Closed: bdudelsack closed this issue 4 years ago

bdudelsack commented 4 years ago

Is there a way to concatenate multiple DataFrames? I haven't found an easy way to do this. I also tried to create a new DataFrame with dataFrameOf and lists of DataCols, but had no success:

dataFrameOf(*labs[0].cols.toTypedArray(), *labs[1].cols.toTypedArray())

Here is my use case. I have 5 CSV files with students from the different labs. I want to build a single DataFrame from them, with an additional column for the lab time (without modifying the CSV files).

import krangl.*
import org.apache.commons.csv.CSVFormat

// One CSV file per lab time slot
val times = listOf("1030", "1215", "1400", "1545", "1730")
val labs = times.map { lab ->
    DataFrame.readCSV("Students-Lab-${lab}.csv", format = CSVFormat.DEFAULT.withHeader().withDelimiter(';'))
        .select("Firstname", "Lastname", "Username")
        .addColumn("Lab") {
            lab
        }
}

/* ... somehow concatenate labs-DataFrames to a single one ... */

Here is how I would concatenate them in pandas:

students = pd.concat([lab1, lab2, lab3, lab4, lab5], ignore_index=True)

Any help is appreciated. Thanks!

holgerbrandl commented 4 years ago

Sure, there are multiple bindRows implementations with varying signatures in Extensions.kt (i.e., in krangl's default namespace).

Does this cover your use case or do you need something else?

bdudelsack commented 4 years ago

Thank you. That was exactly what I needed. I just didn't expect the function to be named like that.

Solution:

bindRows(*labs.toTypedArray())
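
For anyone who wants to try this without the CSV files, here is a minimal sketch of the same call on two small in-memory frames. The sample usernames and the combinedLabs helper are made up for illustration, and it assumes the krangl dependency is on the classpath; only the bindRows(*labs.toTypedArray()) call is taken from the solution above.

```kotlin
import krangl.*

// Hypothetical helper: builds two tiny stand-ins for the per-lab frames
// and stacks them row-wise, as in the solution above.
fun combinedLabs(): DataFrame {
    // Invented sample data; in the real use case these come from readCSV.
    val lab1030 = dataFrameOf("Username", "Lab")(
        "alice", "1030",
        "bob", "1030"
    )
    val lab1215 = dataFrameOf("Username", "Lab")(
        "carol", "1215",
        "dave", "1215"
    )
    val labs = listOf(lab1030, lab1215)

    // The vararg form of bindRows takes the frames spread from an array.
    return bindRows(*labs.toTypedArray())
}

fun main() {
    val students = combinedLabs()
    // Report the shape of the combined frame.
    println("${students.nrow} rows, columns: ${students.names}")
}
```

Both frames share the same column names here, which matches the use case; how bindRows treats frames with differing columns is not covered by this sketch.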

holgerbrandl commented 4 years ago

Feel free to report any other missing bits you spot in the API.