rocketlaunchr / dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

CSV import does not support dictated type for fields with potential empty values #44

migscabral closed this issue 4 years ago

migscabral commented 4 years ago

When importing a CSV in which a field is empty in some rows, the import ignores the dictated data type and treats the field as a string.

csvStr := `sometimes_empty,label
,"First"
2,"Second"
,"Third"
4,"Fourth"`

ctx := context.Background()

df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
    DictateDataType: map[string]interface{}{
        "sometimes_empty": int64(0),
        "label":           "",
    },
})

fmt.Println(err)
fmt.Println(df)

This code produces the error:

can't force string:  to int64. row: 0 field: sometimes_empty

How can I create a dataframe from a CSV with empty values so that it still follows the dictated type and has NaN in place of the empty values?

Expected output:

+-----+-----------------+--------+
|     | SOMETIMES EMPTY | LABEL  |
+-----+-----------------+--------+
| 0:  |       NaN       | First  |
| 1:  |        2        | Second |
| 2:  |       NaN       | Third  |
| 3:  |        4        | Fourth |
+-----+-----------------+--------+
| 4X2 |      INT64      | STRING |
+-----+-----------------+--------+

pjebs commented 4 years ago

See the NilValue option of CSVLoadOptions: https://godoc.org/github.com/rocketlaunchr/dataframe-go/imports#CSVLoadOptions
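
To illustrate what a nil-value marker does during import, here is a minimal standard-library sketch. It is not dataframe-go's implementation: parseInt64Column and its signature are hypothetical and exist only for this example. It shows why an empty field cannot be forced to int64 on its own (strconv.ParseInt rejects the empty string) and how treating one designated string as nil avoids the error.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseInt64Column is a hypothetical helper for illustration only.
// It reads the first column of a CSV that has a header row as int64 values.
// A field equal to *nilValue is stored as nil instead of being parsed.
// If nilValue is nil, every field must parse as an integer.
func parseInt64Column(data string, nilValue *string) ([]*int64, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, nil
	}

	out := make([]*int64, 0, len(rows)-1)
	for i, row := range rows[1:] { // skip the header row
		field := row[0]
		if nilValue != nil && field == *nilValue {
			out = append(out, nil) // the marker means "no value"
			continue
		}
		n, err := strconv.ParseInt(field, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i, err)
		}
		out = append(out, &n)
	}
	return out, nil
}

func main() {
	data := "sometimes_empty,label\n,First\n2,Second\n,Third\n4,Fourth"

	// Without a marker, the empty field in the first data row fails to parse.
	if _, err := parseInt64Column(data, nil); err != nil {
		fmt.Println("without marker:", err)
	}

	// With the empty string as the marker, empty fields become nil.
	empty := ""
	col, err := parseInt64Column(data, &empty)
	if err != nil {
		fmt.Println("with marker:", err)
		return
	}
	for i, v := range col {
		if v == nil {
			fmt.Println(i, "nil")
		} else {
			fmt.Println(i, *v)
		}
	}
}
```

The marker is a pointer so that "no marker configured" (nil) can be told apart from "the marker is the empty string".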

pjebs commented 4 years ago

NilValue: &[]string{""}[0]
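
The expression above takes the address of the first element of a one-element slice, which is a compact way to obtain a *string without declaring a variable first. The sketch below compares it with two equivalent spellings using only the standard library; strPtr is a hypothetical helper name, not part of dataframe-go.

```go
package main

import "fmt"

// strPtr is a hypothetical helper: it returns a pointer to a copy of s.
func strPtr(s string) *string { return &s }

func main() {
	// The slice idiom from the comment above: a slice element is
	// addressable, so & can be applied to it directly.
	a := &[]string{""}[0]

	// Equivalent: take the address of a named variable.
	empty := ""
	b := &empty

	// Equivalent: use a small helper function.
	c := strPtr("")

	// All three are non-nil pointers to an empty string.
	fmt.Println(*a == "", *b == "", *c == "") // true true true
}
```

Any of the three forms yields a *string and could be assigned to the NilValue field in the same way.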