Closed: migscabral closed this issue 4 years ago
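package main

import (
	"context"
	"fmt"
	"strings"

	dataframe "github.com/rocketlaunchr/dataframe-go"
	"github.com/rocketlaunchr/dataframe-go/imports"
	log "github.com/sirupsen/logrus" // assumed: log.WithError matches the logrus API
)

var ctx = context.Background()

func main() {
	csvStr := `contact_number_country_code,contact_number
"973","12345678"`
	// Load both columns as strings so they can be concatenated.
	df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr), imports.CSVLoadOptions{
		DictateDataType: map[string]interface{}{
			"contact_number_country_code": "",
			"contact_number":              "",
		},
	})
	if err != nil {
		log.WithError(err).Fatal("CSV cannot be loaded")
	}
	// Add an empty string Series to receive the concatenated value.
	sConcatContactNumber := dataframe.NewSeriesString("concat_contact_number", &dataframe.SeriesInit{Size: df.NRows()})
	df.AddSeries(sConcatContactNumber, nil)
	// Return a new map holding only the Series to update; do not return vals itself.
	applyFn := dataframe.ApplyDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) map[interface{}]interface{} {
		return map[interface{}]interface{}{
			"concat_contact_number": vals["contact_number_country_code"].(string) + vals["contact_number"].(string),
		}
	})
	_, err = dataframe.Apply(ctx, df, applyFn, dataframe.FilterOptions{InPlace: true})
	if err != nil {
		log.WithError(err).Error("concatenation cannot be applied")
	}
	fmt.Println(df)
}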
@migscabral
Can you try the sample I wrote above?
The issue is that you return the vals map. The returned map indicates what you want to change.
The keys of the returned map can be ints (the index of a Series) or strings (the name of a Series).
In your erroneous case:
vals: map[0:973 1:12345678 2:<nil> concat_contact_number:97312345678 contact_number:12345678 contact_number_country_code:973]
It says to change the Series at index 2 to nil, but it also says to change the concat_contact_number Series to "97312345678". In your case both keys point to the same Series.
Either use ints or strings when referring to Series but not both.
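To illustrate why the conflicting keys could give a different result from run to run, here is a minimal standard-library sketch. It assumes the update map is applied by ranging over it, which is an assumption about the library and not something confirmed in this thread; applyUpdates is a hypothetical stand-in, not part of dataframe-go. Go randomizes map iteration order, so when two keys target the same Series, whichever is visited last wins.

```go
package main

import "fmt"

// applyUpdates is a hypothetical stand-in for applying an update map to one
// row. Keys are either an int (Series index) or a string (Series name).
// It is NOT the library's implementation.
func applyUpdates(row []interface{}, names []string, updates map[interface{}]interface{}) []interface{} {
	out := append([]interface{}(nil), row...) // copy so the input row is untouched
	// Go randomizes map iteration order, so if two keys target the same
	// Series, the last one visited wins and the winner can differ per run.
	for key, val := range updates {
		switch k := key.(type) {
		case int:
			out[k] = val
		case string:
			for i, name := range names {
				if name == k {
					out[i] = val
				}
			}
		}
	}
	return out
}

func main() {
	names := []string{"contact_number_country_code", "contact_number", "concat_contact_number"}
	row := []interface{}{"973", "12345678", nil}

	// Conflicting update: index 2 and "concat_contact_number" are the same Series.
	conflicting := map[interface{}]interface{}{
		2:                       nil,
		"concat_contact_number": "97312345678",
	}
	// Unambiguous update: one key per Series.
	clean := map[interface{}]interface{}{
		"concat_contact_number": "97312345678",
	}

	fmt.Println(applyUpdates(row, names, clean)[2])       // always 97312345678
	fmt.Println(applyUpdates(row, names, conflicting)[2]) // either <nil> or 97312345678
}
```

Under that assumption, this would match the report that the problem shows up only on some runs.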
@pjebs I updated my code to return a new map from inside ApplyDataFrameFn and it now works. Thank you.
I have several questions so that I can better understand how this library works:
I'm not sure where in my code the int or string keys were set. As far as I can tell, I did not explicitly choose to use int or string keys. The vals map received by ApplyDataFrameFn already contained both. Can you point me to where they were set?
The key difference I saw between your implementation and mine is that you created a new map inside ApplyDataFrameFn instead of directly modifying the vals map. Is this the recommended way?
The vals param contains the existing values for the row, keyed both by int (index) and by string (name) for convenience.
The applyFn must return a map that contains only what you want to update. You were basically re-returning the current row values (and not just the changes to update). I have updated the documentation to make it clearer.
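The difference between the two patterns can be sketched with plain maps, without the library. The helpers rowVals, mutateAndReturn and returnChangesOnly are hypothetical names made up for this illustration; rowVals only mimics the shape of vals described above (the same row keyed by index and by name).

```go
package main

import "fmt"

// rowVals mimics the vals map described in the thread: every value appears
// twice, once under its int index and once under its Series name.
// Hypothetical helper for illustration only.
func rowVals() map[interface{}]interface{} {
	return map[interface{}]interface{}{
		0: "973", 1: "12345678", 2: nil,
		"contact_number_country_code": "973",
		"contact_number":              "12345678",
		"concat_contact_number":       nil,
	}
}

// mutateAndReturn shows the problematic pattern: it edits vals in place and
// returns it, so the stale entry under key 2 (nil) is returned as well.
func mutateAndReturn(vals map[interface{}]interface{}) map[interface{}]interface{} {
	vals["concat_contact_number"] = vals["contact_number_country_code"].(string) + vals["contact_number"].(string)
	return vals
}

// returnChangesOnly shows the pattern from the maintainer's sample: a fresh
// map that names only the Series to update.
func returnChangesOnly(vals map[interface{}]interface{}) map[interface{}]interface{} {
	return map[interface{}]interface{}{
		"concat_contact_number": vals["contact_number_country_code"].(string) + vals["contact_number"].(string),
	}
}

func main() {
	fmt.Println(len(mutateAndReturn(rowVals())))   // 6: every column, under both key styles
	fmt.Println(len(returnChangesOnly(rowVals()))) // 1: only the Series to change
}
```

With the first pattern, the returned map still carries key 2 with a nil value next to the new string under concat_contact_number, which is the conflict described earlier in the thread.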
Thank you @pjebs much clearer now. Do you have a link to the said documentation?
let me refresh godocs.org
I'm trying to concatenate two columns in a dataframe and put the result into a new column. The behavior is very inconsistent. Sometimes the strings are concatenated into the new column. Sometimes the value is just set to NaN.
In this run, the value for concat_contact_number in the resulting dataframe was correctly set to 97312345678. The map value for concat_contact_number also reflects the concatenated value.
Expected output:
In this run, the value for concat_contact_number in the resulting dataframe was incorrectly set to NaN. As in the correct run, the map value for concat_contact_number is set to the expected concatenated value.
Erroneous output:
It can be observed that in both cases the map value for 2 is always <nil>. Is this expected?
Run this code several times to see the output vary. The issue may not show up immediately. Sometimes it takes 10 runs, sometimes only 2. Again, the behavior is inconsistent.
Working code: