MrPowers closed this issue 4 years ago.
Hi @MrPowers, I can see the issue comes from how the code is used.
First of all, I would really recommend you change your variable name from `dataframe` to something else, maybe `df`.
Secondly, your dataframe has only 2 series (columns). Indexing starts from 0 (for the first) and 1 (for the second).
So it should be:
s := df.Series[1]
So try this code instead:
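s := df.Series[1] // second series (index 1)
applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	if val == nil {
		return nil // leave missing values as they are
	}
	return 2 * val.(int64) // panics if the series does not hold int64 values
})
if _, err := dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true}); err != nil {
	log.Fatal(err)
}
fmt.Print(df.Table())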
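For readers new to the Apply callback, here is a dependency-free sketch of the idea behind an in-place series apply; it is not dataframe-go's implementation. A column is modeled as a plain `[]interface{}`, and `applyInPlace` and `applyFn` are hypothetical names whose callback only mirrors the `ApplySeriesFn` shape (value, row, number of rows). It uses the comma-ok type assertion, so a nil (missing) value or a value that is not an `int64`, such as a string read from a CSV without type information, is passed through instead of causing a panic.

```go
package main

import "fmt"

// applyFn mirrors the shape of dataframe-go's ApplySeriesFn callback
// (value, row index, number of rows). Hypothetical, for illustration only.
type applyFn func(val interface{}, row, nRows int) interface{}

// applyInPlace is a hypothetical helper: it replaces every element of col
// with the callback's result, similar in spirit to Apply with InPlace: true.
func applyInPlace(col []interface{}, fn applyFn) {
	n := len(col)
	for i, v := range col {
		col[i] = fn(v, i, n)
	}
}

func main() {
	// Made-up values for the second column of a two-column table.
	favoriteNumbers := []interface{}{int64(7), nil, int64(21)}

	double := func(val interface{}, row, nRows int) interface{} {
		// Comma-ok assertion: nil (missing) or non-int64 values are returned
		// unchanged instead of triggering a panic.
		if n, ok := val.(int64); ok {
			return 2 * n
		}
		return val
	}

	applyInPlace(favoriteNumbers, double)
	fmt.Println(favoriteNumbers) // [14 <nil> 42]
}
```

Whether to pass unexpected values through or to report an error is a design choice; failing loudly can be preferable when a wrong column type points to a bug in how the data was loaded.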
`dataframe` is supposed to be the name of the imported package:
import (
	dataframe "github.com/rocketlaunchr/dataframe-go"
	"github.com/rocketlaunchr/dataframe-go/imports"
)
@propersam - thanks for the quick response. Really appreciate the help!
I will definitely make sure to get this in the README, so it's easier to follow:
import (
	dataframe "github.com/rocketlaunchr/dataframe-go"
	"github.com/rocketlaunchr/dataframe-go/imports"
)
I have another bug now, but it's separate, so I'll raise another issue. I'll send the README PR right now before I forget.
We are very grateful for your interest in contributing to this project. I left a comment on your PR.
We hope to get more interesting PRs from you in the future.
Since this issue is solved, I guess it's safe to close it now.
When you import, you need to use `DictateDataType` to specify that the favorite number column is an int64 and not a string. Then, in Apply, you can multiply it by 2.
@pjebs - thanks for the hint. I see `DictateDataType` in the code here.
Can you please provide me with a code snippet?
Let me know if you're open to updating this code snippet with `DictateDataType` so we can show users how to get the column types correct.
Might be cool to add some schema inference :) That's what's used a lot in the Spark / Scala world.
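One common way to do this kind of schema inference, sketched with only the standard library: scan the non-empty values of a column and choose the narrowest type that parses all of them (int64, then float64, else string). This is an illustration, not how dataframe-go or Spark actually infer schemas; `inferColumnType` is a hypothetical helper, and treating empty strings as missing values is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferColumnType is a hypothetical helper that returns "int64", "float64"
// or "string" for a column of raw CSV values. Empty strings are treated as
// missing and ignored (an assumption for this sketch).
func inferColumnType(values []string) string {
	isInt, isFloat, seen := true, true, false
	for _, v := range values {
		if v == "" {
			continue // missing value: does not constrain the type
		}
		seen = true
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			isInt = false
		}
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			isFloat = false
		}
	}
	switch {
	case !seen:
		return "string" // nothing to go on: fall back to string
	case isInt:
		return "int64"
	case isFloat:
		return "float64"
	default:
		return "string"
	}
}

func main() {
	fmt.Println(inferColumnType([]string{"7", "", "21"}))  // int64
	fmt.Println(inferColumnType([]string{"1.5", "2"}))     // float64
	fmt.Println(inferColumnType([]string{"alice", "42"})) // string
}
```

Checking int64 before float64 keeps whole-number columns exact; a real implementation would also need to decide how many rows to sample and how to handle booleans or dates.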
Code snippet:
df, err := imports.LoadFromCSV(ctx, csvfile, imports.CSVLoadOptions{
	DictateDataType: map[string]interface{}{
		"firstName":      "",       // specify this column as string
		"favoriteNumber": int64(0), // specify this column as int64
	},
})
if err != nil {
	log.Fatal(err)
}
Note: `firstName` and `favoriteNumber` are just sample names I used. The keys you put in must be the exact column names in the CSV file; they are case-sensitive.
I guess the docs could be made clearer by stating that each key of `DictateDataType` is a column name (case-sensitive).
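To make the key-matching rule concrete, here is a standard-library sketch of what dictating a column type amounts to; it does not call dataframe-go. The hypothetical `loadColumns` helper reads CSV text with `encoding/csv`, looks up each header name in a map shaped like the `DictateDataType` example (column name to zero value), and parses columns whose zero value is `int64(0)` with `strconv.ParseInt`. Because the lookup is an exact map-key match, a `FavoriteNumber` key would not match a `favoriteNumber` header, and that column would stay a string. Treating empty cells as nil is an assumption.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// loadColumns is a hypothetical helper, not part of dataframe-go. It reads
// CSV text and returns the values of each column keyed by header name.
// For a column listed in dictate with an int64 zero value, the raw text is
// parsed as int64 (empty cells become nil); every other column stays a string.
// Map lookups are exact, so matching header names is case-sensitive.
func loadColumns(data string, dictate map[string]interface{}) (map[string][]interface{}, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("missing header row")
	}
	header := records[0]
	cols := make(map[string][]interface{}, len(header))
	for r, row := range records[1:] {
		for i, name := range header {
			var val interface{} = row[i]
			if _, isInt := dictate[name].(int64); isInt {
				if row[i] == "" {
					val = nil // treat an empty cell as a missing value
				} else {
					n, err := strconv.ParseInt(row[i], 10, 64)
					if err != nil {
						return nil, fmt.Errorf("column %q, data row %d: %w", name, r+1, err)
					}
					val = n
				}
			}
			cols[name] = append(cols[name], val)
		}
	}
	return cols, nil
}

func main() {
	data := "firstName,favoriteNumber\nalice,7\nbob,21\n"
	cols, err := loadColumns(data, map[string]interface{}{
		"firstName":      "",       // keep as string
		"favoriteNumber": int64(0), // convert to int64
	})
	if err != nil {
		panic(err)
	}
	// The values are now int64, so doubling them is a plain type assertion.
	for _, v := range cols["favoriteNumber"] {
		fmt.Println(2 * v.(int64)) // 14, then 42
	}
}
```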
Thank you both so much for the help!
@pjebs - I was thinking you might want to update this README example so folks can see how to use `DictateDataType`. Either way is good with me!
@MrPowers How is the package serving you? Any other feedback? What is your use case?
@pjebs - thanks for reaching out.
I am a Spark developer and love DataFrames. I am exploring DataFrame options in different languages for fun!
I just published a blog post on DataFrames in Go. Let me know what you think! I am happy to edit it based on your feedback!
It'd be awesome to add dataframe-go to qbench so we can see how the dataframe-go performance compares with qframe and gota.
You have a great team of open source developers! Looking forward to collaborating with you and making a Go DataFrame library that supports the Apache Arrow memory format and Parquet files (especially column pruning for those juicy performance gains). Thanks again for all the help!!
@MrPowers I had a quick read of your blog post and made some adjustments.
In the `add-schema-inference` branch there is schema inference. It just needs to be tested.
You can use the `NameToColumn` function (or, in the new PR, the `MustNameToColumn` function) to convert from a column name to a column index.
filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	// keep only rows where is_even is present and equal to 1
	if vals["is_even"] != nil && vals["is_even"].(int64) == 1 {
		return dataframe.KEEP, nil
	}
	return dataframe.DROP, nil
})
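Both ideas (name-to-index lookup and a keep/drop callback) can be sketched together without the library: resolve a column name to an index once, then run a callback that returns keep or drop for every row. The names `nameToColumn`, `filterRows`, `filterAction`, `keep`, and `drop` are hypothetical local stand-ins that only mimic the roles of `NameToColumn`, `FilterDataFrameFn`, `dataframe.KEEP`, and `dataframe.DROP`; the real signatures differ, and the data is made up for illustration.

```go
package main

import "fmt"

// filterAction is a local stand-in for dataframe.FilterAction (hypothetical).
type filterAction int

const (
	drop filterAction = iota // plays the role of dataframe.DROP
	keep                     // plays the role of dataframe.KEEP
)

// nameToColumn is a hypothetical helper: it returns the index of the column
// whose name matches exactly (case-sensitive), or an error.
func nameToColumn(header []string, name string) (int, error) {
	for i, h := range header {
		if h == name {
			return i, nil
		}
	}
	return -1, fmt.Errorf("no column named %q", name)
}

// filterRows returns the rows for which fn reports keep. fn receives the row
// values, the row index and the total number of rows.
func filterRows(rows [][]interface{}, fn func(vals []interface{}, row, nRows int) (filterAction, error)) ([][]interface{}, error) {
	var out [][]interface{}
	for i, r := range rows {
		action, err := fn(r, i, len(rows))
		if err != nil {
			return nil, err
		}
		if action == keep {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	header := []string{"number", "is_even"}
	rows := [][]interface{}{
		{int64(1), int64(0)},
		{int64(2), int64(1)},
		{int64(3), nil},
		{int64(4), int64(1)},
	}
	idx, err := nameToColumn(header, "is_even")
	if err != nil {
		panic(err)
	}
	evens, err := filterRows(rows, func(vals []interface{}, row, nRows int) (filterAction, error) {
		// Same check as the filter function: skip nil, then compare the int64.
		if v, ok := vals[idx].(int64); ok && v == 1 {
			return keep, nil
		}
		return drop, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(evens) // [[2 1] [4 1]]
}
```

Resolving the index once, outside the callback, avoids repeating a name lookup for every row.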
The primary aim of dataframe-go is to be flexible (and to give developers the power to do whatever they want). It also has a lot more features, and more are planned (full time-series forecasting is coming, which you can see in one of the branches).
Thanks for creating this library!
I can get this code to work:
Here's the data that's printed:
I cannot get this code working:
Here's the error message:
Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go
Sorry if this is a basic question. I am a Go newbie!
Thanks again for making this library!