rocketlaunchr / dataframe-go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Getting dataframe.ApplySeriesFn undefined error #23

Closed MrPowers closed 4 years ago

MrPowers commented 4 years ago

Thanks for creating this library!

I can get this code to work:

ctx := context.TODO()

// step 1: open the csv
csvfile, err := os.Open("data/example.csv")
if err != nil {
    log.Fatal(err)
}

dataframe, err := imports.LoadFromCSV(ctx, csvfile)

Here's the data that's printed:

fmt.Print(dataframe.Table())

+-----+------------+-----------------+
|     | FIRST NAME | FAVORITE NUMBER |
+-----+------------+-----------------+
| 0:  |  matthew   |       23        |
| 1:  |   daniel   |        8        |
| 2:  |  allison   |       42        |
| 3:  |   david    |       18        |
+-----+------------+-----------------+
| 4X2 |   STRING   |     STRING      |
+-----+------------+-----------------+

I cannot get this code working:

s := dataframe.Series[2]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
    return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})

fmt.Print(dataframe.Table())

Here's the error message:

./dataframe_go.go:36:22: dataframe.ApplySeriesFn undefined (type *dataframe.DataFrame has no field or method ApplySeriesFn)
./dataframe_go.go:40:11: dataframe.Apply undefined (type *dataframe.DataFrame has no field or method Apply)
./dataframe_go.go:40:44: dataframe.FilterOptions undefined (type *dataframe.DataFrame has no field or method FilterOptions)

Here's the code: https://github.com/MrPowers/go-dataframe-examples/blob/master/dataframe_go.go

Sorry if this is a basic question. I am a Go newbie!

Thanks again for making this library!

propersam commented 4 years ago

Hi @MrPowers, I can see the issue comes from how your code uses the package.

First of all, I would really recommend changing your variable name from dataframe to something else, maybe df.

Secondly, your dataframe has only 2 series (columns). Indexing starts from 0 (for the first) and goes to 1 (for the second).

So it should be s := df.Series[1]

propersam commented 4 years ago

So try this code instead:

s := df.Series[1]

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
    return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})

fmt.Print(df.Table())

dataframe is supposed to be the name of the imported package.

import (
  dataframe "github.com/rocketlaunchr/dataframe-go"
  "github.com/rocketlaunchr/dataframe-go/imports"
)

MrPowers commented 4 years ago

@propersam - thanks for the quick response. Really appreciate the help!

I will definitely make sure to get this in the README, so it's easier to follow:

import (
  dataframe "github.com/rocketlaunchr/dataframe-go"
  "github.com/rocketlaunchr/dataframe-go/imports"
)

I have another bug now, but it's separate, so I'll raise another issue. I'll send the README PR right now before I forget.
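For anyone landing here with the same compiler error, the cause can be reproduced with the standard library alone. The sketch below is only an illustration: it uses the strings package in place of dataframe-go, and the function names upper and shadowed are made up for the example. Once a local variable takes the name of an imported package, the package is no longer reachable under that name in that scope, which is why the compiler reported "no field or method" on the variable's type.

```go
package main

import (
	"fmt"
	"strings"
)

// upper calls into the strings package; nothing shadows the name here.
func upper(s string) string {
	return strings.ToUpper(s)
}

// shadowed declares a local variable named strings. Inside this function the
// identifier refers to the slice, so a call like strings.ToUpper("x") would
// fail to compile here, because a []string has no ToUpper method.
func shadowed() int {
	strings := []string{"matthew", "daniel"}
	return len(strings)
}

func main() {
	fmt.Println(upper("df"))
	fmt.Println(shadowed())
}
```

Renaming the variable (df instead of dataframe) or aliasing the import both avoid the clash.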

propersam commented 4 years ago

We are very grateful for your interest in contributing to this project. I dropped a comment in your PR.

We hope to get more interesting PRs from you in the future.

propersam commented 4 years ago

Since this issue is solved, I guess it's safe to close it now.

pjebs commented 4 years ago

When you import, you need to use DictateDataType to specify that the favorite number column is an int64 and not a string. Then in Apply, you can multiply by 2.
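The table footer earlier in the thread (4X2 | STRING | STRING) shows that both columns were loaded as strings, so a bare val.(int64) assertion inside the apply function would panic at run time. The sketch below shows the general Go mechanism with the standard library only. The function double is a hypothetical stand-in for the body of an apply callback, and the sketch does not verify which concrete type dataframe-go actually passes for a string column.

```go
package main

import (
	"fmt"
	"strconv"
)

// double is a hypothetical stand-in for the body of an apply callback.
// It uses a type switch instead of a bare val.(int64) assertion, so a
// value of the wrong type yields an error rather than a panic.
func double(val interface{}) (int64, error) {
	switch v := val.(type) {
	case int64:
		return 2 * v, nil
	case string:
		// A column loaded as text has to be parsed before doing arithmetic.
		n, err := strconv.ParseInt(v, 10, 64)
		if err != nil {
			return 0, err
		}
		return 2 * n, nil
	default:
		return 0, fmt.Errorf("unsupported type %T", val)
	}
}

func main() {
	a, err := double(int64(23))
	fmt.Println(a, err)

	b, err := double("23")
	fmt.Println(b, err)
}
```

Dictating the column type at load time, as suggested above, removes the need for the string branch entirely.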

MrPowers commented 4 years ago

@pjebs - thanks for the hint. I see DictateDataType in the code here.

Can you please provide me with a code snippet?

Let me know if you're open to updating this code snippet with DictateDataType so we can show users how to get the column types correct.

Might be cool to add some schema inference :) That's what's used a lot in the Spark / Scala world.

propersam commented 4 years ago

code snippet:

df, err := imports.LoadFromCSV(ctx, csvfile, imports.CSVLoadOptions{
    DictateDataType: map[string]interface{}{
        "firstName":      "",       // specify this column as string
        "favoriteNumber": int64(0), // specify this column as int64
    },
})

Note: firstName and favoriteNumber are just sample names I used. The names you put in must be the exact column names from the CSV file, and they are case-sensitive.
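To make the case-sensitivity point concrete, here is a rough standard-library sketch of what "dictating" a column type by header name amounts to. It is not dataframe-go's implementation. The function parseTyped and the header names first_name and favorite_number are assumptions made up for the illustration. The map key is compared against the CSV header exactly, so a key with different casing simply does not match and the column stays a string.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// parseTyped reads CSV text and converts each cell according to types, a map
// from the exact (case-sensitive) header name to a sample value of the wanted
// type. Columns that are not in the map are kept as strings.
func parseTyped(data string, types map[string]interface{}) ([]map[string]interface{}, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	header := records[0]
	rows := make([]map[string]interface{}, 0, len(records)-1)
	for _, rec := range records[1:] {
		row := make(map[string]interface{}, len(header))
		for i, name := range header {
			switch types[name].(type) {
			case int64:
				n, err := strconv.ParseInt(rec[i], 10, 64)
				if err != nil {
					return nil, fmt.Errorf("column %q: %w", name, err)
				}
				row[name] = n
			default:
				row[name] = rec[i]
			}
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	// Hypothetical file contents; the real header names are not shown in this thread.
	data := "first_name,favorite_number\nmatthew,23\ndaniel,8\n"

	rows, err := parseTyped(data, map[string]interface{}{
		"favorite_number": int64(0), // must match the header exactly
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rows {
		fmt.Printf("%v %T\n", r["favorite_number"], r["favorite_number"])
	}
}
```

Passing "Favorite_Number" as the key instead would leave every value in that column as a string, which is the failure mode the note above warns about.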

pjebs commented 4 years ago

I guess the docs can be made clearer to specify that the key of DictateDataType is the name of the column (case-sensitive).

MrPowers commented 4 years ago

Thank you both so much for the help!

@pjebs - I was thinking you might want to update this README example so folks can see how to use DictateDataType. Either way is good with me!

pjebs commented 4 years ago

@MrPowers How is the package serving you? Any other feedback? What is your use case?

MrPowers commented 4 years ago

@pjebs - thanks for reaching out.

I am a Spark developer and love DataFrames. I am exploring DataFrame options in different languages for fun!

I just published a blog post on DataFrames in Go. Let me know what you think! I am happy to edit it based on your feedback!

It'd be awesome to add dataframe-go to qbench so we can see how the dataframe-go performance compares with qframe and gota.

You have a great team of open source developers! Looking forward to collaborating with you and making a Go DataFrame library that supports the Apache Arrow memory format and Parquet files (especially column pruning for those juicy performance gains). Thanks again for all the help!!

pjebs commented 4 years ago

@MrPowers I had a quick read of your blog post and made some adjustments.

  1. In the add-schema-inference branch there is schema inference. It just needs to be tested.
  2. The context requirement is idiomatic Go. Context should be the first param; without it, you have no way to cancel an operation that is taking too long.
  3. You can use the NameToColumn function (or in the new PR, the MustNameToColumn function) to convert from name to column index.
  4. Filtering for your example would be done like this:
filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
    if vals["is_even"] != nil && vals["is_even"].(int64) == 1 {
        return dataframe.KEEP, nil
    }
    return dataframe.DROP, nil
})
  5. The aims of the various packages are also different.

The primary aim of dataframe-go is to be flexible (and give the developer the power to do whatever they want). It also has a lot more features and probably more planned features (full time-series forecasting is coming, which you can see in one of the branches).