Open tanyaofei opened 2 years ago
Can you send me the file 1.parquet.zip
Can you create the DataFrame from this package, export it to Parquet, and then try to import it back?
I tried that first. The result looks like an invalid Parquet file with the content "PAR1".
func main() {
	df := dataframe.NewDataFrame(dataframe.NewSeriesString("A", nil, []string{"1", "2", "3"}))
	file, _ := os.Create("1.parquet")
	_ = exports.ExportToParquet(context.Background(), file, df)
}
A Parquet file is not text-based. Can you try importing the file back?
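As background to the point above: Parquet is a binary format whose files begin and end with the 4-byte magic "PAR1", so seeing "PAR1" in a text editor does not by itself mean the file is broken. Below is a minimal sketch in Go, using only the standard library. The helper name looksLikeParquet and the 12-byte minimum (two magic markers plus the 4-byte footer length) are illustrative choices, not part of dataframe-go or parquet-go, and passing this check does not prove the file is fully valid.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// parquetMagic is the 4-byte marker found at both the start and the end
// of a Parquet file.
var parquetMagic = []byte("PAR1")

// looksLikeParquet is a hypothetical helper: it reports whether data has
// the leading and trailing magic and is long enough to also hold the
// 4-byte footer length. It does not parse the footer metadata.
func looksLikeParquet(data []byte) bool {
	if len(data) < 12 {
		return false
	}
	return bytes.HasPrefix(data, parquetMagic) && bytes.HasSuffix(data, parquetMagic)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checkparquet <file>")
		return
	}
	// Reads the whole file into memory; acceptable for a small test file.
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println("looks like Parquet:", looksLikeParquet(data))
}
```

A file that contains nothing but "PAR1" fails this check, which would suggest the export did not finish writing; a real importer is still the authoritative test.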
df := dataframe.NewDataFrame(dataframe.NewSeriesString("A", nil, []string{"1", "2", "3"}))
file, _ := os.Create("1.parquet")
_ = exports.ExportToParquet(context.Background(), file, df)
fr, _ := local.NewLocalFileReader("1.parquet")
df, err := imports.LoadFromParquet(context.Background(), fr)
if err != nil {
	panic(err)
}
fmt.Println(df)
panic: seek 1.parquet: invalid argument
goroutine 1 [running]:
main.main()
.../main.go:21 +0x465
Exiting.
Error at imports/parquet.go, line 40: pr, err := reader.NewParquetReader(src, nil, int64(runtime.NumCPU()))
My parquet-go version is v1.6.2: github.com/xitongsys/parquet-go v1.6.2
I tried opening your file and it worked:
package main

import (
	"context"
	"fmt"

	"github.com/rocketlaunchr/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

var ctx = context.Background()

func main() {
	fr, _ := local.NewLocalFileReader("1.parquet")
	defer fr.Close()

	df, err := imports.LoadFromParquet(ctx, fr)
	if err != nil {
		panic(err)
	}
	fmt.Println(df)
}
OUTPUT:
+-----+--------+-------+---------+
| | A | B | C |
+-----+--------+-------+---------+
| 0: | a | 2 | 10 |
| 1: | b | 3 | 20 |
| 2: | c | 4 | NaN |
| 3: | d | 1 | NaN |
+-----+--------+-------+---------+
| 4X3 | STRING | INT64 | FLOAT64 |
+-----+--------+-------+---------+
Can you tell me your parquet-go version?
module main
go 1.18
require (
github.com/rocketlaunchr/dataframe-go v0.0.0-00010101000000-000000000000
github.com/xitongsys/parquet-go-source v0.0.0-20200509081216-8db33acb0acf
)
require (
github.com/apache/thrift v0.0.0-20181112125854-24918abba929 // indirect
github.com/goccy/go-json v0.7.6 // indirect
github.com/golang/snappy v0.0.0-20180518054509-2e65f85255db // indirect
github.com/google/go-cmp v0.4.0 // indirect
github.com/guptarohit/asciigraph v0.5.1 // indirect
github.com/juju/clock v0.0.0-20190205081909-9c5c9712527c // indirect
github.com/juju/errors v0.0.0-20200330140219-3fe23663418f // indirect
github.com/juju/loggo v0.0.0-20200526014432-9ce3a2e09b5e // indirect
github.com/juju/utils/v2 v2.0.0-20200923005554-4646bfea2ef1 // indirect
github.com/klauspost/compress v1.9.7 // indirect
github.com/mattn/go-runewidth v0.0.7 // indirect
github.com/olekukonko/tablewriter v0.0.4 // indirect
github.com/rocketlaunchr/mysql-go v1.1.3 // indirect
github.com/xitongsys/parquet-go v1.5.2 // indirect
golang.org/x/crypto v0.0.0-20200820211705-5c72a883971a // indirect
golang.org/x/exp v0.0.0-20200331195152-e8c3332aa8e5 // indirect
golang.org/x/net v0.0.0-20200904194848-62affa334b73 // indirect
golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a // indirect
gopkg.in/yaml.v2 v2.3.0 // indirect
)
I use github.com/apache/thrift v0.0.0-20181112125854-24918abba929 and github.com/xitongsys/parquet-go v1.5.2, and it works.
In the release notes:
[v1.6.0](https://github.com/xitongsys/parquet-go/releases/tag/v1.6.0)
Big changes in the type. Not compatiable with before.
I may need to update the package to use 1.6+ instead of 1.5.
No idea why it is not using v1.5 for you, since it's registered in the go.mod file.
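One way to confirm which parquet-go version actually ended up in a build, rather than relying on what go.mod appears to say, is to read the module information embedded in the binary. The following is a sketch using the standard library's runtime/debug package; the helper name depVersion is an illustrative assumption and is not part of dataframe-go.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// depVersion is a hypothetical helper: it returns the version recorded
// for the module at path among deps. If the module has been replaced,
// the replacement's version is reported instead (this can be empty when
// the replacement is a local directory).
func depVersion(deps []*debug.Module, path string) (string, bool) {
	for _, m := range deps {
		if m.Path != path {
			continue
		}
		if m.Replace != nil {
			return m.Replace.Version, true
		}
		return m.Version, true
	}
	return "", false
}

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info available (binary not built with module support)")
		return
	}
	const path = "github.com/xitongsys/parquet-go"
	if v, found := depVersion(info.Deps, path); found {
		fmt.Printf("%s %s\n", path, v)
	} else {
		fmt.Printf("%s is not a dependency of this binary\n", path)
	}
}
```

Dropping the body of main into the program that fails should show whether v1.5.2 or v1.6.2 is being used for that build.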
v1.5 works fine. Maybe I installed parquet-go before installing dataframe-go; I'm not sure.
It seems the problem is solved, so I should close this issue.
Maybe you directly imported "github.com/rocketlaunchr/dataframe-go/imports" without importing "github.com/rocketlaunchr/dataframe-go". Since there is no go.mod file inside the github.com/rocketlaunchr/dataframe-go/imports directory, it just downloaded and used the latest version of parquet-go.
Here is my shell history:
➜ go get -u github.com/rocketlaunchr/dataframe-go
go: downloading github.com/rocketlaunchr/dataframe-go v0.0.0-20211025052708-a1030444159b
go: downloading golang.org/x/exp v0.0.0-20200331195152-e8c3332aa8e5
go: downloading github.com/google/go-cmp v0.4.0
go: downloading github.com/guptarohit/asciigraph v0.5.1
go: downloading github.com/olekukonko/tablewriter v0.0.4
go: downloading golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a
go: downloading github.com/olekukonko/tablewriter v0.0.5
go: downloading github.com/google/go-cmp v0.5.7
go: downloading github.com/mattn/go-runewidth v0.0.7
go: downloading github.com/mattn/go-runewidth v0.0.13
go: downloading golang.org/x/exp v0.0.0-20220328175248-053ad81199eb
go: downloading github.com/guptarohit/asciigraph v0.5.3
go: downloading github.com/rivo/uniseg v0.2.0
go: added github.com/google/go-cmp v0.5.7
go: added github.com/guptarohit/asciigraph v0.5.3
go: added github.com/mattn/go-runewidth v0.0.13
go: added github.com/olekukonko/tablewriter v0.0.5
go: added github.com/rivo/uniseg v0.2.0
go: added github.com/rocketlaunchr/dataframe-go v0.0.0-20211025052708-a1030444159b
go: added golang.org/x/exp v0.0.0-20220328175248-053ad81199eb
go: added golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
➜ go get -u github.com/xitongsys/parquet-go/parquet
go: downloading github.com/apache/thrift v0.16.0
go: upgraded github.com/apache/thrift v0.0.0-20181112125854-24918abba929 => v0.16.0
go: upgraded github.com/xitongsys/parquet-go v1.5.2 => v1.6.2
➜ go get -u github.com/xitongsys/parquet-go-source
go: downloading github.com/xitongsys/parquet-go-source v0.0.0-20220315005136-aec0fe3e777c
go: upgraded github.com/xitongsys/parquet-go-source v0.0.0-20200817004010-026bad9b25d0 => v0.0.0-20220315005136-aec0fe3e777c
You shouldn't have done the last 2 go gets, since they don't have a go.mod file, so it just assumed the latest version, hence: go: upgraded github.com/xitongsys/parquet-go v1.5.2 => v1.6.2
From Go's point of view, when you do that, it's an unrelated package.
Got it, thanks a lot.
Hi - when is this lib going to be upgraded to use >= v1.6.2 of parquet-go, please? Having to pin to v1.5.4 just broke all the tagging I was using, which assumed v1.6.2 :-(
There is a backward-incompatible change in v1.6.2. Therefore I will need to explore it more deeply.
This package's go.mod is set to github.com/xitongsys/parquet-go v1.5.2, so it should work for you provided you don't try to independently go get the "github.com/rocketlaunchr/dataframe-go/imports" package.
Let the main package dictate the dependencies for the sub-packages.
dataframe.to_parquet("1.parquet")
goroutine 1 [running]:
github.com/rocketlaunchr/dataframe-go.NewDataFrame({0xc0001f8000, 0x3, 0xc000149a10?})
.../rocketlaunchr/dataframe-go@v0.0.0-20211025052708-a1030444159b/dataframe.go:41 +0x33c
github.com/rocketlaunchr/dataframe-go/imports.LoadFromParquet({0x1497868, 0xc000020080}, {0x1498150?, 0xc00000e798?}, {0xc0000021a0?, 0xc000149f70?, 0x1007599?})
.../go/pkg/mod/github.com/rocketlaunchr/dataframe-go@v0.0.0-20211025052708-a1030444159b/imports/parquet.go:110 +0x8ae
main.main()
.../main.go:13 +0x78