lonnc opened this issue 4 years ago.
I'm having the same problem, though now with v1.6.0.
I can't even get the repeated uint8 version to work.
I wrote these test cases based on the solutions I could come up with from the docs:
```go
package main

import (
	"fmt"

	"github.com/xitongsys/parquet-go-source/local"
	"github.com/xitongsys/parquet-go/writer"
)

// write writes obj to filename ten times, using obj's struct tags as the schema.
func write(obj interface{}, filename string) error {
	fw, err := local.NewLocalFileWriter(filename)
	if err != nil {
		return fmt.Errorf("failed on file writer: %v", err)
	}
	defer fw.Close()

	pw, err := writer.NewParquetWriter(fw, obj, 4)
	if err != nil {
		return fmt.Errorf("failed on new writer: %v", err)
	}
	for i := 0; i < 10; i++ {
		if err := pw.Write(obj); err != nil {
			return fmt.Errorf("failed on write: %v", err)
		}
	}
	if err := pw.WriteStop(); err != nil {
		return fmt.Errorf("failed on write stop: %v", err)
	}
	return nil
}

func main() {
	bytes := []byte{0xDE, 0xAD, 0xBE, 0xEF}

	// A single BYTE_ARRAY column.
	type ByteArray struct {
		Bytes []byte `parquet:"name=bytes, type=BYTE_ARRAY"`
	}
	fmt.Printf("byte array: %v\n", write(&ByteArray{bytes}, "bytearray.parquet"))

	// A repeated INT32 column annotated as UINT_8.
	type Uint8Repeated struct {
		Bytes []byte `parquet:"name=bytes, type=INT32, convertedtype=UINT_8, repetitiontype=REPEATED"`
	}
	fmt.Printf("uint8 repeated: %v\n", write(&Uint8Repeated{bytes}, "uint8repeated.parquet"))

	// A repeated plain INT32 column.
	type Int32Repeated struct {
		Bytes []byte `parquet:"name=bytes, type=INT32, repetitiontype=REPEATED"`
	}
	fmt.Printf("int32 repeated: %v\n", write(&Int32Repeated{bytes}, "int32repeated.parquet"))

	// A LIST of INT32 values annotated as UINT_8.
	type Uint8List struct {
		Bytes []byte `parquet:"name=bytes, type=MAP, convertedtype=LIST, valuetype=INT32, valueconvertedtype=UINT_8"`
	}
	fmt.Printf("uint8 list: %v\n", write(&Uint8List{bytes}, "uint8list.parquet"))

	// A LIST of plain INT32 values.
	type Int32List struct {
		Bytes []byte `parquet:"name=bytes, type=MAP, convertedtype=LIST, valuetype=INT32`
	}
	fmt.Printf("int32 list: %v\n", write(&Int32List{bytes}, "int32list.parquet"))
}
```
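One detail in the last test case may matter on its own: the `Int32List` tag is missing its closing `"`. Assuming parquet-go reads field tags through the standard `reflect.StructTag` API, as most Go libraries do, a malformed tag like this is simply not found, so the field reaches the library with no parquet settings at all. The stdlib-only check below compares the tag as written with a corrected copy. It does not exercise parquet-go itself, so whether this is what triggers the nil pointer dereference is an unverified guess.

```go
package main

import (
	"fmt"
	"reflect"
)

// int32ListTag is the Int32List struct tag exactly as written in the test case,
// including the missing closing quote.
const int32ListTag = `parquet:"name=bytes, type=MAP, convertedtype=LIST, valuetype=INT32`

// fixedTag is the same tag with the closing quote added.
const fixedTag = `parquet:"name=bytes, type=MAP, convertedtype=LIST, valuetype=INT32"`

// parquetTag returns what reflect reports for the "parquet" key of a tag,
// and whether the key could be parsed at all.
func parquetTag(tag string) (string, bool) {
	return reflect.StructTag(tag).Lookup("parquet")
}

func main() {
	v, ok := parquetTag(int32ListTag)
	fmt.Printf("as written: %q found=%v\n", v, ok)

	v, ok = parquetTag(fixedTag)
	fmt.Printf("with quote: %q found=%v\n", v, ok)
}
```

Running `go vet` on the original test file should also flag the malformed tag.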
However, all of these fail, each in a different place:
```
$ go run main.go
byte array: failed on new writer: type : not a valid Type string
uint8 repeated: failed on write stop: reflect: call of reflect.Value.Int on uint8 Value
int32 repeated: failed on write stop: reflect: call of reflect.Value.Int on uint8 Value
uint8 list: failed on write stop: reflect: call of reflect.Value.Int on uint8 Value
int32 list: failed on write stop: runtime error: invalid memory address or nil pointer dereference
```
Do any of these tags look more reasonable than the others? Is there something else I've missed? Pinging @xitongsys for comments as well.
I'd like to include some raw bytes (e.g. images) in a Parquet file. Currently I'm using:
This works, but I assume (Parquet being new to me) that it would be more efficient to use BYTE_ARRAY directly rather than fall back to a LIST of UINT_8 values. Perhaps:
As expected, this doesn't work; WriteStop() returns the error:
Would it be possible to add this functionality, or is there an alternative approach I should use?
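To put rough numbers on the efficiency assumption: in the Parquet format's PLAIN encoding, a BYTE_ARRAY value is stored as a 4-byte little-endian length followed by the raw bytes, while UINT_8 is only an annotation on the INT32 physical type, so every byte in a LIST of UINT_8 takes 4 bytes. The sketch below encodes both layouts with `encoding/binary`. The helper names are hypothetical, not parquet-go functions, and the sketch ignores page headers, dictionary encoding, compression and the repetition/definition levels a LIST also stores per element.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// plainByteArray encodes one BYTE_ARRAY value with PLAIN encoding:
// a 4-byte little-endian length followed by the raw bytes.
func plainByteArray(b []byte) []byte {
	out := make([]byte, 4, 4+len(b))
	binary.LittleEndian.PutUint32(out, uint32(len(b)))
	return append(out, b...)
}

// plainInt32s encodes each byte as its own INT32 value (the physical type
// behind UINT_8) with PLAIN encoding: 4 little-endian bytes per value.
func plainInt32s(b []byte) []byte {
	out := make([]byte, 4*len(b))
	for i, x := range b {
		binary.LittleEndian.PutUint32(out[4*i:], uint32(x))
	}
	return out
}

func main() {
	img := make([]byte, 1000) // stand-in for 1000 bytes of image data
	fmt.Println("BYTE_ARRAY: ", len(plainByteArray(img)))
	fmt.Println("INT32 values:", len(plainInt32s(img)))
}
```

For 1000 bytes of data this gives 1004 bytes as a single BYTE_ARRAY versus 4000 bytes as INT32 values, before the LIST's extra levels. Compression may narrow the gap in practice, but the assumption in the question holds at the encoding level.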
As an aside, I've also tried using a string field in the struct and converting to and from []byte; it works, but it doesn't feel right.