segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Add ability to provide schema name #310

Closed yonesko closed 2 years ago

yonesko commented 2 years ago

Hello! I am creating a schema from a type built with reflect.StructOf(structFields), which has no name (a name cannot be set 😞), so my Parquet file has no schema name either.

Because of that, my colleague's Java Parquet reader fails to read the file.

Could you please add a variant of SchemaOf that accepts a schema name?
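
For illustration, here is a minimal standard-library sketch of the situation described above: a type built with reflect.StructOf reports an empty name, while a declared struct type reports its identifier. The field names, tags, and the identifiers dynamicTypeName, namedTypeName, and namedRow are made up for this example, and it assumes, as the thread suggests, that the schema name is taken from the Go type name.

```go
package main

import "reflect"

// dynamicTypeName builds a struct type at run time and returns its name.
// Types created with reflect.StructOf are unnamed, so this returns "".
func dynamicTypeName() string {
	fields := []reflect.StructField{
		{Name: "ID", Type: reflect.TypeOf(int64(0)), Tag: `parquet:"id"`},
		{Name: "Label", Type: reflect.TypeOf(""), Tag: `parquet:"label,optional"`},
	}
	return reflect.StructOf(fields).Name()
}

// namedRow is a hypothetical declared type, used only for comparison.
type namedRow struct {
	ID int64 `parquet:"id"`
}

// namedTypeName returns the name of the declared type above.
func namedTypeName() string {
	return reflect.TypeOf(namedRow{}).Name()
}
```

Calling dynamicTypeName() returns the empty string, whereas namedTypeName() returns the declared identifier, which would explain why a schema derived from the dynamic type ends up without a name.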

slim-bean commented 2 years ago

I just opened #311, which is about doing something similar: creating the schema from reflect.StructOf(...). It sounds like you were actually able to create files this way? I also noticed the lack of a name on a dynamically created struct and figured this would be an issue.

Would you be able to share the code you used to generate a file with reflect.StructOf?

yonesko commented 2 years ago

Something like:

    var structFields []reflect.StructField
    for i, f := range fields {
        fType, err := convertToType(f, false)
        if err != nil {
            return nil, fmt.Errorf("error for field '%s': %w", names[i], err)
        }
        var listTag string
        if fType.Kind() == reflect.Slice || fType.Kind() == reflect.Array {
            listTag = ",list"
        }
        var optionalTag string
        if ok, _ := f.GetNullable(); ok {
            optionalTag = ",optional"
        }
        //goland:noinspection GoDeprecation
        structFields = append(structFields, reflect.StructField{
            Name: strings.Title(names[i]),
            Type: fType,
            Tag:  reflect.StructTag(fmt.Sprintf(`parquet:"%s%s%s" json:"%s" csv:"%s"`, names[i], optionalTag, listTag, names[i], names[i])),
        })
    }

    return reflect.StructOf(structFields), nil

slim-bean commented 2 years ago

Thanks! This is really helpful!

I also found your other closed PR #245, which gave me a clue about how to get the dynamic struct to work in schema generation, so thank you for adding your follow-up comment on that PR as well!

I'm doing this now, which seems to be working:

    typ := reflect.StructOf(fields)
    schema := parquet.SchemaOf(reflect.New(typ).Interface())

It seems the reflect.TypeOf(model) call in the SchemaOf function doesn't work well unless you pass the value returned by Interface().
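
A standard-library sketch of one plausible reading of that observation: reflect.TypeOf sees the dynamic struct only when it is handed the value returned by Interface(). If it is given the reflect.Value from reflect.New, or the reflect.Type itself, it reports the reflect package's own types instead. The names buildType, modelRecoversType, valueTypeName, and typeIsItsOwnModel are hypothetical, and the sketch does not call parquet-go at all.

```go
package main

import "reflect"

// buildType creates a small struct type at run time (hypothetical field).
func buildType() reflect.Type {
	return reflect.StructOf([]reflect.StructField{
		{Name: "ID", Type: reflect.TypeOf(int64(0)), Tag: `parquet:"id"`},
	})
}

// modelRecoversType reports whether reflect.TypeOf applied to
// reflect.New(typ).Interface() yields a pointer to the dynamic struct type.
func modelRecoversType() bool {
	typ := buildType()
	model := reflect.New(typ).Interface() // pointer to a zero value of typ
	t := reflect.TypeOf(model)
	return t.Kind() == reflect.Ptr && t.Elem() == typ
}

// valueTypeName returns the type name seen when the reflect.Value from
// reflect.New is used as the model without calling Interface().
func valueTypeName() string {
	typ := buildType()
	return reflect.TypeOf(reflect.New(typ)).Name()
}

// typeIsItsOwnModel reports whether passing the reflect.Type itself
// would be seen as the dynamic struct type (it is not).
func typeIsItsOwnModel() bool {
	typ := buildType()
	return reflect.TypeOf(typ) == typ
}
```

Here modelRecoversType() is true, valueTypeName() is "Value" (the reflect.Value struct rather than the dynamic one), and typeIsItsOwnModel() is false, which is consistent with needing Interface() before handing the model to a function that calls reflect.TypeOf on it.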

Sorry for hijacking this issue. I think I'm now in the same place you are, where I need to provide a schema name when creating a schema.

achille-roussel commented 2 years ago

Have you tried something like this?

    schema := parquet.NewSchema("schema_name", parquet.SchemaOf(model))

yonesko commented 2 years ago

Thanks! Works fine!