marcboeker / go-duckdb

go-duckdb provides a database/sql driver for the DuckDB database engine.
MIT License
745 stars 110 forks source link

[Feature] Type interface #272

Closed taniabogatsch closed 2 months ago

taniabogatsch commented 2 months ago

This PR adds methods to expose Type creation. This is necessary for the table UDF (#201) and scalar UDF (#222) PRs. There are also a few other changes here and there, but the gist is introducing the exported Type.

~cc @JAicewizard, @marcboeker, or others. I currently use the functions below. There is a switch for the primitive reflect types, some very basic functions for the non-Go types, and more complex functions for some types. However, there are a few things that I do not like about this approach.~ ~- Usage of reflect as input argument.~ ~- A lot of functions that could be a switch or a map lookup.~

JAicewizard commented 2 months ago

I don't have much time ATM, but this weekend I will look at this and give a review.

taniabogatsch commented 2 months ago

@JAicewizard, that would be great.

taniabogatsch commented 2 months ago

Let me know how using the interface for the table UDFs goes. If it works well, then I think this is ready to merge.

JAicewizard commented 2 months ago

I have one small nit, it would be really nice to have an error-less way to create supported types. That is the reason I implemented ways to create duckdb types using generics, since it allows statically checking that the type will result in a supported duckdb type.

One example is the following piece of code:

func (d *incTableUDF) GetFunction() RowTableFunction {
    return RowTableFunction{
        Config: TableFunctionConfig{
            Arguments: []Type{NewDuckdbType[int64]()},
        },
        BindArguments: BindIncTableUDF,
    }
}

which is now the following:

func (d *incTableUDF) GetFunction() (RowTableFunction, error) {
    t, err := NewTypeInfo(TYPE_BIGINT)
    if err != nil {
        return RowTableFunction{}, err
    }
    return RowTableFunction{
        Config: TableFunctionConfig{
            Arguments: []Type{t},
        },
        BindArguments: BindIncTableUDF,
    }, nil
}

and extra error handling in go-duckdb

which is significantly longer. Since table functions already contain quite a bit of boilerplate (on the users end) I would prefer to have some error-less way to create basic types. This could also remove the error return parameter on the Config method for the scalar functions, but I think there is much less boilerplate so it is less of a problem.

JAicewizard commented 2 months ago

Additionally, using generics to create a new type using generics/reflection allows easily creating deeply nested types since it can be infered from the go type

taniabogatsch commented 2 months ago

I have one small nit, it would be really nice to have an error-less way to create supported types. That is the reason I implemented ways to create duckdb types using generics, since it allows statically checking that the type will result in a supported duckdb type.

I understand the reasoning behind this. When I tried to move your code into this PR, though, I encountered a few issues with using generics.

For any timestamp-based duckdb type, we cannot infer the intended type from the Go type (to the best of my knowledge).

case time.Time:
    // TYPE_TIMESTAMP, TYPE_TIMESTAMP_S, TYPE_TIMESTAMP_MS, TYPE_TIMESTAMP_NS, TYPE_TIMESTAMP_TZ,
     // TYPE_DATE, TYPE_TIME?

Also, for UUID and byte, I don't know how to best determine whether it is a UUID or a BLOB.

taniabogatsch commented 2 months ago

Maybe a way to use generics is to add the timestamp types to go-duckdb. We already do this for Interval, Decimal, UUID, etc. 🤔

type Timestamp time.Time
type TimestampMS time.Time
taniabogatsch commented 2 months ago

My preference would be errorless code as well. But after thinking more about it, I am unsure how to solve any ambiguity issues, both with the currently supported types and any potential future types.

For example, how would we infer an ENUM type? Or, if we were to add support for ARRAY, how would we distinguish between a slice with three (fixed-size array) elements, or a LIST.

So, while the current approach introduces some error handling and more code for nested type creation, it is also much more flexible and robust... 🤔

An alternative to deciding between one of the two approaches might be to add a fast path for primitive types or an UnsafeNewXX function that omits errors. But that would add more complexity to the API for users and us to maintain.

taniabogatsch commented 2 months ago

Here is a snippet for using generics for type detection. https://github.com/marcboeker/go-duckdb/pull/201#discussion_r1760934637

cc @marcboeker

JAicewizard commented 2 months ago

Hmm ok I understand your issues here, I will leave a comment on the scalar PR in a minute