Closed taniabogatsch closed 2 months ago
I don't have much time ATM, but this weekend I will look at this and give a review.
@JAicewizard, that would be great.
Let me know how using the interface for the table UDFs goes. If it works well, then I think this is ready to merge.
I have one small nit, it would be really nice to have an error-less way to create supported types. That is the reason I implemented ways to create duckdb types using generics, since it allows statically checking that the type will result in a supported duckdb type.
One example is the following piece of code:
func (d *incTableUDF) GetFunction() RowTableFunction {
return RowTableFunction{
Config: TableFunctionConfig{
Arguments: []Type{NewDuckdbType[int64]()},
},
BindArguments: BindIncTableUDF,
}
}
which is now the following:
func (d *incTableUDF) GetFunction() (RowTableFunction, error) {
t, err := NewTypeInfo(TYPE_BIGINT)
if err != nil {
return RowTableFunction{}, err
}
return RowTableFunction{
Config: TableFunctionConfig{
Arguments: []Type{t},
},
BindArguments: BindIncTableUDF,
}, nil
}
and extra error handling in go-duckdb
which is significantly longer. Since table functions already contain quite a bit of boilerplate (on the users end) I would prefer to have some error-less way to create basic types. This could also remove the error return parameter on the Config
method for the scalar functions, but I think there is much less boilerplate so it is less of a problem.
Additionally, using generics to create a new type using generics/reflection allows easily creating deeply nested types since it can be infered from the go type
I have one small nit, it would be really nice to have an error-less way to create supported types. That is the reason I implemented ways to create duckdb types using generics, since it allows statically checking that the type will result in a supported duckdb type.
I understand the reasoning behind this. When I tried to move your code into this PR, though, I encountered a few issues with using generics.
For any timestamp-based duckdb type, we cannot infer the intended type from the Go type (to the best of my knowledge).
case time.Time:
// TYPE_TIMESTAMP, TYPE_TIMESTAMP_S, TYPE_TIMESTAMP_MS, TYPE_TIMESTAMP_NS, TYPE_TIMESTAMP_TZ,
// TYPE_DATE, TYPE_TIME?
Also, for UUID
and byte
, I don't know how to best determine whether it is a UUID
or a BLOB
.
Maybe a way to use generics is to add the timestamp types to go-duckdb. We already do this for Interval
, Decimal
, UUID
, etc. 🤔
type Timestamp time.Time
type TimestampMS time.Time
My preference would be errorless code as well. But after thinking more about it, I am unsure how to solve any ambiguity issues, both with the currently supported types and any potential future types.
For example, how would we infer an ENUM
type? Or, if we were to add support for ARRAY
, how would we distinguish between a slice with three (fixed-size array) elements, or a LIST
.
So, while the current approach introduces some error handling and more code for nested type creation, it is also much more flexible and robust... 🤔
An alternative to deciding between one of the two approaches might be to add a fast path for primitive types or an UnsafeNewXX
function that omits errors. But that would add more complexity to the API for users and us to maintain.
Here is a snippet for using generics for type detection. https://github.com/marcboeker/go-duckdb/pull/201#discussion_r1760934637
cc @marcboeker
Hmm ok I understand your issues here, I will leave a comment on the scalar PR in a minute
This PR adds methods to expose
Type
creation. This is necessary for the table UDF (#201) and scalar UDF (#222) PRs. There are also a few other changes here and there, but the gist is introducing the exportedType
.~cc @JAicewizard, @marcboeker, or others. I currently use the functions below. There is a switch for the primitive
reflect
types, some very basic functions for the non-Go types, and more complex functions for some types. However, there are a few things that I do not like about this approach.~ ~- Usage ofreflect
as input argument.~ ~- A lot of functions that could be a switch or a map lookup.~