UINT Functions unexpectedly return doubles (or fail completely)

kuzudb / kuzu

UINT Functions unexpectedly return doubles (or fail completely) #2117

Closed Riolku closed 1 year ago

Riolku commented 1 year ago

Reproducing:

create node table test(A SERIAL, B UINT64, PRIMARY KEY(A));
create (t:test {B: 4});
match (t:test) return t.B * 3;

The datatype of the result is a double, 12.0000. Furthermore:

match (t:test) return t.B & 3

Gives

Error: Binder exception: Cannot match a built-in function for given function BITWISE_AND(UINT64,INT64). Supported inputs are
(INT64,INT64) -> INT64

Riolku commented 1 year ago

Notably, this example runs fine if B is INT64.
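
For illustration, the binder error above is consistent with a function registry that matches argument types exactly and has only an (INT64,INT64) overload for BITWISE_AND. The sketch below is a hypothetical model of that behavior, not Kuzu's actual binder: the names `SIGNATURES`, `BinderError`, `bind`, and `try_bind` are all assumptions made up for this example.

```python
# Hypothetical model of signature-based function binding.
# The registry mirrors the error message above; none of this is Kuzu's real API.
SIGNATURES = {
    "BITWISE_AND": [(("INT64", "INT64"), "INT64")],
}


class BinderError(Exception):
    """Raised when no registered signature matches the argument types."""


def bind(name, arg_types):
    """Return the result type of the overload whose parameters match exactly."""
    candidates = SIGNATURES.get(name, [])
    for params, result in candidates:
        if params == tuple(arg_types):
            return result
    supported = ", ".join(f"({','.join(p)}) -> {r}" for p, r in candidates)
    raise BinderError(
        f"Cannot match a built-in function for given function "
        f"{name}({','.join(arg_types)}). Supported inputs are {supported}"
    )


def try_bind(name, arg_types):
    """Demo helper: return the result type, or the error text on failure."""
    try:
        return bind(name, arg_types)
    except BinderError as exc:
        return f"error: {exc}"


print(try_bind("BITWISE_AND", ["INT64", "INT64"]))   # binds to the only overload
print(try_bind("BITWISE_AND", ["UINT64", "INT64"]))  # no exact match, so it fails
```

Under this model, the UINT64 case could be made to work either by registering a UINT64 overload or by allowing an implicit cast during matching. Which of these Kuzu would choose is not stated in this thread.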

acquamarin commented 1 year ago

We do implicit casting on arithmetic operations. E.g. UINT64 + INT64 => DOUBLE

Riolku commented 1 year ago

This seems rather unexpected. Shouldn't INT64 + UINT64 yield INT64? I understand that we might overflow, but we can special-case that... we will absolutely lose precision if we cast to a double, which seems very undesirable.

Also, what about the missing bitwise operations?
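
The precision concern in the comment above can be checked with a small sketch. It assumes DOUBLE means an IEEE 754 binary64 value, which is what Python's `float` is; the helper name `roundtrip_through_double` is made up for this example. A binary64 value has a 53-bit significand, so not every 64-bit integer survives the conversion.

```python
# Assumption: DOUBLE is IEEE 754 binary64, the same format as Python's float.
UINT64_MAX = 2**64 - 1


def roundtrip_through_double(n: int) -> int:
    """Convert an integer to a double and back, as an implicit cast would."""
    return int(float(n))


# Small values such as 4 * 3 are represented exactly.
assert roundtrip_through_double(4 * 3) == 12

# Above 2**53, odd integers are no longer representable and get rounded.
assert roundtrip_through_double(2**53 + 1) == 2**53

# The largest UINT64 rounds up to 2**64, which no longer fits in a UINT64.
assert roundtrip_through_double(UINT64_MAX) == 2**64
```

So the example in this issue (12.0000) keeps its value and only changes type, while large UINT64 values would change value as well.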

andyfengHKU commented 1 year ago

I'll just add my opinion about the overflow part. To begin with, I'm not against the current approach in general.

> I understand that we might overflow, but we can special case that

This is not doable because all data types are resolved at compile time. That means you cannot change a vector's data type from INT to DOUBLE just because an overflow happens at runtime.

From the user's perspective:

- If we use the INT64 + UINT64 -> UINT64 approach, then when an overflow happens the user needs to write to_double(INT64) + to_double(UINT64) -> DOUBLE.
- If we use the INT64 + UINT64 -> DOUBLE approach, then to preserve accuracy the user needs to write to_uint64(INT64) + UINT64 -> UINT64.

I don't see a big difference between the two approaches.
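
The trade-off between the two approaches can be sketched as follows. This is a simplified model and not Kuzu's implementation: `add_as_uint64` and `add_as_double` are hypothetical names, Python integers stand in for fixed-width values, and the assumption is that the UINT64 variant reports an overflow error rather than wrapping around.

```python
UINT64_MAX = 2**64 - 1


def add_as_uint64(a_int64: int, b_uint64: int) -> int:
    """Model of INT64 + UINT64 -> UINT64: exact, but fails if out of range."""
    result = a_int64 + b_uint64
    if not 0 <= result <= UINT64_MAX:
        raise OverflowError(f"{a_int64} + {b_uint64} does not fit in UINT64")
    return result


def add_as_double(a_int64: int, b_uint64: int) -> float:
    """Model of INT64 + UINT64 -> DOUBLE: never fails, but may round."""
    return float(a_int64) + float(b_uint64)


# In range: the UINT64 result is exact, the DOUBLE result is rounded.
exact = add_as_uint64(1, 2**53)
rounded = add_as_double(1, 2**53)
print(exact, int(rounded))

# Out of range: the UINT64 version raises, the DOUBLE version approximates.
try:
    add_as_uint64(1, UINT64_MAX)
except OverflowError as exc:
    print("overflow:", exc)
print(add_as_double(1, UINT64_MAX))
```

In both designs the result type is fixed before any data is read, which matches the compile-time constraint described above. The difference is only which case requires an explicit cast from the user.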

To conclude, we should just check what PG does and follow the PG standard.

Riolku commented 1 year ago

Makes sense, I agree with this approach.

Riolku commented 1 year ago

I think when we multiply by constants, we shouldn't get a double. In my example, t.B * 3 should return a UINT64.
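
One possible rule that would give this behavior is to let an integer literal adopt the column's type whenever the literal's value fits in that type's range. Because the literal's value is known at bind time, this stays compatible with compile-time type resolution. The sketch below is only an illustration of that idea: `RANGES` and `result_type_for_literal` are hypothetical, and the fallback to DOUBLE is an assumption based on the implicit casting described earlier in the thread.

```python
# Value ranges of the two integer types discussed in this issue.
RANGES = {
    "INT64": (-(2**63), 2**63 - 1),
    "UINT64": (0, 2**64 - 1),
}


def result_type_for_literal(column_type: str, literal: int) -> str:
    """Pick the result type of `column <op> literal` at bind time.

    If the literal fits in the column's integer type, keep that type.
    Otherwise fall back to DOUBLE (assumed fallback, see lead-in).
    """
    low, high = RANGES[column_type]
    if low <= literal <= high:
        return column_type
    return "DOUBLE"


print(result_type_for_literal("UINT64", 3))   # literal fits, type is kept
print(result_type_for_literal("UINT64", -1))  # negative literal does not fit
```

This decides only the static result type. The multiplication itself could still overflow at runtime and would need its own check.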