r2dbc / r2dbc-spi

Service Provider Interface for R2DBC Implementations
Apache License 2.0
421 stars 57 forks source link

Feedback on R2DBC Type Mapping #97

Closed mp911de closed 5 years ago

mp911de commented 5 years ago

BOOLEAN description probably should not use “bit” term. There are three BOOLEAN values in the SQL: TRUE, FALSE, and UNKNOWN (NULL). Only some databases need a some numeric data type to represent a boolean value.

BINARY / VARBINARY is questionable. JDBC uses byte[]. I don't think that ByteBuffer is more usable here.

REAL data type should definitely use java.lang.Float.

Where is DOUBLE PRECISION?

FLOAT(n) in databases that use IEEE data types for it can be mapped to either 32-bit or 64-bit floating point value depending on precision parameter.

JDBC and some databases have TINYINT data type that is not listed here, but I see a codec for it.

Some databases (MySQL family and maybe some others) have non-standard unsigned integer data types.

SQL Standard specifies only TIMESTAMP WITH TIME ZONE. This data type has only time zone offset, but not the time zone name. Some databases really implement it in such way, some can also hold time zone names (Oracle), some don't preserve any time zone information (PostgreSQL), some don't have such data type at all. Why R2DBC use TIMESTAMP with Timezone Offset and TIMESTAMP with Timezone names? Second name looks like SQL type name, but it uses a time zone name unlike the SQL type. Perhaps something like TIMESTAMP with Time Zone should be used for OffsetDateTime. ZonedDateTime can also be supported somehow, but it does not have a corresponding SQL data type.

SQL Standard and some databases have TIME WITH TIME ZONE (java.time.OffsetTime). It looks like nobody knows a valid use case for it, but I think that R2DBC should support it anyway.

Year-month intervals and day-time intervals should be listed separately. They are not compatible. java.time.Duration is a reasonable data type for day-time intervals, but for year-months intervals you need to use a java.time.Period. This class also has days, unlike year-month intervals. Note that PostgreSQL uses ISO semantics for intervals and it is different from the SQL Standard, it's possible to create a combined year-month-day-time interval in this database that works in very weird way in expressions. SQL Standard is more sane in that area. (PostgreSQL supports the SQL Standard data types too.)

Object[] is a reasonable data type for ARRAY, but not for MULTISET. I think that you need a java.util.Set for it. MULTISET is an unordered collection in SQL.

JDBC (but not the SQL Standard) has a special type for XML values. Some databases (but not the SQL Standard) have special type(s) for JSON values. In the standard they are data type format clauses and base data type is a some character or binary string data type.

How DISTINCT and user-defined data types are represented?

Why precision and scale are mentioned in few data types, but aren't mentioned in other data types that also have parameters?

Originally posted by @katzyn in https://github.com/r2dbc/r2dbc-h2/issues/78#issuecomment-496252184

mp911de commented 5 years ago

Thanks for your feedback. I took a look at the mapping and your comments. Please find my responses in-line.

BINARY / VARBINARY is questionable. JDBC uses byte[].

Using ByteBuffer allows reuse of pooled buffers. Using ByteBuffer avoids conflicts with arrays of TINYINT.

REAL data type should definitely use java.lang.Float.

Agree, oversight on our side.

Where is DOUBLE PRECISION?

We should add it.

TINYINT

We should add it.

Some databases (MySQL family and maybe some others) have non-standard unsigned integer data types.

R2DBC type mapping is nonexhaustive and should be seen as a guideline for drivers.

Zoned date/time values

We should remove ZonedDateTime from the spec and use OffsetDateTime instead.

OffsetTime

Good point. We should add it.

MULTISET and Set<…>

We try to avoid parametrized types as the type parameter is not available during runtime. Using array types retains type safety.

XML and JSON support

R2DBC SPI is intended as driver-level API and XML documents are mostly textfields or binary streams. JSON and XML manipulation are potentially the interesting bits. As these things happen in the application or client-layer, we are not affected on SPI level. This leaves us with nothing to do for XML and JSON.

How DISTINCT and user-defined data types are represented

We don't have a specification for that yet and I'd like to postpone UDT's and similar types in the specification until need arises and we get a sufficient number of implementations that would benefit from that spec.

Distinct would probably map to the underlying type. Not sure we need to state the obvious in the spec.

Why precision and scale are mentioned in few data types, but aren't mentioned in other data types that also have parameters?

For DECIMAL and FLOAT parameters are mentioned to explain impact on the type mapping/properties. Other types don't feature a change of type mapping.

mp911de commented 5 years ago

@katzyn I opened PR #100 to address your findings. Care to have a look?