tds-fdw / tds_fdw

A PostgreSQL foreign data wrapper to connect to TDS databases (Sybase and Microsoft SQL Server)
Other
377 stars 101 forks source link

Mapping mssql bit types to pgsql boolean #191

Open coreyhuinker opened 5 years ago

coreyhuinker commented 5 years ago

It would be nice if the IMPORT FOREIGN SCHEMA command read in bit columns a postgresql type boolean. Currently they are imported as smallint.

SudoerWithAnOpinion commented 5 years ago

Per T-SQL Docs,

An integer data type that can take a value of 1, 0, or NULL.

See: bit (Transact-SQL)

Bit columns are not the same as boolean, they are integers, and should not be treated as boolean by the engine.

coreyhuinker commented 5 years ago

Likewise a boolean can have the value of true or false, or be NULL.

Per the document @TheRevenantStar cited:

The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.

Even if there were some difference between them, the most common use of a bit field, by far, is as a stand-in for a boolean. I have never seen a single bit field used for anything else. I have seen uses for bit masks, but those were long strings of bits with an even word boundary, never a single bit.

It would be a great help if the foreign data wrapper at least had an option to enable this very useful transformation.

I would further ask that the option be made default.

The patch seems relatively straightforward in modifying this bit here, but if we want to make it an option, then it gets more complex.

Also, around here we'd need to add a BoolGetDatum call (which according to this link maps 0 to false and any other integer value to true...which further matches the BIT documentation:

Converting to bit promotes any nonzero value to 1.

I think tdsConvertToCString will work without modification but I'm not sure.

This is an issue I'm willing to devote time to, if the maintainers agree that the change needs to be made.

SudoerWithAnOpinion commented 5 years ago

Bit fields can be used for masks and booleans, as the DB engine would concatenate them to save storage. Storing them as bool expands that space

But that’s all MSSQL stuff. From the standpoint of interoperability, and because the standard defines a difference, this should not cast that way. In the end though, it’s up to @GeoffMontee

As far as PG is concerned though, bools are not "tinyints" since bool accepts 1,0,NULL, t, f, yes, no. But that is an implementation detail. IMO you should not "mix" unlike types, even if they are synonymous; tinyint and integer is fine since tinyint would be a subset of integer.

jcarnu commented 4 years ago

FWIW : There's a bit type in PostgreSQL : If MSSQL type bit should be mapped, it certainly should be on bit/varbits :

https://www.postgresql.org/docs/12/datatype-bit.html

boolean type is 1 byte wide.

Semantically, bits words should be mapped to bit not to boolean nor to tinyint.

To be more clear, bit to boolean operation is not consistent : one bit can only be 0 or 1 not NULL. Boolean can be true, false or NULL. If, say, we agree on allowing this change, which semantics should be taken : 0 is false and 1 true ? What if reverse semantics is necessary for another project ? (sounds weird, but it may happen in real world problem).

The best thing, if you want arrays of boolean is to use that kind of query :

SELECT array_agg(substring(x'DEADBEEF'::varbit from i for 1)::integer=1) 
FROM generate_series(1,32) AS i

This will return an array of booleans. It also might be translated from MSSQL in the foreign table through a query (rather than table) option ?

My 2¢

coreyhuinker commented 4 years ago

Per https://docs.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-2017 :

The string values TRUE and FALSE can be converted to bit values: TRUE is converted to 1 and FALSE is converted to 0.

So boolean <-> bit mapping is consistent. A project that didn't conform to that is willfully misusing the data type.

There's no such thing as a bit string in SQLServer. The same doc page says that the database will store multiple bit columns in a single byte, but that's an implementation detail, you can't actually access that bit string, only the component bit types. Thus, you have a column that's a boolean.