pgspider / sqlite_fdw

SQLite Foreign Data Wrapper for PostgreSQL
Other
202 stars 36 forks source link

Many problems with formal `text` affinity values in foreign tables #66

Open mkgrgis opened 1 year ago

mkgrgis commented 1 year ago

Only non - STRICT tables in legacy SQLite databases is affected.

  1. There is foreign table with bytea column. In SQLite values usually belongs to blob affinity. In SQLite BLOB column also there is some blob images of simple UTF-8 or ASCII text files, for example with such content as テスト or Hello world. This values stores with text affinity and is normal for converting to bytea. Why in this case there is error https://github.com/pgspider/sqlite_fdw/blob/9fd31f43b08ca64006b86d0de316222c213d0081/sqlite_query.c#L93 (look like "text value for blob affinity") ?
  2. There is foreign table with bigint column. Sometimes in SQLite there is empty string values (formally text affinity) in this column which obviously means NULL. I think we can't formally analyse empty strings in not-text columns always as textaffinity. If PostgreSQL type of data in foreign column don't belongs to text/varchar family we can threat empty string as NULL. Maybe there is a reason to add a new option for "threat empty string as NULL for non text and non blob PostgreSQL columns"?
t-kataym commented 1 year ago

Thank you for reporting.

SQLite FDW expects/assumes that column type of foreign table and data affinity of SQLite are same, as you know.
It is not supported to map different types which requires implicit type cast (conversion) of data.

---Just a comment:--- To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

Additionally, we need to consider and care the logic of pushing down of WHERE condition.
For example, SELECT * FROM tbl WHERE col IS NULL is given to PostgreSQL. SQLite FDW wants to push down col IS NULL. If supporting "treat empty string as NULL", SQLite FDW needs to pushdown col = '' also like SELECT * FROM tbl WHERE col IS NULL or col = '' on SQLite.

mkgrgis commented 1 year ago

Thanks for comments, @t-kataym ! I made a draft for SELECT only https://github.com/mkgrgis/sqlite_fdw/commit/db7f918a44ecd1eb00d79e3eff851f6740b36061 Could you point me a line to add col IS NULL or col = '' mechanism, because it's still hard for me to determine all needed lines for code extension. Maybe something near https://github.com/pgspider/sqlite_fdw/blob/9fd31f43b08ca64006b86d0de316222c213d0081/sqlite_query.c#L214 ?

mkgrgis commented 1 year ago

Test case for text affinity for bytea column. SQLite table

CREATE TABLE "BLOB" (
    i INTEGER PRIMARY KEY AUTOINCREMENT,
    b BLOB
);

PostgreSQL foreign table

CREATE FOREIGN TABLE public."BLOB" (
    i int4 OPTIONS(key 'true') NULL,
    b bytea NULL
)
SERVER sqlite_srv;

Data preparing

echo -n "test" | hexdump -C
echo -n "テスト" | hexdump -C

74 65 73 74 = test e3 83 86 e3 82 b9 e3 83 88 = テスト

SQLite INSERT

INSERT INTO "BLOB" ("b") VALUES (x'e38386e382b9e38388');
INSERT INTO "BLOB" ("b") VALUES (x'74657374');
INSERT INTO "BLOB" ("b") VALUES ('テスト');
INSERT INTO "BLOB" ("b") VALUES ('test');

SQLite, affinity check SELECT i, b, typeof (b) FROM "BLOB"; The same data in DBeaver изображение During SELECT in PostgreSQL there is invalid input for type "bytea" = SQLite "blob", but affinity = "text" for value ='テスト'. Practice described above was typical for SQLite-bash interaction for text files without \0 suitable to insert without hexdump program processing. I think we can threat this practice as common and safe for SQL2016 behaviour because there is no text in SQLite encodings we can't threat as blob.

In clean "BLOB" table I also try to test

DELETE FROM "BLOB";
INSERT INTO "BLOB" ("b") VALUES (5);
INSERT INTO "BLOB" ("b") VALUES (8935456748442424);
INSERT INTO "BLOB" ("b") VALUES (32.8);

After this ugly SQL we have successfully SELECT in PostgreSQL!

It's SQL2016 unsafe while values with int and real affinity can successfully converted to PostgreSQL bytea after something like SQLite to_string. First it's not per bit representation of this values and second SQL2016 disallowed this transformation at all because bit represetation of numeric types is implementation detail. Why there is disallowed int/real transformation to bytea but error message for relatively safe text affinity?

mkgrgis commented 1 year ago

For not ugly data for bytea there is solution draft near https://github.com/mkgrgis/sqlite_fdw/blob/f109f0facf3b85da6b57abb0b7c5b64773997f27/sqlite_query.c#L109

mkgrgis commented 1 year ago

Look like we need CAST in conditions for bytea columns of foreign tables.

Depends on affinity, data is from first example

SELECT i, b, typeof (b) FROM "BLOB" where b = 'テスト'; -- 1 row
SELECT i, b, typeof (b) FROM "BLOB" where b = x'e38386e382b9e38388'; -- 1 row

Depends on data only, data is from first example

SELECT i, b, typeof (b) FROM "BLOB" where cast (b as text) = 'テスト'; -- 2 rows
SELECT i, b, typeof (b) FROM "BLOB" where cast (b as blob) = x'e38386e382b9e38388'; -- 2 rows
mkgrgis commented 1 year ago

Now for PostgreSQL with select * from "blob" where b = 'テスト'; there is base query デバッグ: sqlite_fdw: finalize SELECTi,bFROM main."blob" WHERE ((b= X'e38386e382b9e38388')) only 1 row for blob affinity only.

mkgrgis commented 1 year ago

To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

t-kataym commented 1 year ago

what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.

Could you point me a line to add col IS NULL or col = '' mechanism, because it's still hard for me to determine all needed lines for code extension.

Please refer sqlite_deparse_null_test() which creates IS NULL expression for SQLite query. But I think it is unnecessary to understand how to pushdown col IS NULL or col = '' mechanism because of the following reason.

The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think. For example, SQLite table has following record (col1 is an empty string, col2 is NULL).

col1 col2
”” NULL

If user execute SELECT col1 FROM tbl WHERE col1 = col2; on PostgreSQL, SQLite FDW should execute SELECT col1 FROM tbl for SQLite. To generalize it, I think it is safe not to pushdown WHERE condition which has text column.

mkgrgis commented 1 year ago

what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

@mkgrgis Thank you for summarizing it. Please give us a time to confirm it.

Yes, please. There is updated table now. I understand it will hard to decide about 17×5=85 datatype×affinity combinations with only 8 obvious cases.

Please refer sqlite_deparse_null_test() which creates IS NULL expression for SQLite query.

Thanks, @t-kataym !

The correct behavior of FDW is not to pushdown WHERE conditions if "empty_as_non_text_null" is enabled, I think.

Yes, it's better for SQL:2016 behaviour on PostgreSQL side.

If user execute SELECT col1 FROM tbl WHERE col1 = col2; on PostgreSQL, SQLite FDW should execute SELECT col1 FROM tbl for SQLite.

I understand your example, @t-kataym, thanks. Really there is a problem, but other.

I thinks it's a bad example. Neither for '' nor for NULL there will no true condition predicate. Empty strings are equal '' = '', but one NULL isn't equal to other NULL. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2 will always NULL if one of arguments will be NULL.

t-kataym commented 1 year ago

I thinks it's a bad example. Neither for '' nor for NULL there will no true condition predicate. Empty strings are equal '' = '', but one NULL isn't equal to other NULL. If there will be implemented full SQL:2016 behaviour on PostgreSQL side with our FDW, col1 = col2 will always NULL if one of arguments will be NULL.

@mkgrgis Thank you for correcting me. Just my example was not suitable.
Anyway, I believe that you understood what I wanted to say.

mkgrgis commented 1 year ago

Anyway, I believe that you understood what I wanted to say.

Yes, @t-kataym . Let's study about SQL:2016 behaviour in context of prefetched SELECT col1 FROM tbl for SELECT col1 FROM tbl WHERE col1 = col2;.

First I have thought WHERE x IS NULL OR x='' is hard and unsafe. Let's test around int and uuid as first examples.

UUID Well formed UUID in SQLite can have text or blob(only if contains 16 bytes) affinity. All BLOBs with 16 bytes of data is correct UUID, but only some texts is correct (see https://sqlite.org/src/file/ext/misc/uuid.c). Also formally mixed affinity blob/text is possible for UUID in SQLite. For UUID in SQLite pushdowned IS NULL is totally unsafe, because for SQLite IS NULL is only NULLs, but for PostgreSQL IS NULL includes all data unusable as well formed uuid value. Also WHERE uuid = {some const} may pushdowned in SQLite as search for both blob and text forms for simple, but not for STRICT tables.

Conclusion: anyway no pushdowning needed for uuid.

int With PostgreSQL int we have something other. In PostgreSQL I have SQL CREATE TABLE Test_int" ( i serial4 NOT NULL);. Let's CREATE TABLE Test_int" ( i int NOT NULL); in SQLite. SELECT i FROM "Test_int" where i = ''; is correct for SQLite. For PostgreSQL it's incorrect query with error, hence FDW don't need support this query!

What about joining and JOIN pushdowning for int SQLite column with some empty strings? For PostgreSQL it's impossible to join any data for int (int2, int8 etc.) column by value equal to empty string ''. In SQLite it's possible, but it's not SQL:2016 behaviour, we can only note this in documentation (README.md) For SELECT a.*, b.* FROM a INNER JOIN b ON a.ia = b.ib in PostgreSQLia = '' or ib = '' is impossible, but for SQLite we can have corresponding rows with NULL values after FDW. I think it's very rare case hence pusdowning is SQL:2016 safe with note in README.md and this will be usefully for data access speed.

Conclusion: we can pushdown conditions for int family columns with implementation IS NULL as IS NULL OR '' if option discussed above is true.

About other datatypes let's wait on your opinion and opinion of your department employees, @t-kataym . Its very important to decide about all combinations, because the data transformations is de facto kernel of sqlite_fdw and users expect from us SQL:2016 (with little exceptions) behaviour of data in foreign tables.

nxhai98 commented 1 year ago

To support it, we need to define the conversion specification for various patterns of combinations of types between PostgreSQL and SQLite.

@t-kataym , what about https://github.com/mkgrgis/sqlite_fdw/tree/readme-fix-and-additions#datatypes ?

Hello @mkgrgis,

Sorry for late response, I've checked the table and there are some comments:

About description:

'x - error' this description is meaningless it seems a result of a manual test result. Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.

1. About SQLite INT & REAL transparent:

2. About SQLite BLOB transparent:

It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':

contrib_regression=# select * from c;
          c1          
----------------------
 \xe38386e382b9e38388
(1 row)

contrib_regression=# alter foreign table c alter column c1 type text;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
   c1   
--------
 テスト
(1 row)

contrib_regression=# alter foreign table c alter column c1 type json;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type json
DETAIL:  Token "テスト" is invalid.
CONTEXT:  JSON data, line 1: テスト
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type date: "テスト" 

As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:

sqlite> .schema c
CREATE TABLE c (c1 blob);
sqlite> DELETE FROM c;
sqlite> INSERT INTO c VALUES ('2023-01-01');

contrib_regression=# alter foreign table  c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
     c1     
------------
 2023-01-01

3. About SQLite TEXT:

In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea. And sqlite API may return TEXT type for BLOB column.

However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...

About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data

The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.

mkgrgis commented 1 year ago

Hello, @nxhai98!

What about we are discussing? There are 3 aspects of transformation table

and also there are 2 C language data transformations

My table from previous URL is about future behaviour and mainly (but not only) about SELECT. I think You are also discussing about this.

About description:

'x - error' this description is meaningless it seems a result of a manual test result.

No, this means only C-language code of sqlite_fdw will throw here an error of improper data like https://github.com/pgspider/sqlite_fdw/blob/e865a4ddb1a79bb90cec41b2a1f7a33c305f2696/sqlite_query.c#L94

Some type transformations do not always error such as SQLITE BLOB to Timestamp as I describe on note-2 below. I think it should be 'not support' for case always error and V for another case.

Yes, you are right. No support is ISO SQL sentence. I can use "empty set" sign for this case, it's more clear.

1. About SQLite INT & REAL transparent:

* SQLite INT/REAL => bytea: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C bytes => Postgres bytea

I have described this case here. It's possible to show SQLite int as bytea, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.

* SQLite INT/REAL => uuid: It should be 'V', the cast behavior: SQLite INT/REAL => Cast by sqlite API => C string => Postgres input function => Postgres uuid

SQLite int and real is not longer than 8 byte, uuid is only 16byte value. Hence all possible input will be improper for uuid.

2. About SQLite BLOB transparent:

It does not correct at conversion to date, json, name, text, timestamp, timestamptz, varchar. It should be 'V' instead of 'x':


contrib_regression=# select * from c;
          c1          
----------------------
 \xe38386e382b9e38388
(1 row)

contrib_regression=# alter foreign table c alter column c1 type text;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
   c1   
--------
 テスト
(1 row)

Please note here was interpretation of BLOB affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite.

contrib_regression=# alter foreign table c alter column c1 type json;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type json
DETAIL:  Token "テスト" is invalid.
CONTEXT:  JSON data, line 1: テスト
contrib_regression=# alter foreign table c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
ERROR:  invalid input syntax for type date: "テスト" 

As you can see, the error just because the text presentation is not match with target type. If the blob column has expected data for text type, the conversion is OK:

sqlite> .schema c
CREATE TABLE c (c1 blob);
sqlite> DELETE FROM c;
sqlite> INSERT INTO c VALUES ('2023-01-01');

contrib_regression=# alter foreign table  c alter column c1 type date;
ALTER FOREIGN TABLE
contrib_regression=# select * from c;
     c1     
------------
 2023-01-01

Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB data in SQLite is very big and also applicable to text, hence sometimes we can check BLOB or text length, especially for types from datatime family, uuid, name, 'bool', 'int', maybe also for varchar with a determined length. Now all values are fetched even if obviously malformed because are very long.

3. About SQLite TEXT:

In my understanding sqlite_fdw does not support cast (raise ERROR) only when execute cast from SQLITE TEXT to Postgres int2, int4, int8, bool, float4, float8, numeric and bytea.

Yes, it's normal for ISO SQL:2016 and in my opinion.

And sqlite API may return TEXT type for BLOB column.

As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text and blob affinity usually have equal size limits, but there are many blobs with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.

However, the description of these types is confused, '-' for float4, float8; 'x' for int2, int4; 'V+' for numeric...

Marked as V+, but most of well-formed values will store as values with real affinity. Some hex or octal integer numbers can have text affinity.

About: SQLite TEXT -> date, json, numeric, time,... (V+). The cast behavior is the same: SQLite TEXT -> C string -> Postgres input function -> Postgres data

The behavior is same as SQLite TEXT -> text(✔), varchar(✔), name (V) but the description is not same.

In ISO SQL varchar without length and text is the nearest equivalent of text affinity, but name always have length and formally can be truncated. I meant only this difference.

mkgrgis commented 1 year ago

About SQLite BLOB transparent

As you can see, the error just because the text presentation is not match with target type.

Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs . Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite. Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

nxhai98 commented 1 year ago

Thank for the feedback @mkgrgis,

My table from previous URL is about future behaviour and mainly (but not only) about SELECT. I think You are also discussing about this.

Yes, correct.

I have described this case https://github.com/pgspider/sqlite_fdw/issues/66#issuecomment-1455789778. It's possible to show SQLite int as bytea, but this presentation isn't recommended by ISO SQL as detail of implementation. Yes, Your conception isn't look like current behaviour, but can be usefully for learning SQLite internals for students. I approve Your algorithm.

Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.

SQLite int and real is not longer than 8 byte, uuid is only 16byte value. Hence all possible input will be improper for uuid

Thank for explaining, I understood.

Please note here was interpretation of BLOB affinity data as text data with PostgreSQL encoding of current database! We must add this important non ISO SQL aspect in documentation. Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite.

Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.

Yes, with cast to encoding of current PostgreSQL database all this cases сan be marked as V (transparent transformation), but I added new T value "cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable". Max length of BLOB data in SQLite is very big and also applicable to text, hence sometimes we can check BLOB or text length, especially for types from datatime family, uuid, name, 'bool', 'int', maybe also for varchar with a determined length. Now all values are fetched even if obviously malformed because are very long.

Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable is not correct. How about: cast to text in SQLite utf-8 encoding and then transparent transformation if applicable?

As SQLite C language text-like output SQLite BLOB data really can be more flexible for next transformations. But for PostgreSQL C function this input will not obvious. Yes, text and blob affinity usually have equal size limits, but there are many blobs with no available text presentation in current encoding of PostreSQL database. As example UTF-8 have improper byte sequences cast result can be malformed.

sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported. The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.

Not only presentation, @nxhai98, but also encoding! Please note, Unicode databases like UTF-8/UTF-16/UTF-32 isn't only possible. Yes, SQLite recommends threat all text as Unicode data, but we have no such recommendations about BLOBs . Also WHERE (PostgreSQL for SQLite) data search will not easy in this case. Under our virtual PostgreSQL data can be physical text or blob in SQLite. Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

Do you mean:

postgres: select * from t2 where bytea_col = 'test';
=> sqlite execute: select * from t2 where bytea_col = 'test';

postgres: select * from t2 where bytea_col = '\x74657374';
=> sqlite execute: select * from t2 where bytea_col = '\x74657374';

If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea and the implicit cast is executed before we get this value in FDW (when build sqlite query):

explain verbose select * from tbl where bytea_col = 'test';
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.tbl  (cost=0.00..21.00 rows=4 width=64)
   Output: bytea_col, text_col
   Filter: (tbl.bytea_col = '\x74657374'::bytea)
(3 rows)

I check again the table:

mkgrgis commented 1 year ago

Thank for explaining. However, SQLite INT/REAL => bytea still 'x' in the datatype table. Could you update this.

Yes, fixed, @nxhai98!

Currently, sqlite_fdw support only utf-8 SQLite database and don't consider about current Postgres encoding.

It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP, EUC_CN or WIN866 PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?

Currently, sqlite_fdw always uses sqlite3 APIs to cast value to text/int/float and always 'utf-8' for text. So, the T value cast to text in encoding of current PostgreSQL database and than transparent transformation if applicable is not correct. How about: cast to text in SQLite utf-8 encoding and then transparent transformation if applicable?

More clear is

... cast to text in SQLite utf-8 encoding, then to *PostgreSQL text with current encoding of database** and then transparent transformation if applicable ...

sqlite_fdw only get BLOB data directly when postgres column type is bytea, with other data type, BLOB is cast to int/float/text by sqlite3 API first and only UTF-8 supported. The cast: Sqlite BLOB -> sqlite3_column_text (utf-8: unsigned char *) -> Postgres input function.

Thanks. So, SQLite blob as text will be utf-8. It's good

... Can we conceptually write flexible WHERE PostgreSQL data to SQLite query transformation with searching applicable data in both BLOB and text forms?

Do you mean:

postgres: select * from t2 where bytea_col = 'test';
=> sqlite execute: select * from t2 where bytea_col = 'test';

postgres: select * from t2 where bytea_col = '\x74657374';
=> sqlite execute: select * from t2 where bytea_col = '\x74657374';

No, I meant for both cases something like sqlite execute: select * from t2 where bytea_col = '\x74657374' or bytea_col = 'test'; Because for non-STRICT tables real data can be text or BLOB and we have no guarantee. For bytea search not sqlite execute: select * from t2 where bytea_col = '\x74657374'; , but sqlite execute: select * from t2 where cast(bytea_col as blob) = '\x74657374'; For text search not sqlite execute: select * from t2 where bytea_col = 'test'; but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';

If so, his may not feasible on FDW because there is implicit cast in OP: bytea_col = 'test' this corresponding with bytea_col = 'test'::bytea and the implicit cast is executed before we get this value in FDW (when build sqlite query):

explain verbose select * from tbl where bytea_col = 'test';
                         QUERY PLAN                         
------------------------------------------------------------
 Seq Scan on public.tbl  (cost=0.00..21.00 rows=4 width=64)
   Output: bytea_col, text_col
   Filter: (tbl.bytea_col = '\x74657374'::bytea)
(3 rows)

Yes, I am about similar problem.

I check again the table:

* description of SQLite TEXT => float4/float8 (-), int2/int4/int8 (x) to all V+. With current behavior: sqlite_fdw does not support these cast => it should be (∅) or (-)

Marked as -, thanks.

* Also with SQLite TEXT => numeric (V+) and bytea (-), sqlite_fdw does not support these cast => it should be (∅) or (-)

Fixed. for numeric, - for bytea. We can thereat SQLite text as UTF-8 text blob.

* The cast SQLite BLOB => float4/float8/int2/int4/int8 (b - show per-bit form) the current behavior (cast behavior of SQLite API) is: SQLite BLOB -> SQLite TEXT -> C number: https://www.sqlite.org/lang_expr.html#castexpr
* The cast SQLite BLOB -> numeric (∅) => it should be (V+): the cast behavior: SQLite BLOB -> SQLite TEXT (sqlite3_column_text()) -> input function of postgres

Fixed. Please carefully review the table again, @nxhai98. I'll come back at the end of next week after studying about testing sqlite_fdw in PostgreSQL source code tree.

nxhai98 commented 1 year ago

@mkgrgis Thank for fixing, I confirmed the table.

It's important don't consider about current Postgres encoding! UTF-8 output of SQLite is not problem, also UTF-16 SQLite databases is not problem also. What if there will be some EUC_JP, EUC_CN or WIN866 PostgreSQL? Maybe we should mark all text output of SQLite as UTF-8 input for PostgreSQL?

I understood there is a problem about not match character set. However, sqlite_fdw does not support encoding conversion, it assumes that encoding of SQLite and Postgres are the same.

If you want to fix it, here is a suggestion:

Postgres's core provides a function:

/*
 * Convert src string to another encoding (general case).
 *
 * See the notes about string conversion functions at the top of this file.
 */
unsigned char *
pg_do_encoding_conversion(unsigned char *src, int len, int src_encoding, int dest_encoding);

It can apply as below to sqlite_fdw:

char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8
char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding());        // <-- convert from utf8 to current database encoding

value_datum = InputFunctionCall(&attinmeta->attinfuncs[attnum],
                valstr,
                attinmeta->attioparams[attnum],
                attinmeta->atttypmods[attnum]);

But it is not a complete solution. Ideally, sqlite_fdw needs to know encoding of both PostgreSQL and SQLite, and converts data if they are different. It also required encoding conversion when build query with text constant (in sqlite_deparse_const() function).

No, I meant for both cases something like sqlite execute: select from t2 where bytea_col = '\x74657374' or bytea_col = 'test'; Because for non-STRICT tables real data can be text or BLOB and we have no guarantee. For bytea search not sqlite execute: select from t2 where bytea_col = '\x74657374'; , but sqlite execute: select from t2 where cast(bytea_col as blob) = '\x74657374'; For text search not sqlite execute: select from t2 where bytea_col = 'test'; but > sqlite execute: select * from t2 where cast (bytea_col as text) = 'test';

I understood your spec, but it seems not feasible because of we do not know when bytea_col is compared with text constant. How do you think about this spec below:

For bytea search: treat BLOB column as BYTEA column, we add the implicit cast for all blob columns when build SQLite query (deparse):

Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374';
-> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';

For text search: treat BLOB column as TEXT column, user must be define BLOB column as text column in foreign table and add implicit cast for all text columns when build SQLite query (deparse):

CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr;  <- define BLOB column as TEXT column on postgres
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test';
-> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';

Here is an example: SQLite:

sqlite> CREATE TABLE BLOB_TBL(c1 BLOB);
sqlite> INSERT INTO BLOB_TBL VALUES (x'e38386e382b9e38388');
sqlite> INSERT INTO BLOB_TBL VALUES ('test');
sqlite> INSERT INTO BLOB_TBL VALUES (x'74657374');;
sqlite> SELECT c1, typeof(c1) FROM BLOB_TBL;
テスト|blob
test|text
test|blob

PostgreSQL:

test=# CREATE FOREIGN TABLE "BLOB_TBL" (c1 text) SERVER sqlite_svr ;  <- define BLOB column as TEXT column
CREATE FOREIGN TABLE
test=# select * from "BLOB_TBL";
   c1   
--------
 テスト
 test
 test
(3 rows)
test=# explain verbose select * from "BLOB_TBL" where c1 = 'test';
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Foreign Scan on public."BLOB_TBL"  (cost=10.00..7.00 rows=7 width=32)
   Output: c1
   SQLite query: SELECT CAST (`c1` AS TEXT) FROM main."BLOB_TBL" WHERE ((CAST (`c1` AS TEXT) = 'test'))
(3 rows)

test=# select * from "BLOB_TBL" where c1 = 'test';
  c1  
------
 test    <- this is TEXT type on SQLite
 test    <- this is BLOB type on SQLite
(2 rows)
mkgrgis commented 1 year ago

Thanks, @nxhai98 ! I am sorry for waiting.

@mkgrgis Thank for fixing, I confirmed the table.

Let's use this table ;-)

char *utf8 = (char *) sqlite3_column_text(stmt, colid); // <-- assumes SQLite text is always UTF8
char *valstr = pg_do_encoding_conversion(utf8, strlen(utf8), PG_UTF8, GetDatabaseEncoding());        // <-- convert from utf8 to current database encoding

value_datum = ....

Yes, this is full SELECT solution I meant, @nxhai98 ! And also we need something like this in sqlite_deparse_const() for WHERE solution, You are completely right! We have guarantees for always and only UTF-8 for sqlite3_column_text output independent on PRAGMA encoding in SQLite. Hence we have only PostgreSQL encoding problems, no similar in SQLite.

Also important function is sqlite3_column_bytes for empty text and empty blob data in SQLite that treated as special data transformation case.

Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = '\x74657374';
-> sqlite: SELECT CAST(bytea_col AS BLOB) FROM t2 WHERE CAST(bytea_col AS BLOB) = x'74657374';
CREATE FOREIGN TABLE t2 (bytea_col TEXT) SERVER sqlite_svr;  <- define BLOB column as TEXT column on postgres
Postgres: SELECT bytea_col FROM t2 WHERE bytea_col = 'test';
-> sqlite: SELECT CAST(bytea_col AS TEXT) FROM t2 WHERE CAST(bytea_col AS TEXT) = 'test';

Nice! Your elegant solution is better than my mechanical solution. I consider this proposals and tests as SQL:2016 recommended for our cases.

How we will begin a draft for the table, @nxhai98 ?

I hope some my drafts will be usefully for full data transformation implementation:

  1. static const char* sqlite_datatype(int t)
  2. Text affinity and datatype variables
  3. Human readable message about runtime data transformation error

What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp) without any pre-operations and write special small runtime_data_transformation_error function?

nxhai98 commented 1 year ago

@mkgrgis, thank for your fb,

What do You think about algorithm structure? Maybe it will be usefully to pack all transformation table logic only in switch (pgtyp) without any pre-operations and write special small runtime_data_transformation_error function?

Yes, I think it may better, I see your draft, sqlite_convert_to_pg looks complicated, pack all logical on switch (pgtyp) should be easier to understand spec.

BTW, we have just pushed the bug fixing related to sqlite_convert_to_pg function, could you check and merge them to your implementation?

mkgrgis commented 1 year ago

Thanks, @nxhai98 ! Your contribution in https://github.com/pgspider/sqlite_fdw/pull/71 is very interesting. I am learning the changes. I'll create a new branch based on this first with little C-code changes without any behaviour modifications and second with step-by-step PRs with small behaviour changes by our table.

mkgrgis commented 1 year ago

@t-kataym and @nxhai98, I have made a PR without changing SQL behaviour to prepare implementation of data transformation table here discussed. Please review https://github.com/pgspider/sqlite_fdw/pull/74. My plan for next little PRs:

  1. Add separate runtime error function for future cases with impossible data converting
  2. Add text transformation to respect PostgreSQL database encoding
  3. Many line-by-line little PRs for the data transformation table here discussed
mkgrgis commented 1 year ago

Thanks @t-kataym and @nxhai98 ! I have made PR https://github.com/pgspider/sqlite_fdw/pull/75 with separate function for data type error message. My plan for next little PRs:

  1. Add text transformation to respect PostgreSQL database encoding (this behaviour doesn't have a tests now)
  2. Many line-by-line little PRs for the data transformation table.
mkgrgis commented 1 year ago

Thanks @t-kataym and @nxhai98 ! I have made PR https://github.com/pgspider/sqlite_fdw/pull/76 with text encoding transformation.

My next PRs will be about implementation of data transformation table.

t-kataym commented 1 year ago

@mkgrgis Thank you for your development. Please request @nxhai98 to review the changes on PR #76 if you finished.

mkgrgis commented 1 year ago

@t-kataym, how can I request @nxhai98 to review my PR? I can only write here my request for @nxhai98, but not something more. I think reviewers for a PR can be assigned only by repository owners.

t-kataym commented 1 year ago

@mkgrgis Could you describe a comment in PR #76 after finished? It seems that you are in progress because test cases has not been added yet.

mkgrgis commented 1 year ago

@t-kataym, can anybody help me with testing conception(plan) in https://github.com/pgspider/sqlite_fdw/pull/76#issuecomment-1567740629 ? I am frustrated during adding a new tests with database encodings.

t-kataym commented 1 year ago

@mkgrgis Could you ask @bichht0608 about testing?

mkgrgis commented 1 year ago

@mkgrgis Could you ask @bichht0608 about testing?

Yes, @t-kataym . I have https://github.com/pgspider/sqlite_fdw/pull/76#issuecomment-1567874357 asked.

mkgrgis commented 1 year ago

@t-kataym in https://github.com/pgspider/sqlite_fdw/pull/76 extended tests after review by @bichht0608 are ready. When should I get opinion of pgspider team?

In next PR there will implemented formally switch in switch grid by affinity-datatype table without SQL behaviour change.

In other PR after next PR there will be new tests and new implementation according some cells of the table.

mkgrgis commented 11 months ago

@t-kataym , I have implemented new PR with formally switch in switch grid by affinity-datatype table without SQL behaviour change.

In next PRs some elements of the table will be implemented and discussed.

t-kataym commented 11 months ago

@mkgrgis I'm sorry for the late response.

@t-kataym , I have implemented https://github.com/pgspider/sqlite_fdw/pull/79 with formally switch in switch grid by affinity-datatype table without SQL behaviour change.

Our member will confirm it and comment in the PR.

mkgrgis commented 11 months ago

Our member will confirm it and comment in the PR.

Thanks, @t-kataym . Now I have a little drafts for full datatype-affinity tests and architecture questions about some cell of the table. I hope we will discuss about it after closing of https://github.com/pgspider/sqlite_fdw/pull/79.

mkgrgis commented 11 months ago

Our member will confirm it and comment in the PR.

@t-kataym , still no reaction and unknown perspectives...

I have moved https://github.com/pgspider/sqlite_fdw/pull/79 contribution to https://github.com/pgspider/sqlite_fdw/pull/82 as a part of UUID support. Also I have restored with a new tests bit and varbit support in https://github.com/pgspider/sqlite_fdw/pull/83. This was in first releases but without test. In current release this support doesn't works and have been reduced to some fragments of code.

The name of this issue is

Many problems with formal text affinity values in foreign tables

I want to know your opinion about common problem of any SQLite text affinity data.

For ex. FDW can read mixedcase text UUID, but PostgreSQL gives only normalised UUID form for binding. Hence we have some data, visible for SELECT, but invisible for WHERE. This is applicable to current UUID support implementation. My support for text affinity UUID also isn't ISO:SQL (BLOB-affinity UUID support is) and ugly but better than there is now. We cannot disallow pushdowns, because table data in PostgreSQL buffer can reduce productivity for large SQLite files. Only for = and !=, not for > and < it can be special deparsing conception. What do you think about main theses of this conception? For ex. how to implement SQLite UUID comparing with PostgreSQL UUID value without - and in lowercase form.

Other ex. PostgreSQL IS NaN can be deparsed to SQLite = 'NaN'. PostgreSQL input also can read Nan nan or nAn, but this values will be invisible for WHERE, because PostgreSQL will deparse only NaN form.

3rd example. SQLite text boolean where Y y Yes YES t T true True TRUE etc equals to 1::bool. There is also simple SELECT but many invisible data for WHERE.

I hope this discussion will be continued this month, @t-kataym .

t-kataym commented 11 months ago

I'm sorry. I cannot spend enough time to this topic. It is not high priority for us.

I want to know your opinion about common problem of any SQLite text affinity data.

My basic though is that:

mkgrgis commented 10 months ago

A new information about Infinity in SQLite, see https://stackoverflow.com/questions/72113935/how-do-i-select-floating-point-infinity-literals-from-a-sqlite-database In SQLite SELECT 9e999; gives Inf literal and PostgreSQL ±Inf can be deparsed to ±9e999. SELECT typeof(-9e999); gives real. This logic will be implemented in future PR after UUID, bit/varbit and bool support.

mkgrgis commented 8 months ago

@t-kataym , now my PR https://github.com/pgspider/sqlite_fdw/pull/83 is ready for review after rebase to git mainstream. This PR implements limited to 64 bits length bit and varbit support as non-text operations in SQLite. This datatypes will be not text but int affinity in SQLite during sqlite_fdw.

After this PR I am implementing ISO:SQL mixed affinity bool support by uuid mixed affinity support sample .

t-kataym commented 8 months ago

@mkgrgis , Thank you for your contribution. Yes, we will confirm the PR.

mkgrgis commented 6 months ago

The next milestone is infinity-like data for both float and double precision. In SQLite we can meet both text affinity value like -Inf or real affinity value for infinity as result of SELECT 9e999;.

mkgrgis commented 2 months ago

@t-kataym , after Infinity support in https://github.com/pgspider/sqlite_fdw/pull/97 I'll make a PR about MAC address support for both 6 and 8 bytes forms (there is actual tested draft). Now all of this cases treated as a values with text affinity and have not ISO:SQL data processing.

mkgrgis commented 4 days ago

Now there are no problems with data with text affinity for numeric, float4 and float8 columns. After PR with macaddr support this issue will be closed, IP address as PostgreSQL data type is not planned for implementation. WKT GIS data, which also have text affinity, will be implemented in 2nd GIS PR.