tds-fdw / tds_fdw

A PostgreSQL foreign data wrapper to connect to TDS databases (Sybase and Microsoft SQL Server)
Other
380 stars 102 forks source link

WHERE is broken #195

Open frafra opened 5 years ago

frafra commented 5 years ago
select count(*)
  from mssql."Senderstart" ss
  join mssql."Individdata" id
    on id."RegNr" = ss."FK_RegNr"
   and id."FK_SpeciesCode" = ss."FK_SpeciesCode"
 where id."RegNr" = '2224';
 count
-------
     1
(1 row)
select count(*)
  from mssql."Senderstart" ss
  join mssql."Individdata" id
    on id."RegNr" = ss."FK_RegNr"
   and id."FK_SpeciesCode" = ss."FK_SpeciesCode"
 where ss."CollarID" = '2224';
 count
-------
     0
(1 row)

SQL Server returns 1, which is the right answer.

Casting fixes the problem:

select count(*)
  from mssql."Senderstart" ss
  join mssql."Individdata" id
    on id."RegNr" = ss."FK_RegNr"
   and id."FK_SpeciesCode" = ss."FK_SpeciesCode"
 where cast (ss."CollarID" as char(50) ) = '2224';
 count
-------
     1
(1 row)

Adding an additional clause produces more records (even without using cast):

select count(*)
  from mssql."Senderstart" ss
  join mssql."Individdata" id
    on id."RegNr" = ss."FK_RegNr"
   and id."FK_SpeciesCode" = ss."FK_SpeciesCode"
 where id."RegNr" = '2224' and ss."CollarID" = '2224';
 count
-------
     1
(1 row)

Similar issue with datetime columns. Querying over materlized views works as usual.

wesselmdw commented 5 years ago

I am seeing the same problem. Casting fixes it, but in my case, slows down the transaction.

DrLove73 commented 5 years ago

@frafra , do you have more info on this? Using following system I have not seen this issue:

CentOS 7 with freetds version 0.95.81 from EPEL and postgresql-10-tds_fdw.x86_64 ver 2.0.0-alpha.3.2.el7 (from repo: https://tds-fdw.github.io/yum/). SQL Server 2008R2 has database with Collation Serbian_Latin_100 which seams to be CP1250. In /etc/freetds.conf I have only:

tds version = 7.3 client charset = UTF-8

PostgreSQL database is named "izp" with Encoding:"UTF8" (SHOW client_encoding shows UNICODE), and Character type and Collation:"en_US.UTF-8".

If I use count on foreign table and on integer data type, it works both with ' ' and without it.

select count(*)
from mssql.izvod iz
where iz."izvodId" = '20259';
select count(*)
from mssql.izvod iz
where iz."izvodId" = 20259;

I tried it also with character varying (50) and no problems there:

select count(*)
from mssql.izvod iz
where iz."brojIzvoda" = '1';

Maybe problem is only when JOIN is used? And what if you just list that same data, using SELECT * instead of SELECT count(*), are records shown same with and without CAST ?

If you still have same issue, please provide info about your system just as I have, including what PostgreSQL shows data type of that column is.

Tolli81 commented 5 years ago

I'm experiencing the same on a similar setup. This is using a fields that's nvarchar(20) on MSSQL side, and varchar(20) on PostgreSQL side.

SELECT count(*) FROM ingest."NAV_JobTask"; -- 50090
SELECT count(*) FROM ingest."NAV_JobTask" WHERE "JobNo"='24404'; -- 50090, should be 3
SELECT * FROM ingest."NAV_JobTask" WHERE "JobNo" ='24404'; -- 50090 records, should be 3
SELECT * FROM ingest."NAV_JobTask" WHERE cast("JobNo" AS char(20))='24404'; -- 3 records as expected

Perhaps related, with auto mapping I observed these in logs:

< 2019-06-21 14:05:15.192 GMT > WARNING:  Table definition mismatch: Foreign source has column named JobNo, but target table does not. Column will be ignored.
< 2019-06-21 14:05:15.192 GMT > WARNING:  Table definition mismatch: Could not match local column JobNo with column from foreign table

Also, perhaps related, is that the source is a view, not a table. Not in a position to do further testing at the moment, unfortunately.

I tried creating the foreign table (on PostgreSQL side) using both a SELECT * query, as well as explicitly defining the columns - same results in both cases.

Using TDS version 7.3 (passed as option to CREATE SERVER).

[root@avalipgdb50 ~]# yum info  postgresql-96-tds_fdw |grep ' : ' |head -n8
Name        : postgresql-96-tds_fdw
Arch        : x86_64
Version     : 2.0.0
Release     : alpha.3.2.el7
Size        : 357 k
Repo        : installed
From repo   : tds_fdw
Summary     : TDS foreing data wrapper for PostgreSQL 9.6

[root@avalipgdb50 ~]# psql --version
psql (PostgreSQL) 9.6.13

[root@avalipgdb50 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

[root@avalipgdb50 ~]# yum info  freetds-devel |grep ' : ' |head -n8
Name        : freetds-devel
Arch        : x86_64
Version     : 0.95.81
Release     : 1.el7
Size        : 41 k
Repo        : epel/x86_64
Summary     : Header files and development libraries for freetds
URL         : http://www.freetds.org/

Source database is MSSQL version 13.0.5264.1 image

Tolli81 commented 5 years ago

An update on this, while actually digging into a diffrent issue I was having - this seem so be releated to passing the query option.

I had been assuming that I chould pass the schema_name 'ext' option to configure the default schema and do SELECT * FROM JiraJobTasks instead of the current SELECT * FROM ext.JiraJobTasks.

Looking into that I found out the proper way to use schema_name is to pass table_name as well instead of a query. So with the following:

CREATE FOREIGN TABLE ingest."NAV_JobTask"
(
[...]
)
SERVER "foo"
OPTIONS (schema_name 'ext', table_name 'JiraJobTasks');

The WHERE conditions are working without problems:

SELECT count(*) FROM ingest."NAV_JobTask"; -- 50090
SELECT count(*) FROM ingest."NAV_JobTask" WHERE "JobNo"='24404'; -- 3, as expected
SELECT * FROM ingest."NAV_JobTask" WHERE "JobNo" ='24404'; -- 3 records, as expected

It should be noted that the user I'm using to connect to the MSSQL server has quite limited privileges, that might play some part in this.

DrLove73 commented 5 years ago

My reasoning is that with "query" tds_fdw or PostgreSQL must guess at data types. What are data types for those 2 columns/fields now? "text" maybe?

frafra commented 5 years ago

@DrLove73 the problem seems to happen only when both JOIN and WHERE are used. No clue why casting it to char(50) makes the join+where query work, because the join is on different columns.

No casting, no JOIN, only WHERE, correct result:

select count(*)
  from "Senderstart"
 where "CollarID" = '2224';
 count 
-------
     1
(1 row)

No casting, JOIN in the first query, WHERE in the second, correct result:

  with joined as (
        select *
          from "Senderstart" ss
          join "Individdata" id
            on id."RegNr" = ss."FK_RegNr"
           and id."FK_SpeciesCode" = ss."FK_SpeciesCode"
       )
select count(*)
 from joined
 where "CollarID" = '2224';
 count 
-------
     1
(1 row)

Using both JOIN+WHERE in a single query without using cast fails. CollarID has type character varying(50). Replacing SELECT COUNT(*) with SELECT * has no effect on the strange query.

I also tried a natural join, but it fails if I add the where clause:

select *
  from "Senderstart"
natural join "Individdata" 
 where "CollarID" = '2224';
ERROR:  DB-Library error: DB #: 102, DB Msg: General SQL Server error: Check messages from the SQL Server, OS #: -1, OS Msg: , Level: 15
DrLove73 commented 5 years ago

I used this code SELECT TOP 100 t.*, t.[text], s.last_execution_time FROM sys.dm_exec_cached_plans AS p INNER JOIN sys.dm_exec_query_stats AS s ON p.plan_handle = s.plan_handle CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t ORDER BY s.last_execution_time DESC; to monitor MSSQL server for incoming SQL commands. If you want to filter to specific part of query, add WHERE t.[text] LIKE N'%opstinaId%'

In case of join, select mesto.*, opstina.* FROM mssql.mesto join mssql.opstina on mesto."opstinaId" = opstina."opstinaId" , seams tds-fdw calls TWO commands: SELECT [mestoId], [naziv], [opstinaId], [PTT] FROM [izp].[mesto] and SELECT [opstinaId], [drzavaId], [naziv] FROM [izp].[opstina] and then it joins them in PostgreSQL!

When WHERE clause is added to JOIN, select mesto.naziv, opstina.* FROM mssql.mesto join mssql.opstina on mesto."opstinaId" = opstina."opstinaId" where mesto."naziv" = 'Malo Crniće' , MSSQL server gets SELECT [opstinaId], [drzavaId], [naziv] FROM [izp].[opstina] WHERE (((CAST((SELECT CAST(null as smallint)) as smallint)) = [opstinaId])) command which gives EMPTY result, and second query is never made!

When you use CAST in JOIN+WHERE query over Foreign table select mesto.naziv, opstina.* FROM mssql.mesto join mssql.opstina on mesto."opstinaId" = opstina."opstinaId" where cast(mesto."naziv" AS char(50)) = 'Malo Crniće' , then functionality is back with MSSQL server getting 2 queries without WHERE, and WHERE is done in PostgreSQL.

I used
systemctl restart postgresql-10 to reboot PostgreSQL server and DBCC FREEPROCCACHE in MSSQL server to empty all cashed plans so all caching is avoided.

DrLove73 commented 5 years ago

There are several things problematic. First, it looks like that PostgreSQL caches entire tables and then performs joins and filtering which seams inefficient. Second, when does PostgreSQL decides to check if data in Foreign tables? What is data in MSSQL has already been changed? How to force PostgreSQl to always ask MSSQL server and have MSSQL take care of updated data?

frafra commented 5 years ago
  1. JOIN pushdown is not supported yet #27
  2. I never had a look about how tds-fdw works internally, so I have no idea actually
akirasankun commented 4 years ago

@DrLove73, I am updating the environment and I came across this error.

### outdated Microsoft SQL Server 2016 (SP2-CU4) (KB4464106) - 13.0.5233.0 (X64)
CentOS Linux release 7.7.1908 (Core) Tds_fdw | 1.0.2 PostgreSQL 9.3.5

image

updated

Microsoft SQL Server 2016 (SP2-CU4) (KB4464106) - 13.0.5233.0 (X64)
CentOS Linux release 7.7.1908 (Core) Tds_fdw | 2.0.1 PostgreSQL 12.2

image

In the outdated environment the filter works correctly and in the updated no works

in all foreign tables

jcarnu commented 4 years ago

Not sure of my assumptions but it may be some kind of pushdown to the foreign server ?

achix commented 4 years ago

Same issue here, casting problem but strangely for postgresql column :

select qry.* FROM (select d."TRNS_DATE"::date as "TRNS_DATE",d."PARAST" as "DOC_NO",CASE WHEN dtls."ID" IS NOT NULL THEN 'REM' ELSE 'CHQ' END as "PO_TYPE", COALESCE(chk."CHQ_NOTE",dtls."PMNT_DTLS1") AS "DESCR", CASE WHEN p.is_epayment IS NULL THEN 'NA' WHEN p.is_epayment=0 THEN 'NO' ELSE 'YES' END as is_epay,p.status,CASE WHEN pf.doc_no IS NOT NULL THEN 'YES' ELSE 'NO' END as has_pay_fin, p.check_date_time from mssql_bdynacom."ACDOC" d LEFT OUTER JOIN mssql_bdynacom."ACDOC_PODTLS" dtls ON (d."ID"=dtls."ACDOC_ID") LEFT OUTER JOIN mssql_bdynacom."ACCHEQUE" chk ON (d."ID"=chk."ACDOC_ID") LEFT OUTER JOIN epaybdynacom.payment p ON (substr(date_trunc('year',d."TRNS_DATE")::text,1,4)::smallint=p.year AND d."PARAST"=p.doc_no) LEFT OUTER JOIN bdynacom.payments_finalization pf ON (substr(date_trunc('year',d."TRNS_DATE")::text,1,4)::smallint=pf.year and d."PARAST"=pf.doc_no) WHERE d."FL_UPD"=1 AND d."FL_RVS"=0 AND d."TRNS_DATE">='2020-01-01' AND d."DOC_TYPE"=2 AND d."FL_ACTV"=1 ) as qry WHERE NOT has_pay_fin::boolean AND (status is NULL OR status NOT IN ('RVSD','RJCT')) AND (is_epay='NA' OR is_epay='NO' OR is_epay='YES' AND status IN ('SENT','ACCP','ACSC','EBPC')) AND is_epay='NO' AND check_date_time>='2020-01-01'::date;

^^ does not bring any result. When I cast explictly from pgsql timestamp to date, i.e. make the last comparison look like : check_date_time::date>='2020-01-01'::date; then it works. Also when I fence the big subquery behind a WITH (CTE), it also seems to work.

aseques commented 2 years ago

It might be not related but I was having issues with the WHERE clause too with tds_fdw and upgrading to version 2.0.2 fixed the issue (it includes #253 that has a fix) After the upgrade the queries work as expected.

tg-solidat commented 6 months ago

Not sure if I am experiencing this issue. But the following query does not work with version 2.0.3. ft.sometable is the foreign table:

select u.u_cardid, (select e."Info" from ft.sometable e where u.u_cardid = ft.cardid)
from j_user u
where u.u_cardid is not null;

The query is written to force a nested loop. Then fdw_tds generates queries like the following:

SELECT [Info] FROM [dbo].[sometable] WHERE (((CAST((SELECT CAST(null as character varying(50))) as character varying(50))) = [cardid]))

The null must be wrong, since the pg-side query filters out nulls. Adding a cast to char(50) as in (u.u_cardid :: char(50)) = e.cardid yields the correct result.