pgspider / influxdb_fdw

InfluxDB Foreign Data Wrapper for PostgreSQL.
Other
58 stars 14 forks source link

Fuzzy matching operator `like` produces wrong results #43

Open cobolbaby opened 10 months ago

cobolbaby commented 10 months ago
bdc=# select distinct
--to_char(time,'yyyy-mm-dd hh24:mi:ss') as lasttime,
device,line,sensor,value
from dm.iot_sqt_dap_sensor 
where time between '2023-12-15T06:57:23.198Z' and '2023-12-15T07:57:23.198Z'
and device = 'CDU02' and value <> 0
and sensor LIKE 'A%';                                                                                                                                                                                                           device | line |  sensor  | value 
--------+------+----------+-------
 CDU02  | L01  | A32      |     1
 CDU02  | L01  | A38      |     1
 CDU02  | L01  | Alarm    |     1
 CDU02  | L01  | PS5A_PS2 |  0.01
 CDU02  | L01  | PS5A_PS2 |  0.02
 CDU02  | L01  | PS5A_PS2 |  0.03
 CDU02  | L01  | PS5A_PS2 |  0.04
 CDU02  | L01  | PS5A_PS2 |  0.05
(8 行记录)

bdc=# select distinct
--to_char(time,'yyyy-mm-dd hh24:mi:ss') as lasttime,
device,line,sensor,value
from dm.iot_sqt_dap_sensor 
where time between '2023-12-15T06:57:23.198Z' and '2023-12-15T07:57:23.198Z'
and device = 'CDU02' and value <> 0
and LEFT(sensor, 1) = 'A';
 device | line | sensor | value 
--------+------+--------+-------
 CDU02  | L01  | A32    |     1
 CDU02  | L01  | A38    |     1
 CDU02  | L01  | Alarm  |     1
(3 行记录)

bdc=# explain verbose                                                                                                                                                                                                                   select distinct
--to_char(time,'yyyy-mm-dd hh24:mi:ss') as lasttime,
device,line,sensor,value
from dm.iot_sqt_dap_sensor 
where time between '2023-12-15T06:57:23.198Z' and '2023-12-15T07:57:23.198Z'
and device = 'CDU02' and value <> 0
and sensor LIKE 'A%';
                                                                                                         QUERY PLAN                                                                                                         
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1.01..1.02 rows=1 width=104)
   Output: device, line, sensor, value
   ->  Sort  (cost=1.01..1.01 rows=1 width=104)
         Output: device, line, sensor, value
         Sort Key: iot_sqt_dap_sensor.line, iot_sqt_dap_sensor.sensor, iot_sqt_dap_sensor.value
         ->  Foreign Scan on dm.iot_sqt_dap_sensor  (cost=10.00..1.00 rows=1 width=104)
               Output: device, line, sensor, value
               InfluxDB query: SELECT * FROM "dap_sensor" WHERE ((time >= '2023-12-15 06:57:23.198')) AND ((time <= '2023-12-15 07:57:23.198')) AND (("value" <> 0)) AND (("sensor" =~ /A(.*)/)) AND (("device" = 'CDU02'))
(8 行记录)

bdc=# 

The root cause is that the push-down condition ("sensor" =~ /A(.*)/) are incorrect.

mkgrgis commented 10 months ago

I think the problem is in no initial ^ if LIKE argument doesn't begins from %. Can you confirm this vision, @cobolbaby ? Maybe also no trailing $ if LIKE argument doesn't ends with %?

cobolbaby commented 10 months ago

LIKE doesn't support regular expressions.

bdc=# explain verbose                                                                                                                                                                                                                   select distinct
--to_char(time,'yyyy-mm-dd hh24:mi:ss') as lasttime,
device,line,sensor,value
from dm.iot_sqt_dap_sensor 
where time between '2023-12-15T06:57:23.198Z' and '2023-12-15T07:57:23.198Z'
and device = 'CDU02' and value <> 0
and sensor LIKE 'A38$';
                                                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1.01..1.02 rows=1 width=104)
   Output: device, line, sensor, value
   ->  Sort  (cost=1.01..1.01 rows=1 width=104)
         Output: device, line, sensor, value
         Sort Key: iot_sqt_dap_sensor.line, iot_sqt_dap_sensor.sensor, iot_sqt_dap_sensor.value
         ->  Foreign Scan on dm.iot_sqt_dap_sensor  (cost=10.00..1.00 rows=1 width=104)
               Output: device, line, sensor, value
               InfluxDB query: SELECT * FROM "dap_sensor" WHERE ((time >= '2023-12-15 06:57:23.198')) AND ((time <= '2023-12-15 07:57:23.198')) AND (("value" <> 0)) AND (("sensor" =~ /A38\$/)) AND (("device" = 'CDU02'))
(8 行记录)

bdc=# select distinct
--to_char(time,'yyyy-mm-dd hh24:mi:ss') as lasttime,
device,line,sensor,value
from dm.iot_sqt_dap_sensor 
where time between '2023-12-15T06:57:23.198Z' and '2023-12-15T07:57:23.198Z'
and device = 'CDU02' and value <> 0
and sensor LIKE 'A38$';
 device | line | sensor | value
--------+------+--------+-------
(0 行记录)
mkgrgis commented 10 months ago

Thanks, @cobolbaby ! I read we have only regexp InfluxDB operator for LIKE deparsing. In your case LIKE 'A%' should be deparsed as ^A regexp because without initial % we have start of string and without % at the end of like string we have end of string. Please also refer http://postgres.cn/docs/13/functions-matching.html for deparsing _ as one character regular expression equivalent for full deparsing implementation.

Currently implemented logic is other, see inside of https://github.com/pgspider/influxdb_fdw/blob/c7cc9f2138b87b6761476fed11381104022bb0de/deparse.c#L2135-L2136 Please reference inside of https://github.com/pgspider/influxdb_fdw/blob/c7cc9f2138b87b6761476fed11381104022bb0de/deparse.c#L2144 and compare this code with many LIKE testcases in repository search engine. As example https://github.com/pgspider/influxdb_fdw/blob/c7cc9f2138b87b6761476fed11381104022bb0de/expected/15.0/schemaless/add_tags.out#L291 etc. After this you can try to modify LIKE deparsing