Open cobolbaby opened 7 months ago
@mkgrgis Maybe you could identify the bug together, Thanks a lot.
Hello, @cobolbaby ! I have never works with InfluxDB and I am not PGSpider team member, but my sqlite_fdw
experience allow to locate your problem in https://github.com/pgspider/influxdb_fdw/blob/master/deparse.c This look like there is no needed deparse WHERE
rules for your case. Pushdowning of your query to InfluxDB should use syntax transformation but I am not sure if there is pushdowning or if there is correct result of subquery. The main problem also for pure PostgreSQL is the same table in query and subquery. This cause very specific query plans and I am not sure this plans is well-implemented in this FDW.
@cobolbaby , could you try the same query with window functions or with HAVING
? Also please explain
verbose
for
select * from dm.iot_sqt_agv_count
where time = --'2023-12-04 19:15:53.318+08'
(SELECT max(time) from dm.iot_sqt_agv_count where time > now()-interval '1d');
This can be more helpful sometimes than simple explain
.
please explain verbose for
explain verbose
select * from dm.iot_sqt_agv_count
where time = --'2023-12-04 19:15:53.318+08'
(SELECT max(time) from dm.iot_sqt_agv_count where time > now()-interval '1d');
"Foreign Scan on dm.iot_sqt_agv_count (cost=865.14..859.14 rows=4 width=72)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time = $1))"
" InitPlan 1 (returns $0)"
" -> Aggregate (cost=855.13..855.14 rows=1 width=8)"
" Output: max(iot_sqt_agv_count_1.""time"")"
" -> Foreign Scan on dm.iot_sqt_agv_count iot_sqt_agv_count_1 (cost=10.00..853.00 rows=853 width=8)"
" Output: iot_sqt_agv_count_1.""time"", iot_sqt_agv_count_1.""agvState"", iot_sqt_agv_count_1.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
Perhaps influxdb_fdw does not yet support InitPlan type operators.
could you try the same query with window functions or with HAVING
explain analyse verbose
SELECT *
FROM (
SELECT *,
MAX(time) OVER () AS max_time
FROM dm.iot_sqt_agv_count
WHERE time > NOW() - INTERVAL '1 day'
) AS subquery
WHERE time = subquery.max_time;
"Subquery Scan on subquery (cost=10.00..291.10 rows=1 width=80) (actual time=159.955..159.961 rows=5 loops=1)"
" Output: subquery.""time"", subquery.""agvState"", subquery.value, subquery.max_time"
" Filter: (subquery.""time"" = subquery.max_time)"
" Rows Removed by Filter: 11510"
" -> WindowAgg (cost=10.00..287.55 rows=284 width=80) (actual time=154.067..158.294 rows=11515 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value, max(iot_sqt_agv_count.""time"") OVER (?)"
" -> Foreign Scan on dm.iot_sqt_agv_count (cost=10.00..284.00 rows=284 width=72) (actual time=103.052..147.959 rows=11515 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
"Planning Time: 0.195 ms"
"Execution Time: 160.233 ms"
explain analyse verbose
WITH max_time_cte AS (
SELECT MAX(time) AS max_time
FROM dm.iot_sqt_agv_count
WHERE time > NOW() - INTERVAL '1 day'
)
SELECT iot.*
FROM dm.iot_sqt_agv_count iot
JOIN max_time_cte max_time ON iot.time = max_time.max_time;
"Hash Join (cost=865.17..1710.45 rows=4 width=72) (actual time=5676.963..5691.366 rows=8 loops=1)"
" Output: iot.""time"", iot.""agvState"", iot.value"
" Inner Unique: true"
" Hash Cond: (iot.""time"" = (max(iot_sqt_agv_count.""time"")))"
" -> Foreign Scan on dm.iot_sqt_agv_count iot (cost=10.00..853.00 rows=853 width=72) (actual time=3916.961..5481.379 rows=405138 loops=1)"
" Output: iot.""time"", iot.""agvState"", iot.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"""
" -> Hash (cost=855.15..855.15 rows=1 width=8) (actual time=148.130..148.131 rows=1 loops=1)"
" Output: (max(iot_sqt_agv_count.""time""))"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> Aggregate (cost=855.13..855.14 rows=1 width=8) (actual time=148.114..148.115 rows=1 loops=1)"
" Output: max(iot_sqt_agv_count.""time"")"
" -> Foreign Scan on dm.iot_sqt_agv_count (cost=10.00..853.00 rows=853 width=8) (actual time=98.425..146.246 rows=11518 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
"Planning Time: 0.456 ms"
"Execution Time: 5698.183 ms"
I have never works with InfluxDB and I am not PGSpider team member
Many thanks for your reply. Could you inform me about the current maintainer(s) of the project?
@cobolbaby , do yo think this is normal for InfluxDB or the query SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
is not fully correct? For me there is double quoting and unusual interval. Have I mistaken here?
Many thanks for your reply. Could you inform me about the current maintainer(s) of the project?
Yes. The main maintainer of PGSpider projects is @t-kataym. He makes final reviews of my PRs and knows who can help with some little details of code or usage.
For me there is double quoting and unusual interval. Have I mistaken here?
No problem here. The presence of double quotation marks is because I copied the output results from the web ui of pgAdmin.
@cobolbaby, I haven't understood are there any crashes in case of window functions or HAVING
? https://github.com/pgspider/influxdb_fdw/issues/40#issuecomment-1842086384
I haven't understood are there any crashes in case of window functions or HAVING? https://github.com/pgspider/influxdb_fdw/issues/40#issuecomment-1842086384
No, the PG wonld not crash in that case.
Perhaps influxdb_fdw does not yet support InitPlan type operators.
@mkgrgis Regarding this issue, can you locate the problematic code block?
Regarding this issue, can you locate the problematic code block?
@cobolbaby , InitPlan
look like some execution function in PostgreSQL see https://doxygen.postgresql.org/execMain_8c.html#ae7f37b09dfd050780c4751568a93184f Unfortunately I have no normal experience with plan nodes. Also I can tell you there is no special InitPlan
code infrastructure in this FDW, in sqlite_fdw
, in firebird_fdw
or in oracle_fdw
. Look like InitPlan
is not one of FDW code infrastructure subjects. Hence I can not understand what exact means "support InitPlan type operators". I think this is implemented in PostgreSQL main code for all FDWs and not only for FDWs.
If the problem isn't with InitPlan, I'm really puzzled about where the issue might be. It's not easy for me to debug step-by-step.
It's not easy for me to debug step-by-step.
It's mostly impossible for beginners, see official description from PGDG. This is also not for me.
Even if there is simple testcase like in https://github.com/ibarwick/fbsql/issues/1 gdb
debugging in this simple case is not very representative, see discussion around of.
In my experience only possible way based on set client_min_messages='debugN';
with some optimal N and speculated code rethinking with a tested changes. This is sometimes enough for such complex and not easy commits like this.
I'm really puzzled about where the issue might be.
The problem is in some of internal plan nodes. Maybe some buffers is not resetted between internal SELECT * FROM "F3AGV.count" WHERE ((time > (now() - 1d0h0m0s0u)))
and resulted SELECT * FROM "F3AGV.count" WHERE ((time = $1))
. For me it's first version based on usual plan executing with 1 query only.
@cobolbaby , what about psql
listing with set client_min_messages='debug4';
and the crash? Most of main FDW function have DEBUG3
initial message. Now you have published only server log.
DEBUG: influxdb_fdw : influxdb_fdw_handler
DEBUG: influxdb_fdw : influxdbGetForeignRelSize
DEBUG: influxdb_fdw : influxdbGetForeignPaths
DEBUG: influxdb_fdw : influxdbGetForeignRelSize
DEBUG: influxdb_fdw : influxdbGetForeignPaths
DEBUG: influxdb_fdw : influxdbGetForeignRelSize
DEBUG: influxdb_fdw : influxdbGetForeignPaths
DEBUG: influxdb_fdw : influxdbGetForeignUpperPaths
DEBUG: influxdb_fdw : influxdbGetForeignUpperPaths
DEBUG: influxdb_fdw : influxdbGetForeignPlan
DEBUG: influxdb_fdw : influxdbGetForeignRelSize
DEBUG: influxdb_fdw : influxdbGetForeignPaths
DEBUG: influxdb_fdw : influxdbGetForeignUpperPaths
DEBUG: influxdb_fdw : influxdbGetForeignPlan
DEBUG: influxdb_fdw : influxdbBeginForeignScan
DEBUG: influxdb_fdw : influxdbBeginForeignScan
DEBUG: influxdb_fdw : influxdbIterateForeignScan
DEBUG: influxdb_fdw : influxdbIterateForeignScan
DEBUG: influxdb_fdw: new InfluxDB connection 0x5d52c78fd050 for server "influxdb_iot" (user mapping oid 57344, userid 10)
DEBUG: influxdb_fdw : query: SELECT * FROM "F3AGV.count" WHERE ((time > (now() - 1d0h0m0s0u)))
DEBUG: influxdb_fdw : influxdbIterateForeignScan
...
...
DEBUG: influxdb_fdw : influxdbIterateForeignScan
DEBUG: influxdb_fdw : influxdbIterateForeignScan
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
So. the problem is near influxdbIterateForeignScan
https://github.com/pgspider/influxdb_fdw/blob/c7cc9f2138b87b6761476fed11381104022bb0de/influxdb_fdw.c#L1593-L1594 during second query SELECT * FROM "F3AGV.count" WHERE ((time = $1))
. Also can be affected influxdbBeginForeignScan
or influxdbEndForeignScan
if there is problem of unclear buffer.
Magical phenomenon: If the timestamp comparison is converted into a string comparison, or the equality comparison is converted into an in substatement, it can be executed.
explain analyse verbose
select time,"agvState",value
from dm.iot_sqt_agv_count
where to_char(time, 'YYYY-MM-DD HH24:MI:SS') =
(SELECT to_char(max(time), 'YYYY-MM-DD HH24:MI:SS') from dm.iot_sqt_agv_count where time > now() - interval '1d')
/*
"Foreign Scan on dm.iot_sqt_agv_count (cost=865.15..859.15 rows=4 width=72) (actual time=5984.461..5996.245 rows=8 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" Filter: (to_char(iot_sqt_agv_count.""time"", 'YYYY-MM-DD HH24:MI:SS'::text) = $0)"
" Rows Removed by Filter: 402777"
" InfluxDB query: SELECT * FROM ""F3AGV.count"""
" InitPlan 1 (returns $0)"
" -> Aggregate (cost=855.13..855.15 rows=1 width=32) (actual time=143.270..143.272 rows=1 loops=1)"
" Output: to_char(max(iot_sqt_agv_count_1.""time""), 'YYYY-MM-DD HH24:MI:SS'::text)"
" -> Foreign Scan on dm.iot_sqt_agv_count iot_sqt_agv_count_1 (cost=10.00..853.00 rows=853 width=8) (actual time=99.992..142.025 rows=11516 loops=1)"
" Output: iot_sqt_agv_count_1.""time"", iot_sqt_agv_count_1.""agvState"", iot_sqt_agv_count_1.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
"Planning Time: 0.478 ms"
"Execution Time: 6014.964 ms"
*/
explain analyse verbose
select time,"agvState",value
from dm.iot_sqt_agv_count
where time in
(SELECT max(time) from dm.iot_sqt_agv_count where time > now() - interval '1d')
/*
"Hash Join (cost=865.16..1710.44 rows=4 width=72) (actual time=5206.469..5219.522 rows=8 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" Inner Unique: true"
" Hash Cond: (iot_sqt_agv_count.""time"" = (max(iot_sqt_agv_count_1.""time"")))"
" -> Foreign Scan on dm.iot_sqt_agv_count (cost=10.00..853.00 rows=853 width=72) (actual time=3490.447..5005.403 rows=402817 loops=1)"
" Output: iot_sqt_agv_count.""time"", iot_sqt_agv_count.""agvState"", iot_sqt_agv_count.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"""
" -> Hash (cost=855.14..855.14 rows=1 width=8) (actual time=146.651..146.654 rows=1 loops=1)"
" Output: (max(iot_sqt_agv_count_1.""time""))"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> Aggregate (cost=855.13..855.14 rows=1 width=8) (actual time=146.641..146.643 rows=1 loops=1)"
" Output: max(iot_sqt_agv_count_1.""time"")"
" -> Foreign Scan on dm.iot_sqt_agv_count iot_sqt_agv_count_1 (cost=10.00..853.00 rows=853 width=8) (actual time=103.308..145.374 rows=11516 loops=1)"
" Output: iot_sqt_agv_count_1.""time"", iot_sqt_agv_count_1.""agvState"", iot_sqt_agv_count_1.value"
" InfluxDB query: SELECT * FROM ""F3AGV.count"" WHERE ((time > (now() - 1d0h0m0s0u)))"
"Planning Time: 0.576 ms"
"Execution Time: 5242.908 ms"
*/
Who can help fix this critical bug? @all
@cobolbaby , look like this repository is not very active. Maybe someone from pgspider team will read our diagnostics.
@cobolbaby and @mkgrgis ,
I apologize for the inconvenience caused by this issue. I'm investigating the segmentation fault, which seems related to InitPlan type operators. I will thoroughly analyze influxdbIterateForeignScan and other related functions.
Thanks for your detailed information and suggestions. I'll keep you updated on our progress.
Thank you for your patience and support!
@weiting1lee , could someone from pgspider team also investigate not compilable for new PostreSQL version dynamodb_fdw/ ?
@mkgrgis I will notify the team internally to follow it.
I'm investigating the segmentation fault, which seems related to InitPlan type operators. I will thoroughly analyze influxdbIterateForeignScan and other related functions.
@weiting1lee Any progress ?
@cobolbaby We are currently investigating the segmentation fault issue you reported. Based on our findings, the problem arises due to the parameter not being initialized during the scanning query execution. We are discussing solutions. We will update here as we make progress.
I don't know why I got the error 'Segmentation fault':