chris-morris-h2o opened 3 days ago
I saw another similar bug here (https://github.com/timescale/timescaledb/issues/6140) where it was suggested to run the query within psql with \set VERBOSITY verbose. I did this and received the following output:
postgres=# \set VERBOSITY verbose
postgres=# \c history_sdb01
You are now connected to database "history_sdb01" as user "postgres".
history_sdb01=# select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid in (483426,
565356,
602444,
603707,
609154,
928947)
AND t_stamp < 1719578853000
AND t_stamp > 1704088800000;
ERROR: XX000: child rel 1 not found in append_rel_array
LOCATION: find_appinfos_by_relids, appendinfo.c:730
I ran some more tests, and the problem is not isolated to the to_timestamp() function. Any operation on, or cast of, the t_stamp column causes the same error:
history_sdb01=# SELECT *, t_stamp::numeric as t_stamp_numeric
FROM sqlth_1_data
WHERE tagid IN (483426, 565356, 602444, 603707, 609154, 928947)
AND t_stamp < 1719578853000
AND t_stamp > 1704088800000;
ERROR: XX000: child rel 1 not found in append_rel_array
LOCATION: find_appinfos_by_relids, appendinfo.c:730
It doesn't seem to be just the t_stamp column, either: adding tagid+1 to the select list also triggers the issue:
query = f"""
SELECT tagid, intvalue, floatvalue, stringvalue, datevalue, dataintegrity, t_stamp, tagid+1
FROM sqlth_1_data
WHERE t_stamp < {start_timestamp} AND t_stamp > {end_timestamp} and
tagid in (483426,
565356,
602444,
603707,
609154,
928947)
LIMIT 1;
"""
2024-07-01 09:37:04,679 - INFO - Date range: 2024-04-27 to 2024-04-28: Result: (609154, None, 0.0, None, None, 192, 1714258210519, 609155)
2024-07-01 09:37:04,767 - INFO - Date range: 2024-04-26 to 2024-04-27: Result: (609154, None, 0.0, None, None, 192, 1714173390490, 609155)
2024-07-01 09:37:04,853 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, 609155)
2024-07-01 09:37:04,982 - ERROR - Date range: 2024-04-24 to 2024-04-25: Error: child rel 1 not found in append_rel_array
2024-07-01 09:37:05,103 - ERROR - Date range: 2024-04-23 to 2024-04-24: Error: child rel 1 not found in append_rel_array
2024-07-01 09:37:05,163 - ERROR - Date range: 2024-04-22 to 2024-04-23: Error: child rel 1 not found in append_rel_array
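The probe loop that produced the log lines above can be sketched roughly as follows. This is a minimal sketch, not the author's actual script: the window generator is pure stdlib, while the database call (psycopg2, connection parameters) is an assumption and is left as comments.

```python
from datetime import datetime, timedelta, timezone

def day_windows(newest_boundary: datetime, num_days: int):
    """Yield (label, lower_ms, upper_ms) for consecutive 24 h windows,
    newest first, matching the descending day-by-day probe described
    in this issue. t_stamp in sqlth_1_data is an epoch in milliseconds,
    so the bounds are emitted as integer milliseconds.
    """
    for i in range(num_days):
        upper = newest_boundary - timedelta(days=i)
        lower = upper - timedelta(days=1)
        label = f"{lower:%Y-%m-%d} to {upper:%Y-%m-%d}"
        yield label, int(lower.timestamp() * 1000), int(upper.timestamp() * 1000)

# Hypothetical use against the database (connection details are assumptions):
# import psycopg2
# conn = psycopg2.connect(dbname="history_sdb01", user="postgres")
# for label, lower_ms, upper_ms in day_windows(
#         datetime(2024, 4, 28, tzinfo=timezone.utc), 365):
#     with conn.cursor() as cur:
#         cur.execute(
#             "SELECT tagid, intvalue, floatvalue, stringvalue, datevalue,"
#             " dataintegrity, t_stamp, tagid + 1 FROM sqlth_1_data"
#             " WHERE t_stamp < %s AND t_stamp > %s AND tagid IN (483426,"
#             " 565356, 602444, 603707, 609154, 928947) LIMIT 1",
#             (upper_ms, lower_ms),
#         )
#         print(label, cur.fetchone())
```

Because each window spans exactly one 24-hour chunk, a failing window points at the chunk (or chunk pair) the planner trips over.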
The error also occurs when the base query is wrapped in a CTE:
history_sdb01=# WITH base_data AS (
SELECT *
FROM sqlth_1_data
WHERE tagid IN (483426, 565356, 602444, 603707, 609154, 928947)
AND t_stamp < 1719578853000
AND t_stamp > 1704088800000
)
SELECT *, to_timestamp(t_stamp/1000.0) FROM base_data;
ERROR: XX000: child rel 2 not found in append_rel_array
LOCATION: find_appinfos_by_relids, appendinfo.c:730
What type of bug is this?
Unexpected error
What subsystems and features are affected?
Query executor
What happened?
When querying a hypertable, adding "to_timestamp(t_stamp/1000.0)" to the select list causes the following error, but only for some chunks:
An example query that produces this message when executed from pgAdmin:
And another example that produces the error when executed from pgAdmin:
I set up a Python script to check whether I had a corrupted chunk or some such; it loops through my chunks in descending order, one day at a time:
Sample log from the code above:
If I add ", to_timestamp(t_stamp/1000.0)" after the columns:
I end up with the error message again, not in every chunk, but in a large portion of them.
Removing to_timestamp(t_stamp/1000.0) from the two example queries at the start of the issue also causes them to stop producing the error and instead return rows (or no rows if none match). I also tried variations on the query:
Using an asterisk instead: *, to_timestamp(t_stamp/1000.0)
Getting rid of the decimal on the 1000: *, to_timestamp(t_stamp/1000)
The to_timestamp call by itself also still causes the error.
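For context, to_timestamp(t_stamp/1000.0) does nothing exotic: t_stamp is an epoch in milliseconds, and the expression simply converts it to a timestamp with time zone. A Python equivalent of the conversion, checked against a t_stamp value taken from the log output in this issue:

```python
from datetime import datetime, timezone

def ms_to_utc(t_stamp_ms: int) -> datetime:
    """Equivalent of SQL to_timestamp(t_stamp/1000.0):
    epoch milliseconds -> UTC timestamp."""
    return datetime.fromtimestamp(t_stamp_ms / 1000.0, tz=timezone.utc)

# t_stamp value from the script's log output:
print(ms_to_utc(1714081565466))
# → 2024-04-25 21:46:05.466000+00:00
```

This matches the datetime the script prints for the 2024-04-25 window, so the expression itself is sound; the error comes from how the planner handles the computed column across chunks, not from the conversion.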
At first I thought it might only be happening on uncompressed chunks, but my script gets results from both compressed and uncompressed chunks:
2024-06-28 08:49:33,200 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, datetime.datetime(2024, 4, 25, 21, 46, 5, 466000, tzinfo=datetime.timezone.utc))
That chunk is compressed according to my query:
Which outputs these as the earliest two uncompressed chunks (some data was inserted with a bad timestamp):
Then I thought the t_stamp column had somehow been removed from some chunks, but looping through them day by day with every column explicitly listed in the select statement shows this is not the case.
I also tried running some of the variations above with SET client_min_messages TO DEBUG5; but the output didn't change.
Here is my table creation script from pgAdmin for my hypertable:
We are using 24-hour chunks on our hypertable. We compress after 45 days. We have around 77 terabytes uncompressed and 3.5 terabytes compressed across 3 databases running on this one self-managed Postgres server. The server is on an Azure VM, Standard E48ds v5 with 48 vCPUs and 384 GB of RAM.
It is also running on top of OpenZFS. I can provide any other relevant information as needed.
TimescaleDB version affected
2.11
PostgreSQL version used
14.12
What operating system did you use?
Ubuntu 22.04 x64
What installation method did you use?
Deb/Apt
What platform did you run on?
Microsoft Azure Cloud
Relevant log output and stack trace
How can we reproduce the bug?