Open fordfrog opened 1 week ago
We also ran into this DataFileRead hang issue a couple of days ago and had to perform a restore.
@fordfrog this might be unrelated, but did you upgrade to version 2.15.2 recently without running this script as per the release notes? Just wondering if this is somehow related.
in fact, 2.15.2 is the first version that i used (so i didn't use the script). i'm in the process of migrating a table with close to 5 billion ticks to a hypertable, which has 71 chunks as of now, and so far this is the only issue i have encountered.
also, not sure if this is important, but i have the database server replicated to another instance of postgresql, and there the cluster was fine. so i used the cluster from the replicated database to recover the records (dump them and then insert them back into the production database after removing the broken chunk).
@fordfrog as can be seen in the logs, someone ran a VACUUM FULL (VF) on this chunk. VF takes an ACCESS EXCLUSIVE lock on the chunk, and no other operation on the chunk can proceed until VF completes.
Failed process was running: vacuum full verbose analyze _timescaledb_internal._hyper_2_95_chunk ;
Why was the VF command initiated? It's generally not a recommended practice to use VF.
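To check whether a VACUUM FULL is the one holding the lock, the lock queue on the chunk can be inspected. The query below is only a sketch: it uses the standard pg_locks and pg_stat_activity views and the chunk name from the log line above, and it assumes you are connected to the affected database.

```sql
-- Sketch: list every backend that holds or waits for a lock on the chunk.
-- The chunk name is taken from the log line in this thread.
SELECT l.pid,
       l.mode,                 -- VACUUM FULL shows up as AccessExclusiveLock
       l.granted,              -- false = still waiting for the lock
       a.wait_event_type,
       a.wait_event,
       left(a.query, 80) AS query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND l.database = (SELECT oid FROM pg_database WHERE datname = current_database())
  AND l.relation = '_timescaledb_internal._hyper_2_95_chunk'::regclass
ORDER BY l.granted DESC, l.pid;
```

A row with granted = true and mode = AccessExclusiveLock would be the blocker; rows with granted = false are the sessions queued behind it.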
that was already after the chunk was broken; it was an attempt to see whether a complete rewrite of the chunk would help... but it froze, like any other operation on the chunk.
What type of bug is this?
Data corruption, Performance issue
What subsystems and features are affected?
Command processing, Partitioning
What happened?
i can't read from one of the chunks in my database. other chunks work fine. anything (select, vacuum, compress, ...) that i try to do on the chunk ends up hanging in DataFileRead. the process then can't even be killed and the whole database has to be restarted, which does not go smoothly because of the hanging process.
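For anyone trying to confirm the same symptom, a query along these lines should list the backends stuck on that wait event. It is a minimal sketch that relies only on the standard pg_stat_activity view; the 80-character truncation of the query text is an arbitrary choice.

```sql
-- Sketch: show backends currently waiting on the DataFileRead I/O wait event,
-- longest-running first.
SELECT pid,
       backend_type,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS running_for,
       left(query, 80)     AS query
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
  AND wait_event = 'DataFileRead'
ORDER BY query_start;
```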
TimescaleDB version affected
2.15.2
PostgreSQL version used
16.3
What operating system did you use?
gentoo linux
What installation method did you use?
Source
What platform did you run on?
Not applicable
Relevant log output and stack trace
How can we reproduce the bug?
as other chunks work fine, i suspect some corruption and i have no idea how to reproduce it. the log shows what was going on with the chunk during the last week. i just recall that today i noticed a vacuum that had been hanging on the chunk for two days or so, so i killed it. but the chunk was probably already broken at that time.
EDIT: i just found out that dropping the broken chunk and re-inserting the data that belong to the chunk should recreate that chunk.
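To know which rows belong to the chunk before re-inserting them, the chunk's time range can be looked up first. This is a sketch that assumes the timescaledb_information.chunks view of TimescaleDB 2.x and the chunk name from the log line quoted in this thread; it reads only catalog metadata, not the chunk's data file.

```sql
-- Sketch: find the hypertable and time range covered by the broken chunk,
-- so the matching rows can be selected from a healthy copy of the data.
SELECT hypertable_schema,
       hypertable_name,
       chunk_name,
       range_start,
       range_end,
       is_compressed
FROM timescaledb_information.chunks
WHERE chunk_schema = '_timescaledb_internal'
  AND chunk_name = '_hyper_2_95_chunk';
```

The rows to re-insert are those whose time value falls in the half-open interval from range_start (inclusive) to range_end (exclusive).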
EDIT2: i managed to get rid of the chunk, though the database got stuck again, and i restored the data from backup. still checking the data...