Open za-arthur opened 1 week ago
I think we need to catch possible non-fatal errors in the subroutines called from s3process_task (e.g. s3_put_file, s3_put_empty_dir, s3_get_file, s3_get_file_part) and convert them to FATAL via PG_TRY() {..} PG_CATCH() { elog(FATAL); } PG_END_TRY().
I see there are only a few places where we can get a non-fatal error, i.e. in read_file_part() and write_file_part().
@akorotkov do you like this way?
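To make the proposed control flow concrete, here is a minimal, self-contained sketch in plain C. It does not use PostgreSQL headers: `model_elog`, `model_task`, `run_worker`, and the `LEVEL_*` constants are hypothetical stand-ins, and `setjmp`/`longjmp` only approximate how `PG_TRY()`/`PG_CATCH()` unwind on an ERROR. The assumption being illustrated is that a non-fatal error raised inside a task is caught and re-raised as FATAL, so the worker exits instead of lingering.

```c
#include <setjmp.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL error levels; NOT the real definitions. */
typedef enum { LEVEL_ERROR, LEVEL_FATAL } model_level;

static jmp_buf *try_handler = NULL;  /* innermost "PG_TRY" handler, if any */
static jmp_buf process_exit;         /* where a FATAL "ends the process" */

/* Model of elog(): ERROR unwinds to the handler, FATAL leaves the worker. */
static void model_elog(model_level level, const char *msg)
{
    fprintf(stderr, "%s: %s\n", level == LEVEL_FATAL ? "FATAL" : "ERROR", msg);
    if (level == LEVEL_ERROR && try_handler != NULL)
        longjmp(*try_handler, 1);    /* non-fatal: jump to the catch branch */
    longjmp(process_exit, 1);        /* FATAL or unhandled ERROR: exit */
}

/* Hypothetical task body standing in for s3_put_file() and friends. */
static void model_task(int fail)
{
    if (fail)
        model_elog(LEVEL_ERROR, "could not read file part");
}

/* Returns 0 if the task finished, 1 if the worker terminated with FATAL. */
static int run_worker(int fail)
{
    jmp_buf local;

    if (setjmp(process_exit) != 0)
        return 1;                    /* simulated process exit */

    try_handler = &local;
    if (setjmp(local) == 0)
    {
        /* "PG_TRY" branch */
        model_task(fail);
        try_handler = NULL;
        return 0;
    }

    /* "PG_CATCH" branch: escalate the non-fatal error to FATAL */
    try_handler = NULL;
    model_elog(LEVEL_FATAL, "S3 worker task failed");
    return 1;                        /* not reached */
}
```

In this model `run_worker(0)` returns 0 and `run_worker(1)` returns 1 after logging an ERROR followed by a FATAL. In the real extension the catch branch would call `elog(FATAL, ...)` and the process would actually exit.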
Tested on main branch, commit https://github.com/orioledb/orioledb/commit/bd8e32d0ebaafd0ea3ec3074233b65167f3b6fb7. Possibly related: https://github.com/orioledb/orioledb/issues/326
In case of an error within an S3 worker, the `checkpointer` process gets stuck and `killall postgres` doesn't terminate the instance.
To reproduce the issue easily, one can apply the patch:
and run the test `t.s3_test.S3Test.test_s3_check_control`. The test will get stuck. After executing `killall postgres`, only the main `postmaster` and `checkpointer` processes will remain. The backtrace of the `checkpointer` process: