r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
333 stars 78 forks source link

dbGetQuery failing due to stale Redshift OIDs #469

Open jeffkeller-einc opened 2 months ago

jeffkeller-einc commented 2 months ago

Overview

I am querying an AWS Redshift table over an RPostgres::Redshift connection. The query sometimes fails with error:

Error: Failed to fetch row: ERROR:  could not open relation with OID 8574286

The ERROR: could not open relation with OID <oid-number> portion of the error comes from Redshift and the OID is different each time.

Environment

The RPostgres::Redshift connection is running on an Ubuntu 20.04 machine with libpq-dev 12.19-0ubuntu0.20.04.1, R 4.2.1, DBI 1.2.3, and RPostgres 1.4.5.

Description

The table in question is managed by an independent process which replaces the table periodically. The table OID changes each time it is replaced. When the table replacement process executes, it drops the table and renames a replacement table in a single transaction:

BEGIN TRANSACTION;

DROP TABLE my_table;
ALTER TABLE my_table_tmp RENAME TO my_table;

END TRANSACTION;

When a query over a RPostgres::Redshift connection occurs at the same time, the above error is returned.

DBI::dbGetQuery(con, "SELECT col1, col2 FROM my_table")

I suspect that the select statement is not creating a lock on my_table, which is allowing the table replacement to happen concurrently. This changes the table OID and the select query fails trying to find a stale OID.

My expectation is that the query would run without error, possibly a bit slower if it has to wait for the table replacement to complete.

krlmlr commented 2 months ago

Thanks. Can you wrap your dbGetQuery() in a transaction?

Is the behavior different when using another client, say, the psql CLI?

jeffkeller-einc commented 2 months ago

I tried wrapping in a transaction but was still able to replicate the error:

DBI::dbWithTransaction({
  DBI::dbGetQuery(con, sql)
})

I was also able to replicate the error on MacOS 14.5 with RPostgres 1.4.5 built against postgresql 14.12, 15.7, and 16.3 (autobrew disabled).

I was not able to replicate the error with psql 14.12, 15.7, or 16.3. The select query and drop-rename transaction block each other as expected in this case. Other users of this Redshift cluster have reported no issues with their clients. The issue seems to be specific to RPostgres.

I will also note that I did not experience this problem prior to June 20. I imagine AWS could have made changes to the Redshift version during a maintenance window that introduced this error (my code did not change).

krlmlr commented 2 months ago

Hm... The next step to try could be a small .c file linking against libpq that does the same what the R package would be doing. I currently don't have the capacity to work on this, sorry.

jeffkeller-einc commented 2 months ago

Fair enough. My only idea is that maybe prepared statements are being used and the OIDs are going stale between preparation and execution.

krlmlr commented 2 months ago

Oh, that's a good point. Can you try dbGetQuery(immediate = TRUE) ?

jeffkeller-einc commented 2 months ago

I can still reproduce the error, but the window to do so is noticeably narrower!