r-dbi / RPostgres

A DBI-compliant interface to PostgreSQL
https://rpostgres.r-dbi.org
Other
328 stars 78 forks source link

Memory leak when querying from a packaged function #457

Closed agilly closed 3 months ago

agilly commented 5 months ago

I am connecting to a Redshift Serverless deployment using RPostgres. Let's call that connection con:

r$> class(con)
[1] "PqConnection"
attr(,"package")
[1] "RPostgres"

I am trying to read a table with a where statement. In my case the result is about 144MB:

r$> object.size(a)/1024/1024
144.7 bytes

If I run the following command:

a=DBI::dbGetQuery(con, "select * from schema.table where column = 'value'")

a few times, I see memory usage jump sharply on the first connection (around 1.5GB), then it keeps increasing moderately as I keep rerunning that statement, until stabilizing at ~2GB total RAM.

However, if I wrap the call in a function:

test_db_con=function(con){
  ret=dbGetQuery(con, "select * from schema.table where column = 'value'")
  return(ret)
}

and put it in a package, the following:

devtools::load_all()
## create con
a=test_db_con(con=con)

has a different behavior. No matter how many times I run the last line, memory usage never peaks. Instead it keeps increasing, by increments larger than the size of the data (around 500MB), eventually leading to OOM. To me, this hints at a memory leak problem, but I don't know why this would happen only in a package. FWIW, I am running this on an AWS EC2 instance, and the issue also happens when using PostgreSQL (although the memory explosion there seems much more pronounced). Any help appreciated !

Version info:

r$> DBI::dbGetInfo(RPostgres::Postgres())
$driver.version
[1] ‘1.4.6’

$client.version
[1] ‘14.10’
krlmlr commented 5 months ago

Thanks. Minor nitpick: RPostgres::Redshift() is preferred over RPostgres::Postgres() to connect to the database, but I don't think this will change the outcome here.

Do you have a way of reproducing this on a toy RedShift instance where you could share credentials?

agilly commented 5 months ago

I didn't know about RPostgres::Redshift(), will update.

Unfortunately, I am not familiar at all with Redshift as the database was provided to me "as is" so I don't really know how to spin up such a db.

I did do a comparison with a MySQL database where this did not occur.

krlmlr commented 5 months ago

What is the type of the columns in the result set?