Closed: agilly closed this issue 3 months ago
Thanks. Minor nitpick: `RPostgres::Redshift()` is preferred over `RPostgres::Postgres()` for connecting to the database, but I don't think this will change the outcome here.
Do you have a way of reproducing this on a toy Redshift instance where you could share credentials?
I didn't know about `RPostgres::Redshift()`; I will update.
Unfortunately, I am not familiar with Redshift at all, since the database was provided to me "as is", so I don't really know how to spin up such an instance.
For comparison, I ran the same workload against a MySQL database, where the issue did not occur.
What are the types of the columns in the result set?
I am connecting to a Redshift Serverless deployment using RPostgres; let's call that connection `con`. I am trying to read a table with a `WHERE` clause; in my case the result set is about 144 MB.
If I run the following command:
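The original command was not preserved in this thread; a minimal hedged sketch of what it may have looked like, where the endpoint, credentials, table name `my_table`, and filter column `grp` are all placeholders:

```r
library(DBI)
library(RPostgres)

# Connect to the Redshift Serverless endpoint (all connection details are placeholders)
con <- dbConnect(
  RPostgres::Redshift(),
  host     = "my-workgroup.012345678901.eu-west-1.redshift-serverless.amazonaws.com",
  port     = 5439,
  dbname   = "dev",
  user     = "user",
  password = "password"
)

# Read a filtered subset of the table; in the report the result is ~144 MB
df <- dbGetQuery(con, "SELECT * FROM my_table WHERE grp = 'some_value'")
```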
a few times, memory usage jumps sharply on the first connection (around 1.5 GB), then keeps increasing moderately as I rerun the statement, until stabilizing at ~2 GB of total RAM.
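One way to make these numbers concrete (not part of the original report) is to log R's heap after each run with a full garbage collection; `run_query()` here is a hypothetical stand-in for the `dbGetQuery()` call:

```r
# Log heap usage after each run; run_query() is a placeholder for the query
# call from the report. Growth that survives gc() suggests memory held
# outside R's managed heap (e.g. in the driver).
for (i in 1:10) {
  df <- run_query(con)
  rm(df)
  cat(sprintf("iteration %d: %.0f MB used\n", i, sum(gc()[, 2])))
}
```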
However, if I wrap the call in a function:
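The function body was also stripped from the thread; assuming it simply wraps the same query, it may have looked like this (all names hypothetical):

```r
# Hypothetical wrapper around the same query; my_table and grp are placeholders
get_data <- function(con) {
  DBI::dbGetQuery(con, "SELECT * FROM my_table WHERE grp = 'some_value'")
}
```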
and put it in a package, the following:
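The packaged call was likewise not preserved; a hedged guess, where `mypkg` and `get_data` are hypothetical names for the package and the wrapper function:

```r
# mypkg and get_data are placeholder names; this runs the same query,
# but through the packaged wrapper
df <- mypkg::get_data(con)
```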
has a different behavior. No matter how many times I run the last line, memory usage never plateaus; instead it keeps increasing, in increments larger than the size of the data (around 500 MB each time), eventually leading to an OOM. To me this hints at a memory leak, but I don't know why it would only happen from within a package. FWIW, I am running this on an AWS EC2 instance, and the issue also happens when using `PostgreSQL` (although the memory explosion there seems much more pronounced). Any help appreciated!

Version info: