Open miohtama opened 10 months ago
I rewrote the loop so that it no longer calls fork() inside the loop, and the resource leak is gone. The downside is that I can no longer rely on fork() to pass data from the parent process to the child processes: the children cannot just read it from the forked memory, so I need to serialise the data between the parent and the children, and this slows down the data ETL.
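For anyone hitting the same problem, here is a minimal sketch of the restructured loop, assuming the per-slice work can be described by small picklable arguments. The names `slice_ranges`, `export_slice` and `run_export` are hypothetical, and the worker body is only a placeholder for the real export (it does not call ConnectorX):

```python
from concurrent.futures import ProcessPoolExecutor


def slice_ranges(start, end, step):
    """Split [start, end) into consecutive (lo, hi) slices of at most `step` items."""
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]


def export_slice(bounds):
    """Hypothetical worker standing in for the per-slice export.

    Arguments and return value cross the process boundary via pickle,
    so both are kept small here (a tuple in, an int out).
    """
    lo, hi = bounds
    return hi - lo  # placeholder for "rows exported"


def run_export(start, end, step, workers=4):
    """Create the worker pool once, then feed it every slice."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # At most `workers` processes are started in total, rather than
        # one fork per slice; each task is pickled and sent to a worker.
        return sum(pool.map(export_slice, slice_ranges(start, end, step)))
```

Calling `run_export(0, 1_000_000, 50_000)` from a script's `if __name__ == "__main__":` block would process all slices with one pool. The trade-off is the one described above: anything the worker needs has to be pickled and sent to it, instead of being inherited through forked memory.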
What language are you using?
Python
What version are you using?
What database are you using?
PostgreSQL.
Ubuntu Linux 22.04.
What dataframe are you using?
Arrow 2
Can you describe your bug?
I am running a loop that exports data from the database in slices.
The query I am using looks like:
The loop uses the multiprocessing module, but that part does not touch ConnectorX, so I suspect some kind of interaction between the two.
After running the script for a while I get:
What are the steps to reproduce the behavior?
Run the export script that issues read_sql many times over a long period.
I checked using lsof and it seems that (nameless?) FIFO pipes increase with each loop iteration.
If there is a way to "reset" the ConnectorX Python bindings and internals, I can check whether that helps, e.g. by manually purging/deleting any OS resources ConnectorX might hold.
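To track the pipe growth from inside the script instead of running lsof by hand, something like the following could be logged once per loop iteration. This is only a sketch: it assumes Linux (it reads /proc/self/fd), and `count_open_fifos` is a hypothetical helper name, not part of ConnectorX:

```python
import os
import stat


def count_open_fifos(fd_dir="/proc/self/fd"):
    """Count this process's open file descriptors that are pipes/FIFOs (Linux only)."""
    count = 0
    for name in os.listdir(fd_dir):
        try:
            mode = os.fstat(int(name)).st_mode
        except (OSError, ValueError):
            # The descriptor was closed while scanning (for example the one
            # used to list the directory), or the entry is not a number.
            continue
        if stat.S_ISFIFO(mode):
            count += 1
    return count
```

Printing `count_open_fifos()` at the end of each iteration should show whether the number rises steadily, which would match what lsof reports.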