oracle / python-oracledb

Python driver for Oracle Database conforming to the Python DB API 2.0 specification. This is the renamed, new major release of cx_Oracle.
https://oracle.github.io/python-oracledb

oracledb memory problem #290

Closed · Looongtail closed this 2 months ago

Looongtail commented 5 months ago

Discussed in https://github.com/oracle/python-oracledb/discussions/286

Originally posted by **Looongtail** January 26, 2024

I tried to fetch data from an Oracle database and convert it to a DataFrame. I've noticed that oracledb's cursor seems to consume a lot of memory and never gives it back.

This is my environment:

- Oracle version: 19c
- Oracle client version: 19.20
- oracledb lib version: 2.1

Table creation script for testing:

```
CREATE TABLE test_table (
    ID NUMBER,
    RandomColumn1 VARCHAR2(50),
    RandomColumn2 NUMBER,
    RandomColumn3 DATE
);

DECLARE
    v_counter NUMBER := 1;
BEGIN
    WHILE v_counter <= 10000000 LOOP
        INSERT INTO test_table (ID, RandomColumn1, RandomColumn2, RandomColumn3)
        VALUES (
            v_counter,
            DBMS_RANDOM.STRING('X', 10),
            ROUND(DBMS_RANDOM.VALUE(-100, 100)),
            TRUNC(SYSDATE - DBMS_RANDOM.VALUE(1, 365))
        );
        v_counter := v_counter + 1;
    END LOOP;
    COMMIT;
END;
/
```

Python code to check memory usage:

```
import oracledb
import pandas as pd
import psutil

def memory_usage(message: str = 'debug'):
    p = psutil.Process()
    rss = p.memory_info().rss / 2 ** 20  # bytes to MB
    print(f'{message} memory usage {rss: 10.5f} MB')

memory_usage('init')

oracledb.init_oracle_client()
conn = oracledb.connect(user='*******', password='*******', dsn='*******')
cursor = conn.cursor()
cursor.arraysize = 10000

sql = 'select * from test_table'
data_lt = cursor.execute(sql).fetchall()
memory_usage('data fetch')

columns_lt = [column[0] for column in cursor.description]
data_df = pd.DataFrame(data_lt, columns=columns_lt)
memory_usage('make dataframe')

cursor.close()
conn.close()
memory_usage('close connection')

print(f'dataframe size : {data_df.info()}')
```

Result:

```
C:\Users\SDS\AppData\Local\Programs\Python\Python310\python.exe D:\workspace\nscm_pre\z_local_run\test_oracle_memory.py
init memory usage   88.41797 MB
data fetch memory usage  2475.40625 MB
make dataframe memory usage  2781.08594 MB
close connection memory usage  2772.56641 MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000000 entries, 0 to 9999999
Data columns (total 4 columns):
 #   Column         Dtype
---  ------         -----
 0   ID             int64
 1   RANDOMCOLUMN1  object
 2   RANDOMCOLUMN2  int64
 3   RANDOMCOLUMN3  datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 305.2+ MB
dataframe size : None

Process finished with exit code 0
```

I'd like to free the used memory after making the DataFrame. When I save it to a file and delete the object, the memory is released. In addition, memory usage is much smaller than before when it's loaded back from the file. Could somebody help?
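For context, a minimal sketch of the save-and-reload workaround described above. The parquet format and file path are illustrative assumptions (not from the original post), and `to_parquet` requires pyarrow or fastparquet to be installed:

```python
import gc

import pandas as pd

# Stand-in for the DataFrame built from the fetched rows in the script above.
data_df = pd.DataFrame({'ID': range(1000), 'RANDOMCOLUMN2': range(1000)})

data_df.to_parquet('test_table.parquet')  # persist the frame to disk
del data_df                               # drop the in-memory copy
gc.collect()

# Reloading from the file gives back only the frame, without the large
# intermediate list of fetched row tuples, so resident memory is much lower.
data_df = pd.read_parquet('test_table.parquet')
```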
anthony-tuininga commented 5 months ago

I tried just performing the query without pandas involved and found that, after the connection is closed and the data destroyed, very little memory is retained. Even just importing pandas without actually using it added around 60 MB. So I'm not sure there is anything that needs to be done in oracledb.
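For reference, a minimal sketch of a pandas-free check along these lines, reusing the placeholder credentials and test table from the original report (this measurement script is an assumption, not the exact one the maintainer ran):

```python
import gc

import oracledb
import psutil


def memory_usage(message: str = 'debug'):
    # Report the resident set size of the current process in MB.
    rss = psutil.Process().memory_info().rss / 2 ** 20
    print(f'{message} memory usage {rss: 10.5f} MB')


memory_usage('init')

conn = oracledb.connect(user='*******', password='*******', dsn='*******')
cursor = conn.cursor()
cursor.arraysize = 10000

data_lt = cursor.execute('select * from test_table').fetchall()
memory_usage('data fetch')

# Destroy the fetched rows and close the connection before measuring again;
# with no other references held, most of the fetch memory can be reclaimed.
del data_lt
cursor.close()
conn.close()
gc.collect()
memory_usage('after cleanup')
```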

Note that you have two copies of the data: the first is the result of the fetch and the second is the DataFrame. Neither of those has been destroyed in your code, so it is unsurprising that the amount of memory has increased enormously!
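Following that two-copies explanation, the intermediate list can be dropped as soon as the frame owns the data; a minimal sketch (the `del`/`gc.collect()` step is an illustration, not an official recommendation):

```python
import gc

import oracledb
import pandas as pd

conn = oracledb.connect(user='*******', password='*******', dsn='*******')
cursor = conn.cursor()
cursor.arraysize = 10000

data_lt = cursor.execute('select * from test_table').fetchall()
columns_lt = [column[0] for column in cursor.description]
data_df = pd.DataFrame(data_lt, columns=columns_lt)

# Drop the raw fetch result once the DataFrame holds its own copy;
# otherwise the list of row tuples and the frame stay alive together.
del data_lt
gc.collect()

cursor.close()
conn.close()
```

Fetching with `cursor.fetchmany()` in a loop and building the frame batch by batch would go further, since the full list of row tuples is then never materialized at once.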

cjbj commented 2 months ago

Closing - no activity.