open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0
733 stars 606 forks source link

`sqlalchemy` memory leak with Engine cleanup introduced in 0.37b0 #1761

Closed adamgregory closed 1 year ago

adamgregory commented 1 year ago

Describe your environment python 3.9.7

Deprecated==1.2.13
graphviz==0.20.1
importlib-metadata==6.0.1
mysqlclient==1.4.4
objgraph==3.5.0
opentelemetry-api==1.16.0
opentelemetry-instrumentation==0.37b0
opentelemetry-instrumentation-sqlalchemy==0.37b0
opentelemetry-sdk==1.16.0
opentelemetry-semantic-conventions==0.37b0
packaging==23.1
SQLAlchemy==1.3.24
typing_extensions==4.5.0
wrapt==1.15.0
zipp==3.15.0

Steps to reproduce

from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
SQLAlchemyInstrumentor().instrument()
from sqlalchemy import create_engine
import objgraph
import gc

def leak_memory():
    engine = create_engine('mysql://root@127.0.0.1:3306')
    with engine.connect() as conn:
        rs = conn.execute('SELECT 1')
        for row in rs:
           print(row)
    engine.dispose()

gc.collect()
snapshot = objgraph.growth()
leak_memory()
gc.collect()
snapshot2 = objgraph.growth(limit=None)
print(snapshot2)

for x in range(0,10):
   leak_memory()

gc.collect()
snapshot3 = objgraph.growth(limit=None)
print(snapshot3)

output:

(1,)
[('function', 8357, 438), ('dict', 4705, 339), ('tuple', 4055, 302), ('weakref', 2024, 168), ('cell', 1223, 107), ('deque', 67, 58), ('type', 958, 53), ('_EmptyListener', 50, 50), ('method_descriptor', 1082, 45), ('VisitableType', 245, 43), ('module', 355, 34), ('ModuleSpec', 353, 34), ('SourceFileLoader', 289, 32), ('attrgetter', 157, 29), ('getset_descriptor', 1005, 27), ('member_descriptor', 530, 27), ('list', 1821, 25), ('builtin_function_or_method', 1007, 22), ('set', 303, 20), ('classmethod', 239, 12), ('property', 448, 11), ('wrapper_descriptor', 1354, 10), ('PlaceHolder', 14, 8), ('memoized_property', 93, 7), ('DBAPISet', 7, 7), ('FileFinder', 43, 5), ('_ListenerCollection', 5, 5), ('Logger', 26, 4), ('ExtensionFileLoader', 31, 2), ('Condition', 3, 2), ('Context', 4, 2), ('frozenset', 166, 1), ('ABCMeta', 129, 1), ('_abc_data', 138, 1), ('method', 88, 1), ('RegexFlag', 15, 1), ('WeakKeyDictionary', 46, 1), ('NullType', 2, 1), ('EnsureKWArgType', 4, 1), ('PoolEventsDispatch', 2, 1), ('ConnectionEventsDispatch', 2, 1), ('ProxyTracer', 2, 1), ('NoOpTracer', 2, 1), ('ContextVarsRuntimeContext', 1, 1), ('hamt', 1, 1), ('hamt_bitmap_node', 1, 1), ('EngineTracer', 1, 1), ('ContextVar', 1, 1), ('Engine', 1, 1), ('QueuePool', 1, 1), ('URL', 1, 1), ('MySQLDialect_mysqldb', 1, 1), ('_local', 1, 1), ('Queue', 1, 1), ('MySQLIdentifierPreparer_mysqldb', 1, 1), ('MySQLTypeCompiler', 1, 1)]
(1,)
(1,)
(1,)
(1,)
(1,)
(1,)
(1,)
(1,)
(1,)
(1,)
[('dict', 4966, 261), ('builtin_function_or_method', 1127, 120), ('weakref', 2144, 120), ('tuple', 4145, 90), ('cell', 1313, 90), ('deque', 147, 80), ('function', 8417, 60), ('set', 363, 60), ('_ListenerCollection', 55, 50), ('list', 1853, 32), ('Condition', 23, 20), ('method', 98, 10), ('WeakKeyDictionary', 56, 10), ('NullType', 12, 10), ('PoolEventsDispatch', 12, 10), ('ConnectionEventsDispatch', 12, 10), ('ProxyTracer', 12, 10), ('NoOpTracer', 12, 10), ('EngineTracer', 11, 10), ('Engine', 11, 10), ('QueuePool', 11, 10), ('URL', 11, 10), ('MySQLDialect_mysqldb', 11, 10), ('_local', 11, 10), ('Queue', 11, 10), ('MySQLIdentifierPreparer_mysqldb', 11, 10), ('MySQLTypeCompiler', 11, 10)]

What is the expected behavior? After fetching the data the engine is disposed to ensure all of the pooled connections are fully closed. Related objects should be garbage collected after they fall out of scope

What is the actual behavior? Despite manually invoking the garbage collector interface, one instance of the sqlalchemy or sqlalchemy instrumentation related objects remains allocated in memory for each time the leak_memory function was executed. If the leak_memory function is run in an infinite loop the memory allocated to the script will grow linearly and unbounded until available memory is consumed.

Additional context This issue was introduced in 0.37b0 and is also present in 0.38b0

rbagd commented 1 year ago

If @adamgregory isn't ineterested, I would like to take this up. I noticed that this also works with sqlite:///:memory: engine, so easy to reproduce locally.

Quick check indicates it's probably related to this commit.

adamgregory commented 1 year ago

Please go ahead. I haven't been able to make much progress here beyond identifying the issue. Compared to 0.36b0 these additional listeners introduce more circular memory references between Engine and EngineTracer, but this is definitely not my area of expertise and it wasn't clear to me why the garbage collector wasn't able to detect and free these circular references.

rbagd commented 1 year ago

It looks like the gist of the issue is this line. Each new engine gets appended to cls._remove_event_listener_params via the target parameter, and so this list keeps holding references until uninstrument() is called. Need to check in sqlalchemy code if there's another alternative.