snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0
586 stars 468 forks source link

SNOW-523131: Possible memory leak in SnowflakeConnection #1018

Closed dapollak closed 2 years ago

dapollak commented 2 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    3.8.5

  2. What operating system and processor architecture are you using?

    macOS-10.16-x86_64-i386-64bit

  3. What are the component versions in the environment (pip freeze)?

alembic==1.6.5
anyio==3.3.0
appnope==0.1.2
asgiref==3.4.1
asn1crypto==1.4.0
astroid==2.6.5
async-generator==1.10
async-timeout==3.0.1
attrs==21.2.0
Automat==20.2.0
azure-common==1.1.27
azure-core==1.16.0
azure-storage-blob==12.8.1
backcall==0.2.0
backoff==1.10.0
beautifulsoup4==4.9.3
boto3==1.18.6
botocore==1.21.6
CacheControl==0.12.6
cachetools==4.2.2
certifi==2020.12.5
cffi==1.14.6
chardet==3.0.4
charset-normalizer==2.0.3
click==7.1.2
clickclick==20.10.2
cloudflare==2.8.15
connexion==2.7.0
constantly==15.1.0
croniter==1.0.15
cryptography==3.2.1
curlify==2.2.1
cycler==0.11.0
decorator==5.0.9
Deprecated==1.2.12
elastic-apm==6.3.3
fastapi==0.67.0
firebase-admin==5.0.1
Flask==2.0.1
Flask-Injector==0.12.3
fonttools==4.28.5
google-api-core==1.26.0
google-api-python-client==1.9.3
google-auth==1.33.1
google-auth-httplib2==0.1.0
google-cloud-core==1.7.1
google-cloud-datastore==2.1.6
google-cloud-firestore==2.2.0
google-cloud-iam==2.2.0
google-cloud-kms==1.4.0
google-cloud-pubsub==2.6.1
google-cloud-scheduler==2.2.0
google-cloud-secret-manager==1.0.0
google-cloud-storage==1.37.1
google-cloud-tasks==1.5.0
google-crc32c==1.1.5
google-resumable-media==1.3.3
googleapis-common-protos==1.53.0
graphql-core==3.1.5
graphviz==0.19.1
greenlet==1.1.0
grpc-google-iam-v1==0.12.3
grpcio==1.39.0
gunicorn==20.0.4
h11==0.12.0
httplib2==0.19.1
httptools==0.2.0
hyperlink==21.0.0
hypothesis==5.49.0
hypothesis-graphql==0.5.0
hypothesis-jsonschema==0.20.1
idna==2.8
importlib-metadata==4.6.1
incremental==21.3.0
inflection==0.5.1
iniconfig==1.1.1
injector==0.18.4
ipython==7.27.0
isodate==0.6.0
isort==5.9.2
itsdangerous==2.0.1
jedi==0.18.0
Jinja2==3.0.1
jmespath==0.10.0
jsonlines==2.0.0
jsonpatch==1.32
jsonpointer==2.1
jsonschema==3.2.0
junit-xml==1.9
jwcrypto==0.9.1
keyring==23.0.1
kiwisolver==1.3.2
lazy-object-proxy==1.6.0
libcst==0.3.19
Mako==1.1.4
MarkupSafe==2.0.1
matplotlib==3.5.1
matplotlib-inline==0.1.3
mccabe==0.6.1
memory-profiler==0.60.0
msgpack==1.0.2
msrest==0.6.21
multidict==5.1.0
mypy-extensions==0.4.3
ndg-httpsclient==0.5.1
numpy==1.21.5
oauthlib==3.1.1
objgraph==3.5.0
openapi-schema-validator==0.1.5
openapi-spec-validator==0.3.1
opentelemetry-api==1.4.0
opentelemetry-exporter-otlp==1.4.0
opentelemetry-exporter-otlp-proto-grpc==1.4.0
opentelemetry-instrumentation==0.23b0
opentelemetry-instrumentation-asgi==0.23b0
opentelemetry-instrumentation-fastapi==0.23b0
opentelemetry-instrumentation-flask==0.23b0
opentelemetry-instrumentation-logging==0.23b0
opentelemetry-instrumentation-requests==0.23b0
opentelemetry-instrumentation-sqlalchemy==0.23b0
opentelemetry-instrumentation-wsgi==0.23b0
opentelemetry-proto==1.4.0
opentelemetry-sdk==1.4.0
opentelemetry-semantic-conventions==0.23b0
opentelemetry-util-http==0.23b0
oscrypto==1.2.1
outcome==1.1.0
packaging==21.0
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.4.0
pluggy==0.13.1
prompt-toolkit==3.0.20
proto-plus==1.19.0
protobuf==3.17.3
psutil==5.8.0
ptyprocess==0.7.0
pusher==3.0.0
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pycryptodomex==3.10.1
pydantic==1.8.2
Pygments==2.10.0
PyJWT==2.1.0
pylint==2.9.5
PyNaCl==1.4.0
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyrsistent==0.18.0
pytest==6.2.4
pytest-asyncio==0.15.1
pytest-subtests==0.5.0
pytest-tornasync==0.6.0.post2
pytest-trio==0.7.0
pytest-twisted==1.13.3
python-dateutil==2.8.2
python-dotenv==0.19.0
python-editor==1.0.4
python-json-logger==2.0.1
python-jwt==3.3.0
pytz==2021.1
PyYAML==5.4.1
requests==2.26.0
requests-oauthlib==1.3.0
rsa==4.7.2
s3transfer==0.5.0
schemathesis==3.0.0
six==1.16.0
sniffio==1.2.0
snowflake-connector-python==2.5.1
snowflake-sqlalchemy==1.2.4
sortedcontainers==2.4.0
soupsieve==2.2.1
SQLAlchemy==1.3.23
SQLAlchemy-Utils==0.37.8
starlette==0.13.8
starlette-context==0.3.3
toml==0.10.2
tornado==6.1
traitlets==5.1.0
trio==0.19.0
Twisted==21.2.0
typing-extensions==3.10.0.0
typing-inspect==0.7.1
uritemplate==3.0.1
urllib3==1.26.5
uvicorn==0.14.0
uvloop==0.15.3
watchgod==0.7
wcwidth==0.2.5
websockets==9.1
Werkzeug==2.0.1
wrapt==1.12.1
xdot==1.2
yarl==1.6.3
zipp==3.5.0
zope.interface==5.4.0

Hey I have a flask server, which communicates Snowflake database through sqlalchemy & snowflake connector. Recently I'm seeing that I have a memory leak (memory increases constantly after every http request that does database operation). After digging in, i found that SnowflakeConnection object is not being released, and a new one is created every other request.

I looked on SnowflakeConnection code, and saw that:

  1. in SnowflakeConnection.__init__ (https://github.com/snowflakedb/snowflake-connector-python/blob/1659ec6b78930d1f947b4eff985c891af614d86c/src/snowflake/connector/connection.py#L286), you assign a SnowflakeRestful object to self._telemetry._rest and self.incident._rest.
  2. The SnowflakeRestful points back to self (SnowflakeConnection)
  3. in self.close(), we don't put None in self._telemetry._rest and self.incident._rest, so there are circular references left in the SnowflakeConnection object which make it stay in memory.

It looks like when I break this circularity manually, The object is being released successfully. Am I missing something, or it's really a bug ?

Thanks !

sfc-gh-mkeller commented 2 years ago

Hi @dapollak If I recall how close and the different classes work then I do think that this is a bug. Why doesn't Python's garbage collector not find the circular references though?

dapollak commented 2 years ago

@sfc-gh-mkeller, thanks for the quick response !

As I understand, the gc doesn't release objects involved in a circular referencing if they have __del__ method.

sfc-gh-mkeller commented 2 years ago

@dapollak I tried reproducing your issue, on top of master I applied this small patch that adds freeing print lines:

diff --git a/src/snowflake/connector/connection.py b/src/snowflake/connector/connection.py
index 15dff139..56a6067f 100644
--- a/src/snowflake/connector/connection.py
+++ b/src/snowflake/connector/connection.py
@@ -287,6 +287,7 @@ class SnowflakeConnection(object):
         self.incident = IncidentAPI(self._rest)

     def __del__(self):  # pragma: no cover
+        print(f"freeing {self}")
         try:
             self.close(retry=False)
         except Exception:
diff --git a/src/snowflake/connector/incident.py b/src/snowflake/connector/incident.py
index a800a273..38837579 100644
--- a/src/snowflake/connector/incident.py
+++ b/src/snowflake/connector/incident.py
@@ -108,6 +108,9 @@ class IncidentAPI(object):
     def __init__(self, rest):
         self._rest = rest

+    def __del__(self):
+        print(f"freeing {self}")
+
     def report_incident(
         self, incident=None, job_id=None, request_id=None, session_parameters=None
     ):
diff --git a/src/snowflake/connector/telemetry.py b/src/snowflake/connector/telemetry.py
index 5d833f96..c4172545 100644
--- a/src/snowflake/connector/telemetry.py
+++ b/src/snowflake/connector/telemetry.py
@@ -73,6 +73,9 @@ class TelemetryClient(object):
         self._lock = Lock()
         self._enabled = True

+    def __del__(self):
+        print(f"freeing {self}")
+
     def add_log_to_batch(self, telemetry_data: "TelemetryData") -> None:
         if self._is_closed:
             raise Exception("Attempted to add log when TelemetryClient is closed")

Then I ran the following script:

import gc
import snowflake.connector

conn = snowflake.connector.connect(
        ...
)
# conn._telemetry._rest = None
del conn
num = gc.collect()
print(num)

In this case gc.collect correctly finds all 3 objects and frees them all. Is it possible that you've disabled reference cycle detection? I'll try to dig deeper into this in the meantime

sfc-gh-mkeller commented 2 years ago

As I understand, the gc doesn't release objects involved in a circular referencing if they have __del__ method.

From what I could find, this used to be true but isn't since PEP-442

dapollak commented 2 years ago

@sfc-gh-mkeller I see. Another thing I noticed that may prevent the Connection object from being release is the HeartBeatTimer. in SnowflakeConnection, you create timer (inside self._add_heartbeat) which its thread is a method of SnowflakeConnection:

    self.heartbeat_thread = HeartBeatTimer(
                self.client_session_keep_alive_heartbeat_frequency, self._heartbeat_tick
            )

If the timer isn't close properly, the reference to self._heartbeat_tick will prevent the gc of releasing the connection.

In HeartBeatTimer under time_util.py you override the self.run method of the built-in timer:

    def run(self) -> None:
        while not self.finished.is_set():
            self.finished.wait(self.interval)
            if not self.finished.is_set():
                try:
                    self.function()
                except Exception as e:
                    logger.debug("failed to heartbeat: %s", e)

but it seems you don't call self.finished.set() like in python's built-in Timer, which its self.run method looks like -

    def run(self):
        self.finished.wait(self.interval)
        if not self.finished.is_set():
            self.function(*self.args, **self.kwargs)
        self.finished.set()

Try to create the connection with self.client_session_keep_alive set, i guess it's created without it in your reproduce example (and correct me otherwise)

sfc-gh-mkeller commented 2 years ago

Okay, that's it. The following script reproduces the issue:

import gc
import snowflake.connector

conn = snowflake.connector.connect(
    ...
    client_session_keep_alive=True,
)
print(f"gonna delete {conn}")
del conn
print("going to force garbage collection")
num = gc.collect()

I'm going to fix this issue

sfc-gh-mkeller commented 2 years ago

@dapollak could you please see if 98eb923 (#1031) fixes your issue, it fixes my reproducer script so hopefully it'll work for you as well

dapollak commented 2 years ago

@sfc-gh-mkeller fixes for me either. Good job !