snowflakedb / snowflake-connector-python

Snowflake Connector for Python
https://pypi.python.org/pypi/snowflake-connector-python/
Apache License 2.0

SNOW-1239684: Snowflake connector telemetry is toggled remotely and always sends internal data to Snowflake #1902

Open dbold opened 4 months ago

dbold commented 4 months ago

Python version

Python 3.11.6 (main, Jan 9 2024, 11:01:12) [GCC 11.4.0]

Operating system and processor architecture

Linux-5.15.0-100-generic-x86_64-with-glibc2.35

Installed packages

aiohttp==3.9.3
aiosignal==1.3.1
asn1crypto==1.5.1
attrs==23.2.0
certifi==2023.11.17
cffi==1.16.0
charset-normalizer==3.3.2
cryptography==42.0.5
filelock==3.13.1
frozenlist==1.4.1
greenlet==3.0.3
idna==3.6
influxdb-client==1.39.0
multidict==6.0.5
numpy==1.26.4
packaging==23.2
pandas==2.2.1
platformdirs==3.11.0
pycparser==2.21
PyJWT==2.8.0
pyOpenSSL==24.0.0
python-dateutil==2.8.2
pytz==2024.1
reactivex==4.0.4
requests==2.31.0
six==1.16.0
snowflake-connector-python==3.7.1
snowflake-sqlalchemy==1.5.1
sortedcontainers==2.4.0
SQLAlchemy==1.4.51
tomlkit==0.12.4
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.1.0
yarl==1.9.4

What did you do?

From our audits, it looked as if the Snowflake Connector starts with telemetry disabled and offers a way to toggle it programmatically, if one so desires.

To our great surprise, the Snowflake Connector exfiltrates data and sends telemetry no matter what.

This happens because the telemetry parameter is enabled remotely by the Snowflake server.

This is done early, during the authentication: the session_parameters are updated based on the server response https://github.com/snowflakedb/snowflake-connector-python/blob/main/src/snowflake/connector/auth/_auth.py#L470

Among other entries, data.parameters in the server response contains the telemetry keys:

{'name': 'CLIENT_TELEMETRY_ENABLED', 'value': True},
{'name': 'CLIENT_TELEMETRY_SESSIONLESS_ENABLED', 'value': True},

With the updated session_parameters, Auth calls self._rest._connection._update_parameters(session_parameters), which sets telemetry_enabled = True on the connection.

Furthermore, the connection calls _log_telemetry_imported_packages, which means at least one log event (with all the imported packages) is saved in the buffer even before the connection setup is done.

Interestingly, the list of imported packages is a rather intrusive log to send.

Finally, just closing the client flushes the telemetry buffer and sends the data externally.
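The sequence described above can be illustrated with a small toy model. This is only a sketch of the behavior as we observed it, not the connector's source: ToyConnection, ToyTelemetryClient and simulate_login are hypothetical names, and the real classes are more involved.

```python
# Toy model only: these classes are hypothetical and do NOT reproduce
# the connector's implementation. They mimic the observed sequence.

class ToyTelemetryClient:
    def __init__(self):
        self.buffer = []  # events waiting to be sent
        self.sent = []    # batches that "left the process"

    def add_event(self, event):
        self.buffer.append(event)

    def close(self):
        # Flush whatever is buffered, as the close() log lines suggest.
        if self.buffer:
            self.sent.append(list(self.buffer))
            self.buffer.clear()


class ToyConnection:
    def __init__(self):
        self.telemetry_enabled = False  # local default before login
        self.telemetry = ToyTelemetryClient()

    def update_parameters(self, parameters):
        # The server-provided value overrides the local setting.
        for param in parameters:
            if param["name"] == "CLIENT_TELEMETRY_ENABLED":
                self.telemetry_enabled = param["value"]

    def log_imported_packages(self):
        if self.telemetry_enabled:
            self.telemetry.add_event({"type": "client_imported_packages"})

    def close(self):
        self.telemetry.close()


def simulate_login():
    conn = ToyConnection()
    server_parameters = [
        {"name": "CLIENT_TELEMETRY_ENABLED", "value": True},
        {"name": "CLIENT_TELEMETRY_SESSIONLESS_ENABLED", "value": True},
    ]
    conn.update_parameters(server_parameters)  # happens during authentication
    conn.log_imported_packages()               # event is buffered right away
    conn.telemetry_enabled = False             # user opt-out arrives too late
    conn.close()                               # buffered event is still flushed
    return conn
```

In this model the opt-out cannot help, because the event is buffered before user code gets a chance to run.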

Example:

import snowflake.connector

with snowflake.connector.connect(
    user='x',
    password='y',
    account='z',
    warehouse='i',
    database='d',
    validate_default_parameters=True
    ) as c:
    c.telemetry_enabled = False

At the end of this short program we see telemetry got enabled (through the server reply) and that data was sent.

2024-03-15 22:59:19,823 - MainThread connection.py:734 - close() - INFO - closed
2024-03-15 22:59:19,823 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-03-15 22:59:19,823 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value':...
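For anyone who wants to reproduce this, DEBUG output of this shape can be collected with the standard logging module. This is a minimal sketch: the helper name enable_connector_debug_logging is ours, and the format string is an assumption chosen to resemble the lines above.

```python
import logging

# Assumed format, chosen to resemble the log lines quoted in this issue.
LOG_FORMAT = (
    "%(asctime)s - %(threadName)s %(filename)s:%(lineno)d"
    " - %(funcName)s() - %(levelname)s - %(message)s"
)


def enable_connector_debug_logging(stream=None):
    """Attach a DEBUG handler to the 'snowflake.connector' logger.

    The helper name is hypothetical; stream=None means sys.stderr.
    """
    logger = logging.getLogger("snowflake.connector")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```

Calling enable_connector_debug_logging() before snowflake.connector.connect(...) should surface the telemetry.py lines if the client logs them at DEBUG level, as it did for us.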

What did you expect to see?

We did not expect to see CLIENT_TELEMETRY_ENABLED being set based on a server reply.

If this is a user setting, we would like to know where to configure it for our account.

But, as a library, it makes little sense for the Snowflake connector to just take all session_parameters as-is from the server reply.

The telemetry parameters should be explicitly excluded.

Fundamentally, there should be a default or easy way to ensure that no telemetry is ever sent.
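To make the suggested exclusion concrete, here is a minimal sketch of filtering the telemetry keys out of the server-provided parameter list before it is applied. The function name drop_telemetry_parameters is hypothetical and this is not an existing connector API; the TIMEZONE entry in the usage below is only a placeholder for a non-telemetry parameter.

```python
# Sketch of the proposed exclusion; not part of the connector.
TELEMETRY_PARAMETER_NAMES = frozenset(
    {
        "CLIENT_TELEMETRY_ENABLED",
        "CLIENT_TELEMETRY_SESSIONLESS_ENABLED",
    }
)


def drop_telemetry_parameters(parameters):
    """Return a copy of the parameter list without the telemetry toggles.

    `parameters` is a list of {'name': ..., 'value': ...} dicts, the shape
    seen in data.parameters of the login response.
    """
    return [
        param
        for param in parameters
        if param.get("name") not in TELEMETRY_PARAMETER_NAMES
    ]


server_parameters = [
    {"name": "CLIENT_TELEMETRY_ENABLED", "value": True},
    {"name": "CLIENT_TELEMETRY_SESSIONLESS_ENABLED", "value": True},
    {"name": "TIMEZONE", "value": "UTC"},  # placeholder non-telemetry entry
]
filtered = drop_telemetry_parameters(server_parameters)
```

With a filter like this applied before the parameters are merged, a locally chosen telemetry setting would no longer be overridden by the login response.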

Can you set logging to DEBUG and collect the logs?

Relevant logs:

2024-03-15 22:59:19,823 - MainThread connection.py:734 - close() - INFO - closed
2024-03-15 22:59:19,823 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-03-15 22:59:19,823 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value

sfc-gh-yixie commented 3 months ago

@dbold could you try this?

import snowflake.connector
from snowflake.connector.telemetry_oob import TelemetryService

# CONNECTION_PARAMETERS is a dict holding your own connection settings
conn = snowflake.connector.connect(**CONNECTION_PARAMETERS)
# disable in-band telemetry
conn.telemetry_enabled = False
# disable out-of-band telemetry
TelemetryService.get_instance().disable()

dbold commented 3 months ago

This does not seem to work:

with snowflake.connector.connect(...) as c:
    c.telemetry_enabled = False

    from snowflake.connector.telemetry_oob import TelemetryService
    # disable out-of-band telemetry
    TelemetryService.get_instance().disable()

as the logs show telemetry is sent:

2024-04-03 08:05:51,308 - MainThread connection.py:734 - close() - INFO - closed
2024-04-03 08:05:51,308 - MainThread telemetry.py:211 - close() - DEBUG - Closing telemetry client.
2024-04-03 08:05:51,309 - MainThread telemetry.py:176 - send_batch() - DEBUG - Sending 1 logs to telemetry. Data is {'logs': [{'message': {'driver_type': 'PythonConnector', 'driver_version': '3.7.1', 'source': 'PythonConnector', 'type': 'client_imported_packages', 'value': "{'ntpath', 'random', 'ctypes', 'itertools', 'quopri', 'asn1crypto', 'opcode', 'builtins', 'hashlib', 'logging', 'certifi', 'platform', 'inspect', 'enum'....

One workaround we explored to disable telemetry is simply changing the telemetry URL:

# Try to break the telemetry client with a wrong URL. It will auto-disable itself after sending the first packet and failing.
snowflake.connector.telemetry.TelemetryClient.SF_PATH_TELEMETRY = "/please-stop/sending"

But this seems to cause some other problems and I'll probably open a separate issue.

sfc-gh-yixie commented 3 months ago

@dbold We're reviewing what's next for telemetry. Will update you later.