mymarilyn / clickhouse-driver

ClickHouse Python Driver with native interface support
https://clickhouse-driver.readthedocs.io
Other
1.21k stars 213 forks source link

DBAPI Cursor object seems to fetch all records while calling `.execute()` #457

Closed VitalyPetrov closed 1 month ago

VitalyPetrov commented 1 month ago

Describe the bug All dbapi-like drivers like psycopg2 or pyhive utilize .execute() method to submit the query and .fetchone() / .fetchall() / .fetchmany() methods to obtain the results. However, while using Connection (DBAPI-like obj) object from your connector the behaviour seems to be different. While calling .execute() method the connector will pull all the records which leads to the great RAM consumption

To Reproduce

import logging
from clickhouse_driver import connect

logger = logging.getLogger()
logger.setLevel(logging.INFO)

conn = connect(
    host='***', port=9000, user='***', password='***'
)

with conn.cursor() as cursor:
    logger.info('Query: start')
    cursor.execute('select * from <some-big-table>')
    logger.info('Query: end')

    columns = [c[0] for c in cursor.description]
    logger.info(f"Query columns: {columns}")

    while rows := cursor.fetchmany(10000):
        logger.info('Processing batch')
        pass

Expected behavior Much lesser RAM consumption and pulling each batch separately while calling cursor.fetchmany(batchsize)

Versions

- Version of package with the problem: 0.2.9 - ClickHouse server version: 24.2.2.71 - Python version: 3.12.4

VitalyPetrov commented 1 month ago

Here is a plot for RAM consumption increase over time

изображение
VitalyPetrov commented 1 month ago

Found the original explanation at one of related issues https://github.com/mymarilyn/clickhouse-driver/issues/413#issuecomment-1915449763

Closing this issue