mymarilyn / clickhouse-driver

ClickHouse Python Driver with native interface support
https://clickhouse-driver.readthedocs.io
Other
1.2k stars 214 forks source link

Using DBUtils and clickhouse-driver can not create the connection pool correctly #241

Open bearAss13 opened 3 years ago

bearAss13 commented 3 years ago

Describe the bug I use the DBUtils package to create a connection pool for clickhouse, but by monitoring the TCP connection on the clickhouse server, I found that taking out a the connection from connection pool will create a new TCP connection.

To Reproduce I created the connection pool like this(The ip I use is 10.8.0.50,ClickHouse-server host is 192.168.1.193 and TCP port is 9002, )

from DBUtils.PooledDB import PooledDB
from clickhouse_driver import connect
pool=PooledDB(creator=connect,blocking=True,maxconnections=1,mincached=1,host='192.168.1.193',user='xxx',password='xxxx',database='default',port=9002)

the TCP connection status on ClickHouse server : tcp LISTEN 0 64 *:9002 *:* I create a cursor

conn=pool.connection()
cur=conn.cursor()

the TCP connection status on ClickHouse server : tcp LISTEN 0 64 *:9002 *:* When I execute a sql cur.execute('select 1') the TCP connection status on ClickHouse server :

tcp    LISTEN     0      64        *:9002                  *:*                  
tcp    ESTAB      0      0      192.168.1.193:9002               10.8.0.50:62960

Only when the sql statement is executed, the tcp connection is really created I close the cursor and connection

conn.close()
cur.close()

the TCP connection status on ClickHouse server : tcp LISTEN 0 64 *:9002 *:*
When I take out the connection from the connection pool again, create the cursor and execute the sql

conn=pool.connection()
cur=conn.cursor()
cur.execute('select 1')

the TCP connection status on ClickHouse server :

tcp    LISTEN     0      64        *:9002                  *:*                  
tcp    ESTAB      0      0      192.168.1.193:9002               10.8.0.50:63520 

I found that the TCP connection port has changed

Expected behavior When I try to use DBUtils and pymysql driver to create a mysql connection pool, this will not happen

Versions

xzkostyan commented 3 years ago

Well, I don't know how PooledDB works, but I guess when you call connection's .close() method the connection should be closed. Could you provide full working pymysql snippet and full not-working snippet clickhouse?

There are few limitations for this DB API. Here is a little story about it:

Currently there is no such abstraction as "cursor" in ClickHouse native protocol. There is "connection" between the client and the server and the sequence of queries than client (actually it's connection) sends to server. Each query must be finished (e. g. client must read whole socket data from ClickHouse server) before we can send another. This also means that one physical connection to ClickHouse server can't be shared between multiple threads.

We must take these facts into consideration in DB API design. That's why DB API cursor under the hood is pure clickhouse_driver.Client instance. And the connection is just a container for list of clients. And this is a reason why network connection is actually established on the first query execution.

These limitations make this DB API not so "classic" as pymysql or psycopg, but provides DB API compatible interface for various integrations.

May be this note answers why this package produces undesired behavior.

Feel free to discuss the subject.

bearAss13 commented 3 years ago

The close() method I understand in the connection pool is to put the created connections back into the pool instead of creating the TCP link again. Here is the process of creating a connection pool through pymysql (the ip I use is 10.8.0.82, mysql-server host is 192.168.1.45 and port is 3306):

from DBUtils.PooledDB import PooledDB
import pymysql

the TCP connection status on mysql server: tcp LISTEN 0 80 192.168.1.45:3306 0.0.0.0:* When initialize the connection pool (Parameter "mincached" in connection pool means "the initial number of idle connections in the pool") pool=PooledDB(creator=pymysql,blocking=True,maxconnections=1,mincached=1,host='192.168.1.45',user='xxx',password='xxx',port=3306) client and server will immediately establish TCP links:

tcp  LISTEN 0      80                                  192.168.1.45:3306                                          0.0.0.0:*                                                                               
tcp  ESTAB  0      0                                   192.168.1.45:3306                                        10.8.0.82:55729                           

Take out the connection from the connection pool, create a cursor and execute SQL

conn=pool.connection()
cur=connection()
cur.execute('select 1')

The connection will not changed

tcp  LISTEN 0      80                                  192.168.1.45:3306                                          0.0.0.0:*                                                                               
tcp  ESTAB  0      0                                   192.168.1.45:3306                                        10.8.0.82:55729                           

Call the close method to put the connection back into the connection pool.

cur.close()
conn.close()

The connection will not be closed

tcp  LISTEN 0      80                                  192.168.1.45:3306                                          0.0.0.0:*                                                                               
tcp  ESTAB  0      0                                   192.168.1.45:3306                                        10.8.0.82:55729                           

Take out the connection from the connection pool again and execute SQL.

conn=pool.connection()
cur=connection()
cur.execute('select 1')

The connection will not change

tcp  LISTEN 0      80                                  192.168.1.45:3306                                          0.0.0.0:*                                                                               
tcp  ESTAB  0      0                                   192.168.1.45:3306                                        10.8.0.82:55729                           

Only when the connection pool calls the close method

cur.close()
conn.close()
pool.close()

The TCP connection closed tcp LISTEN 0 80 192.168.1.45:3306 0.0.0.0:*