trinodb / trino-python-client

Python client for Trino
Apache License 2.0
327 stars 163 forks

Benchmark of the Trino client's data retrieval speed: it seems to be the bottleneck of the data pipeline #404

Open zeddit opened 1 year ago

zeddit commented 1 year ago

Expected behavior

With a simple select * from db, reading through Trino should not be slower than reading from the original database directly; otherwise Trino itself slows down the whole system.

For example, reading directly from the database with sqlalchemy reaches 100MB/s, while with Trino in between the overall speed drops to only 10MB/s.

Actual behavior

The speed through Trino should be no less than that of the database.

Steps To Reproduce

I have tested the Python client to locate the bottleneck.

I used the memory connector, which means the data resides in Trino itself, so the timing covers only the data leaving Trino and arriving at the client.

However, this path reaches only about 10-20MB/s, while my backend database can deliver about 100MB/s over a single connection.
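
A minimal sketch of how such a measurement could be taken, so the numbers can be reproduced. It is not the script used for the figures above. It assumes any DB-API cursor (the Trino client's trino.dbapi cursor or a direct database cursor), estimates size from the string length of each value, and times only the execute and fetchmany calls. The function name measure_throughput, the batch size, and the connection details in the comments are illustrative placeholders.

```python
import time


def measure_throughput(cursor, query, batch_size=10_000):
    """Run `query` on a DB-API cursor and return (rows, approx_bytes, seconds).

    Only execute() and fetchmany() are timed; the size estimate is computed
    outside the timed region so it does not distort the result.
    """
    elapsed = 0.0

    t0 = time.perf_counter()
    cursor.execute(query)
    elapsed += time.perf_counter() - t0

    rows = 0
    approx_bytes = 0
    while True:
        t0 = time.perf_counter()
        batch = cursor.fetchmany(batch_size)
        elapsed += time.perf_counter() - t0
        if not batch:
            break
        rows += len(batch)
        # Rough size estimate (an assumption): length of each value's string form.
        approx_bytes += sum(len(str(value)) for row in batch for value in row)

    return rows, approx_bytes, elapsed


# Example against Trino (assumes `pip install trino` and a reachable server;
# host, user, catalog, schema and table name are placeholders):
#
#   import trino
#   conn = trino.dbapi.connect(host="localhost", port=8080, user="bench",
#                              catalog="memory", schema="default")
#   rows, nbytes, secs = measure_throughput(conn.cursor(), "SELECT * FROM some_table")
#   print(f"{rows} rows, {nbytes / secs / 1e6:.1f} MB/s")
```

Running the same function against a cursor from the backend database's own driver gives a like-for-like comparison, since both paths pay the same Python-side fetch loop.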

Log output

Screenshot 2023-08-24 13 42 05

Operating System

ubuntu 20.04

Trino Python client version

latest

Trino Server version

latest

Python version

3.10

Are you willing to submit PR?