Open leowu4ever opened 2 years ago
Hi @leowu4ever, it would be more helpful if you could provide more information (e.g. which database you are using) so we can reproduce the error. You can check out our bug report template here for more information.
Python 3.9.12
0.3
MySQL (Singlestore)
pandas
Running a query which tries to pull an entire table which has around 2m rows. It runs for 10s, then terminates and gives the error. The query runs successfully when adding 'LIMIT X'.
If possible, please include a minimal simple example including:
Table schema and example data
cx.read_sql(url, query, partition_num=10)
File ~/opt/anaconda3/lib/python3.9/site-packages/connectorx/__init__.py:224, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    221 except ModuleNotFoundError:
    222     raise ValueError("You need to install pandas first")
--> 224 result = _read_sql(
    225     conn,
    226     "pandas",
    227     queries=queries,
    228     protocol=protocol,
    229     partition_query=partition_query,
    230 )
    231 df = reconstruct_pandas(result)
    233 if index_col is not None:
RuntimeError: CodecError { IO error: `bytes remaining on stream' }
Hi @wangxiaoying, thank you for getting back to me on this issue. I have added the information requested. Thank you.
Hi @leowu4ever, thanks for the info. The error seems to be caused by the underlying tokio crate.
Running a query which tries to pull an entire table which has around 2m rows. It runs for 10s, then terminates and gives the error. The query runs successfully when adding 'LIMIT X'.
That's weird, since we benchmarked MySQL with TPC-H data (SF=10, ~6M rows) and it worked fine, so I'm not quite sure what's going on here. Does the query still run successfully when X is very large, like 2M (but still smaller than or equal to the total number of rows in the query result)? If it does, is your data being updated while the query runs?
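One way to answer the LIMIT question above systematically is to bisect on the row count. The following is only a sketch: `smallest_failing_limit` and `fetch` are hypothetical names introduced here, not part of connectorx, and the search assumes that once a given LIMIT fails, every larger LIMIT fails too, which may not hold if the failure is intermittent.

```python
from typing import Callable, Optional


def smallest_failing_limit(
    fetch: Callable[[int], object], total_rows: int
) -> Optional[int]:
    """Binary-search the smallest LIMIT in [1, total_rows] for which fetch raises.

    Assumes failures are monotonic in the limit. Returns None when even
    the full row count succeeds.
    """

    def fails(limit: int) -> bool:
        try:
            fetch(limit)
            return False
        except RuntimeError:
            # The error reported in this thread surfaces as a RuntimeError.
            return True

    if not fails(total_rows):
        return None

    lo, hi = 1, total_rows  # invariant: LIMIT hi is known to fail
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

In practice `fetch` could be something like `lambda n: cx.read_sql(url, f"SELECT * FROM my_table LIMIT {n}")`, where `my_table` is a placeholder for the real table. Each probe runs a full query, so the search costs roughly log2(total_rows) query executions.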
I get almost the same error with a slightly different setup: The error is:
RuntimeError: error communicating with the server: bytes remaining on stream
and the libraries are:
Postgres 15
Polars 0.19.1
Python 3.9.16
connectorx==0.3.2a7
For me, the error also only starts occurring when I pull in a large number of rows, about 16 million. With a couple of million it's fine.
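Since both reports only fail above some row count, one thing that might be worth trying is splitting the read into smaller range queries on an integer key. This is an untested sketch, not a confirmed fix: `range_queries` is a hypothetical helper, and `my_table` and `id` in the note below are placeholder names.

```python
from typing import List


def range_queries(table: str, key: str, lo: int, hi: int, chunks: int) -> List[str]:
    """Split the inclusive key range [lo, hi] into at most `chunks`
    contiguous, non-overlapping SELECT statements."""
    if chunks < 1 or hi < lo:
        raise ValueError("need chunks >= 1 and hi >= lo")

    span = hi - lo + 1
    step = -(-span // chunks)  # ceiling division: rows of key space per query

    queries = []
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        queries.append(
            f"SELECT * FROM {table} WHERE {key} BETWEEN {start} AND {end}"
        )
        start = end + 1
    return queries
```

The resulting list could then be passed as the query argument, e.g. `cx.read_sql(url, range_queries("my_table", "id", 1, 16_000_000, 16))`, as `read_sql` also accepts a list of queries. Whether smaller per-query results avoid the `bytes remaining on stream` error is an assumption that would need to be checked against the affected database.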
Hi community, I get this error when a command tries to pull an entire table of around 2m rows. ConnectorX used to handle it nicely, but it has suddenly started giving me the error below. Can anyone take a look at this? Thanks.
File ~/opt/anaconda3/lib/python3.9/site-packages/connectorx/__init__.py:224, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    221 except ModuleNotFoundError:
    222     raise ValueError("You need to install pandas first")
--> 224 result = _read_sql(
    225     conn,
    226     "pandas",
    227     queries=queries,
    228     protocol=protocol,
    229     partition_query=partition_query,
    230 )
    231 df = reconstruct_pandas(result)
    233 if index_col is not None:
RuntimeError: CodecError { IO error: `bytes remaining on stream' }