sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License

RuntimeError: CodecError { IO error: `bytes remaining on stream' } #365

Open leowu4ever opened 2 years ago

leowu4ever commented 2 years ago

Hi community, I got this error when running a command that pulls an entire table of around 2m rows. ConnectorX used to handle it nicely, but it suddenly started giving me the error below. Can anyone take a look at this? Thanks.

File ~/opt/anaconda3/lib/python3.9/site-packages/connectorx/__init__.py:224, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    221 except ModuleNotFoundError:
    222     raise ValueError("You need to install pandas first")
--> 224 result = _read_sql(
    225     conn,
    226     "pandas",
    227     queries=queries,
    228     protocol=protocol,
    229     partition_query=partition_query,
    230 )
    231 df = reconstruct_pandas(result)
    233 if index_col is not None:

RuntimeError: CodecError { IO error: `bytes remaining on stream' }

wangxiaoying commented 2 years ago

Hi @leowu4ever, it would be more helpful if you could provide more information (e.g. which database you are using) so we can reproduce the error. You can check out our bug report template here for more information.

leowu4ever commented 2 years ago

What language are you using?

Python 3.9.12

What version are you using?

0.3

What database are you using?

MySQL (Singlestore)

What dataframe are you using?

pandas

Can you describe your bug?

Running a query which tries to pull an entire table which has around 2m rows. It runs for 10s, then terminates and gives the error. The query runs successfully when adding 'LIMIT X'.

What are the steps to reproduce the behavior?

If possible, please include a minimal simple example including:

Database setup if the error only happens on specific data or data type

Table schema and example data

Example query / code
cx.read_sql(url, query, partition_num=10)

What is the error?

File ~/opt/anaconda3/lib/python3.9/site-packages/connectorx/__init__.py:224, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    221 except ModuleNotFoundError:
    222     raise ValueError("You need to install pandas first")
--> 224 result = _read_sql(
    225     conn,
    226     "pandas",
    227     queries=queries,
    228     protocol=protocol,
    229     partition_query=partition_query,
    230 )
    231 df = reconstruct_pandas(result)
    233 if index_col is not None:

RuntimeError: CodecError { IO error: `bytes remaining on stream' }

leowu4ever commented 2 years ago

Hi @wangxiaoying, thank you for getting back to me on this issue. I have added the requested information. Thank you.

wangxiaoying commented 2 years ago

Hi @leowu4ever, thanks for the info. The error seems to be caused by the underlying tokio crate.

> Running a query which tries to pull an entire table which has around 2m rows. It runs for 10s, then terminates and gives the error. The query runs successfully when adding 'LIMIT X'.

That's weird, since we benchmarked MySQL with TPC-H data (SF=10, ~6M rows) and it worked fine, so I'm not quite sure what's going on here. Does the query run successfully even when X is very large, like 2M (but still less than or equal to the total number of rows in the query result)? If it does, is your data being updated while the query runs?
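One way to answer the LIMIT question systematically is to bisect on X. The following is only a sketch, not part of ConnectorX: `largest_working_limit` and `try_limit` are hypothetical names, and it assumes failures are monotonic (if LIMIT n fails, every larger LIMIT fails too), which may not hold if the error depends on specific rows or on timing.

```python
from typing import Callable


def largest_working_limit(try_limit: Callable[[int], bool], total_rows: int) -> int:
    """Binary-search the largest LIMIT for which try_limit(limit) returns True.

    Assumes monotonic failures: once a LIMIT fails, all larger ones fail.
    Returns 0 if even LIMIT 1 fails.
    """
    lo, hi = 0, total_rows  # the answer always lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if try_limit(mid):
            lo = mid       # mid works, so the answer is at least mid
        else:
            hi = mid - 1   # mid fails, so the answer is below mid
    return lo


# Hypothetical wiring against a real database (table name and url are placeholders):
#
#   import connectorx as cx
#
#   def try_limit(n: int) -> bool:
#       try:
#           cx.read_sql(url, f"SELECT * FROM my_table LIMIT {n}")
#           return True
#       except RuntimeError:
#           return False
#
#   print(largest_working_limit(try_limit, 2_000_000))
```

Each probe re-runs the query, so on a 2m-row table this takes roughly 21 runs. If the threshold moves between runs, that would suggest the failure depends on timing or on changing data rather than on row count.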

RmStorm commented 1 year ago

I get almost the same error with a slightly different setup. The error is:

RuntimeError: error communicating with the server: bytes remaining on stream

and the libraries are:

Postgres 15
Polars 0.19.1
Python 3.9.16
connectorx==0.3.2a7

For me, the error also only starts occurring when I pull in a large number of rows, about 16 million. With a couple of million it's fine.
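Since smaller pulls succeed in both reports, one thing to try is splitting the read into explicit range queries on an integer key. This is an untested sketch, not a confirmed fix: `range_queries` is a hypothetical helper, the table and column names are placeholders, and it assumes the table has an integer key with known minimum and maximum values.

```python
from typing import List


def range_queries(table: str, key: str, lo: int, hi: int, parts: int) -> List[str]:
    """Split the inclusive key range [lo, hi] into contiguous half-open ranges,
    returning one SELECT per range. Ranges differ in width by at most 1."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    total = hi - lo + 1
    parts = min(parts, total)          # never create empty ranges
    step, extra = divmod(total, parts)
    queries = []
    start = lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)  # exclusive upper bound
        queries.append(
            f"SELECT * FROM {table} WHERE {key} >= {start} AND {key} < {end}"
        )
        start = end
    return queries


# Hypothetical usage (placeholder names; read_sql accepts a list of queries):
#
#   import connectorx as cx
#   df = cx.read_sql(url, range_queries("my_table", "id", 1, 16_000_000, 16))
```

Whether this sidesteps the `bytes remaining on stream` error is an open question. It only reduces the number of rows each connection has to stream.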