risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0
7.03k stars 577 forks source link

support connector-x read_sql #7793

Open tempbottle opened 1 year ago

tempbottle commented 1 year ago

Is your feature request related to a problem? Please describe.

I want to read MV from risingwave and join it with some data from pandas. Code like this:

import connectorx as cx
db1 = "postgresql://root:@127.0.0.1:4566/dev"
sql = """
SELECT * FROM mv_avg_speed;
"""
data = cx.read_sql(db1, sql)
# do data join pandas/polars dataframe which is read from other DBMS

this code got error, on ./risedev playground console, error is

ERROR risingwave_frontend::session: failed to parse sql:
COPY (
SELECT * FROM mv_avg_speed;
) TO STDOUT WITH BINARY:
sql parser error: Expected identifier, found: (

the COPY stmt may not support in risingwave.

Describe the solution you'd like

support COPY stmt and return data as postgresql would be great!

Describe alternatives you've considered

psycopg2 works fine, but I want to use connector-x to make things faster.

Additional context

No response

xxchan commented 1 year ago

I found connector-x support specifying protocols:

cursor: Conventional wire protocol (slowest one), recommend to use only when binary and csv is not supported by the source (e.g. Redshift).

Could you try if that works? We may also investigate whether we can support COPY TO later :)

tempbottle commented 1 year ago

this code can not work:

sql = """
SELECT * FROM mv_avg_speed;
"""
db1 = "postgresql://root:@127.0.0.1:4566/dev"
data = cx.read_sql(db1, query=sql, protocol="cursor", return_type="polars")
print(data)

as connectorx emit error:

python3.11/site-packages/connectorx/__init__.py", line 330, in reconstruct_arrow 
   raise ValueError("Empty result")

maybe the response of risingwave miss something here, or connectorx can not parse the response.

neverchanje commented 1 year ago

Hi tempbottle, Thanks for raising this issue. This requirement looks very interesting!

From our end, supporting COPY TO STDOUT is not difficult. What could bring a little complexity is supporting Postgres's COPY protocol: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY I've created an issue for this task: https://github.com/risingwavelabs/risingwave/issues/7983. We can prioritize development if it really blocks your adoption of RisingWave.

tempbottle commented 1 year ago

That will be great! work with connector-x is rusty and fast.

jon-chuang commented 1 year ago

maybe the response of risingwave miss something here, or connectorx can not parse the respons

I wonder if this is due to some missing features in our binary PSQL response. https://github.com/risingwavelabs/risingwave/issues/8213