pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.97k stars 1.93k forks source link

`read_database` fails to parse the URI with a question mark in the password #12645

Closed danielgafni closed 9 months ago

danielgafni commented 11 months ago

Checks

Reproducible example

pl.read_database("SELECT * FROM table LIMIT 1", "postgres://user:password_with_?@host:port/database")

Log output

File /usr/local/lib/python3.9/site-packages/polars/io/database.py:737, in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
    734 except BaseException as err:
    735     # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
    736     errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
--> 737     raise type(err)(errmsg) from err
    739 return from_arrow(tbl, schema_overrides=schema_overrides)

RuntimeError: parse error: invalid port number

Issue description

pl.read_database started failing after the password inside the URI string got changed. It now contains a question mark symbol - ?. sqlalchemy is able to use this string correctly, but polars fails to parse it.

Expected behavior

Question marks should not break URI parsing

Installed versions

``` --------Version info--------- Polars: 0.19.15 Index type: UInt32 Platform: Linux-6.3.5-arch1-1-x86_64-with-glibc2.38 Python: 3.9.13 (main, Apr 13 2023, 18:49:10) [GCC 12.2.0] ----Optional dependencies---- adbc_driver_sqlite: cloudpickle: 2.2.1 connectorx: 0.3.1 deltalake: 0.13.0 fsspec: 2023.6.0 gevent: matplotlib: numpy: 1.24.4 openpyxl: pandas: 2.0.3 pyarrow: 12.0.1 pydantic: 1.10.12 pyiceberg: pyxlsb: sqlalchemy: 1.4.41 xlsx2csv: xlsxwriter: ```
deanm0000 commented 11 months ago

Try putting a \ in front of the ?

Julian-J-S commented 11 months ago

Yeah this can happen. Try this

https://docs.python.org/3/library/urllib.parse.html?highlight=quote#urllib.parse.quote_plus

danielgafni commented 11 months ago

Thanks for the workarounds! Seems like polars should handle this internally tho?

Julian-J-S commented 11 months ago

I think manually escaping the pw is no future proof solution but using my linked method should work fine. Both ways are possible. One can say it's your job to create a valid connection string ;) But I agree, if polars does NOT do this it should be documented that the user has to do this and maybe give an easy example :)

deanm0000 commented 11 months ago

Not all databases have the same connection string syntax and the focus is on making the queries work better rather than making connections string builders.

Once they make simple connection string builders then every option is fair game for someone to ask for.

Better to just leave it to the user or maybe a different library, maybe sqlalchemy can make a connection without doing anything else. I don't know.

deanm0000 commented 11 months ago

@stinodego should this be closed or just turned into a feature request?

stinodego commented 10 months ago

I'd like to phone a friend on this one. @alexander-beedie what's your take on this?

alexander-beedie commented 10 months ago

I'd like to phone a friend on this one. @alexander-beedie what's your take on this?

I think @deanm0000 is right; SQLAlchemy supports only itself - we support multiple engines/backends, and I agree that we don't want to get into the business of taking connection strings, parsing them, working out which specific rules apply for the given engine/backend combination, escaping accordingly, and then recombining them and handing off. The caller should know what is valid, and provide it in the correct form.

We should also probably assume that there are plenty of people already escaping their password correctly. If we now start applying quoting rules we will break them.

Clearly documenting that you are expected to supply a valid connection string "ready for use" when passed into the given engine is a good move, and I think that's sufficient 👍

stinodego commented 10 months ago

Thanks Alex!

Right, so we'll accept a PR that improves the documentation on this, but we will not support it. I'll mark this as accepted.

isaacnfairplay commented 6 months ago

This issue does not impact when using polars 0.18.11 with connectorx 0.3.2a7