pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License
234 stars 124 forks source link

read_sql_query should support tuples as `params` dict values #996

Closed sh-at-cs closed 1 month ago

sh-at-cs commented 1 month ago

Describe the bug The type for pandas.read_sql_query is specified by pandas-stubs as a dictionary mapping from str to str | bytes | date | timedelta | datetime64 | timedelta64 | int | float | complex. But for some database drivers like psycopg2 or mysql-connector, it's also possible to pass in tuples e.g. as the right-hand side of IN conditions (WHERE ... IN (1, 2, 3)).

To Reproduce A minimal example using Postgres (it looks similar for MySQL except the parameter is written as :ids in the query):

  1. Launch a Postgres Docker container: docker run -p 9876:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres
  2. Install dependencies and run this script to see that tuple substitutions for IN conditions work fine (I included PEP 723 script metadata so you can just do uv run or pipx run on it to avoid needing to install dependencies manually):
# To make this executable directly w/ pipx run or uv run:
# /// script
# dependencies = [
#   "psycopg2==2.9.9",
#   "pandas==2.2.2"
# ]
# ///

import pandas as pd
import psycopg2

conn = psycopg2.connect(
  host="localhost", dbname="postgres", user="postgres", port=9876
)
cur = conn.cursor()
cur.execute("CREATE TABLE test (id serial PRIMARY KEY)")
cur.execute("INSERT INTO test(id) VALUES(1)")
cur.execute("INSERT INTO test(id) VALUES(2)")
cur.execute("INSERT INTO test(id) VALUES(3)")

df = pd.read_sql_query(
  "SELECT * FROM test WHERE id IN %(ids)s",
  conn,
  params={"ids": (1, 2)},
)
print(df)

Output is as expected:

   id
0   1
1   2
  1. Run Mypy on it and get this error message:
    bug.py:24: error: Dict entry 0 has incompatible type "str": "list[int]"; expected "str": "str | bytes | date | timedelta | datetime64 | timedelta64 | int | float | complex"  [dict-item]

Please complete the following information:

Additional context Something like this doesn't work for sqlite3 connections, because it really does only accept "primitive" values for substitutions.

But IMO it doesn't make sense to restrict the set of allowed types to the universally supported subset, because some other weird database could decide to not even support int substitutions or something like that and then even the current set wouldn't be valid.

Dr-Irv commented 1 month ago

Thanks for the report. A PR with a change to pandas-stubs/io/sql.pyi is welcome.

For tests, we use sqllite3. Not sure if your example will work with that. If so, then PR should include a test in test_io.py in function test_read_sql_query(). If not, then it's OK to do a PR without a test.