lmmentel / mendeleev

A python package for accessing various properties of elements, ions and isotopes in the periodic table of elements.
https://mendeleev.readthedocs.io
MIT License
208 stars 38 forks source link

Fix fetch_table to be compatible across major pandas and sqlalchemy versions #159

Closed lmmentel closed 1 month ago

lmmentel commented 1 month ago

This fixes fetch_table to be compatible across combinations of major versions of pandas (1.x, 2.x) and sqlalchemy (1.5.x and 2.x).

tested combinations:

fixes #158

lmmentel commented 1 month ago

The case that still throws an error: pandas 2.2.2 and sqlalchemy 1.4.52

>>> from mendeleev.fetch import fetch_table, get_engine
>>> df = fetch_table("elements")
/home/lukasz/projects/mendeleev/mendeleev/fetch.py:71: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  return pd.read_sql_query(sql=text(query), con=conn, **kwargs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lukasz/projects/mendeleev/mendeleev/fetch.py", line 71, in fetch_table
    return pd.read_sql_query(sql=text(query), con=conn, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.cache/pypoetry/virtualenvs/mendeleev-ZDKABmhr-py3.12/lib/python3.12/site-packages/pandas/io/sql.py", line 526, in read_sql_query
    return pandas_sql.read_query(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.cache/pypoetry/virtualenvs/mendeleev-ZDKABmhr-py3.12/lib/python3.12/site-packages/pandas/io/sql.py", line 2738, in read_query
    cursor = self.execute(sql, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lukasz/.cache/pypoetry/virtualenvs/mendeleev-ZDKABmhr-py3.12/lib/python3.12/site-packages/pandas/io/sql.py", line 2670, in execute
    raise TypeError("Query must be a string unless using sqlalchemy.")
TypeError: Query must be a string unless using sqlalchemy.
lmmentel commented 1 month ago

There's a known issue https://github.com/pandas-dev/pandas/issues/57049 that latest pandas breaks compatibility with sqlalchemy 1.4.x which is still officially supported.

kalvdans commented 1 month ago

I think mendeleev should depend on pandas[sql-other] to prevent installation of incomatible pandas and sqlalchemy in the same environment.

lmmentel commented 1 month ago

I think mendeleev should depend on pandas[sql-other] to prevent installation of incomatible pandas and sqlalchemy in the same environment.

Could you elaborate on what you mean here?

Current dependencies from pyproject.toml are:

pandas = [
    { version = "^2.1", python = ">=3.12,<3.13" },
    { version = ">=1.0.0", python = "<3.12" },
]
SQLAlchemy = ">=1.4.0"

so both pandas and sqlalchemy are core dependencies for mendeleev.

kalvdans commented 1 month ago

I think mendeleev should depend on pandas[sql-other] to prevent installation of incomatible pandas and sqlalchemy in the same environment.

Could you elaborate on what you mean here?

Current dependencies from pyproject.toml are:

pandas = [
    { version = "^2.1", python = ">=3.12,<3.13" },
    { version = ">=1.0.0", python = "<3.12" },
]
SQLAlchemy = ">=1.4.0"

so both pandas and sqlalchemy are core dependencies for mendeleev.

What pip denote within square brackets are called extras and the syntax for poetry seems to be

pandas = [
    { version = "^2.1", python = ">=3.12,<3.13", extras = ["sql-other"] },
    { version = ">=1.0.0", python = "<3.12", extras = ["sql-other"] },
]
lmmentel commented 1 month ago

Didn't know about pandas extras, thanks for sharing. I'm not convinced about adding this since it will bump sqlalchemy to min 2.0.0 ref. Since sqlachemy is at the core of mendeleev and 1.4.x is still officially supported I wouldn't want to drop the support for these versions prematurely.

When it's dropped I think the same effect can be achieved explicitly by bumping the sqlalchemy version directly in pyproject.toml.

This is not really a mendeleev problem but rather pandas/sqlalchemy compatibility issue. I'm willing to go ahead since current fix solves compatibility as broadly as is possible.