We can set the con parameter in Modin's read_sql to Modin's distributed SQL connection object, ModinDatabaseConnection. However, we can't do the same in pandas' read_sql, which requires the connection to be a "SQLAlchemy connectable, str, or sqlite3 connection".
read_sql in pandas has tested support for SQLAlchemy and sqlite3 connection objects. If read_sql gets a connection that is neither of those, it falls back to trying to use the object like it's a sqlite3 object, and warns
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:758: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
I considered making Modin's connection class implement sqlalchemy.engine.Connectable, but that interface is deprecated. I can't find its source code in the sqlalchemy source code on github, but I used inspect to examine it and found:
Show `Connectable` interface
```python
@util.deprecated_20_cls(
":class:`.Connectable`",
alternative=(
"The :class:`_engine.Engine` will be the only Core "
"object that features a .connect() method, and the "
":class:`_engine.Connection` will be the only object that features "
"an .execute() method."
),
constructor=None,
)
class Connectable(object):
"""Interface for an object which supports execution of SQL constructs.
The two implementations of :class:`.Connectable` are
:class:`_engine.Connection` and :class:`_engine.Engine`.
Connectable must also implement the 'dialect' member which references a
:class:`.Dialect` instance.
"""
def connect(self, **kwargs):
"""Return a :class:`_engine.Connection` object.
Depending on context, this may be ``self`` if this object
is already an instance of :class:`_engine.Connection`, or a newly
procured :class:`_engine.Connection` if this object is an instance
of :class:`_engine.Engine`.
"""
engine = None
"""The :class:`_engine.Engine` instance referred to by this
:class:`.Connectable`.
May be ``self`` if this is already an :class:`_engine.Engine`.
"""
def execute(self, object_, *multiparams, **params):
"""Executes the given construct and returns a
:class:`_engine.CursorResult`.
"""
raise NotImplementedError()
def scalar(self, object_, *multiparams, **params):
"""Executes and returns the first column of the first row.
The underlying cursor is closed after execution.
"""
raise NotImplementedError()
def _run_visitor(self, visitorcallable, element, **kwargs):
raise NotImplementedError()
def _execute_clauseelement(self, elem, multiparams=None, params=None):
raise NotImplementedError()
```
Instead, sqlalchemy _engine.Connection will be the only object that features an execute() method, and _engine.Engine will be the only object that features connect() method.
We probably should provide a create_engine factory method that forwards all args and kwargs to the appropriate constructor. The resulting engine object should have a connect method that can also take args and kwargs. Our class should probably implement Python Database API Specification v2.0, i.e. DB-API 2.0. According to this StackOverflow answer, we just need to implement conn.cursor(), cursor.execute(), and eithercursor.fetchmany() or cursor.fetchall().
We can set the
con
parameter in Modin'sread_sql
to Modin's distributed SQL connection object,ModinDatabaseConnection
. However, we can't do the same in pandas'read_sql
, which requires the connection to be a "SQLAlchemy connectable, str, or sqlite3 connection".read_sql
in pandas has tested support for SQLAlchemy and sqlite3 connection objects. Ifread_sql
gets a connection that is neither of those, it falls back to trying to use the object like it's a sqlite3 object, and warnsI considered making Modin's connection class implement
sqlalchemy.engine.Connectable
, but that interface is deprecated. I can't find its source code in the sqlalchemy source code on github, but I usedinspect
to examine it and found:Show `Connectable` interface
```python @util.deprecated_20_cls( ":class:`.Connectable`", alternative=( "The :class:`_engine.Engine` will be the only Core " "object that features a .connect() method, and the " ":class:`_engine.Connection` will be the only object that features " "an .execute() method." ), constructor=None, ) class Connectable(object): """Interface for an object which supports execution of SQL constructs. The two implementations of :class:`.Connectable` are :class:`_engine.Connection` and :class:`_engine.Engine`. Connectable must also implement the 'dialect' member which references a :class:`.Dialect` instance. """ def connect(self, **kwargs): """Return a :class:`_engine.Connection` object. Depending on context, this may be ``self`` if this object is already an instance of :class:`_engine.Connection`, or a newly procured :class:`_engine.Connection` if this object is an instance of :class:`_engine.Engine`. """ engine = None """The :class:`_engine.Engine` instance referred to by this :class:`.Connectable`. May be ``self`` if this is already an :class:`_engine.Engine`. """ def execute(self, object_, *multiparams, **params): """Executes the given construct and returns a :class:`_engine.CursorResult`. """ raise NotImplementedError() def scalar(self, object_, *multiparams, **params): """Executes and returns the first column of the first row. The underlying cursor is closed after execution. """ raise NotImplementedError() def _run_visitor(self, visitorcallable, element, **kwargs): raise NotImplementedError() def _execute_clauseelement(self, elem, multiparams=None, params=None): raise NotImplementedError() ```Instead, sqlalchemy
_engine.Connection
will be the only object that features anexecute()
method, and_engine.Engine
will be the only object that featuresconnect()
method.We probably should provide a
create_engine
factory method that forwards all args and kwargs to the appropriate constructor. The resulting engine object should have aconnect
method that can also take args and kwargs. Our class should probably implement Python Database API Specification v2.0, i.e. DB-API 2.0. According to this StackOverflow answer, we just need to implementconn.cursor()
,cursor.execute()
, and eithercursor.fetchmany()
orcursor.fetchall()
.