pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.3k stars 17.8k forks source link

ENH: full SQL support for datetime64 values #7103

Closed jorisvandenbossche closed 10 years ago

jorisvandenbossche commented 10 years ago

See the tests I added for datetime, NaT and NaN values in PRs #7082 and #7100.

Postrgresql (psycopg2)

SQLite

MySQL (pymysql / MySQLdb / mysql.connector)

The data are feeded to the database using df.itertuples() and then np.asscalar(), and this gives data of the Timestamp class. And for some reason most drivers can handle this (because it is a subclass of datetime.datetime?), but pymysql and mysql.connector not.

For now, all the tests are skipped for MySQL / pymysql.

Oracle / SQL Server ?


There are also some issues with NaT handling (see the tests I added for datetime, NaT and NaN values in PRs #7082 and #7100), but to track this there is also an issue specific on NaN handling (#2754)

maxgrenderjones commented 10 years ago

Note that this is similar (the same?) as the NaN/NaT conversions (#2754, #4199, PR #4200) issues referred to in #4163

jorisvandenbossche commented 10 years ago

I suppose we can solve this by creating a custom Timestamp class, subclassing from the sqlalchemy.types.DateTime (also mentioned here https://github.com/pydata/pandas/issues/2754#issuecomment-42944154 by @jreback), with something like:

class Timestamp(sqlalchemy.types.TypeDecorator):
    """convert to/from pandas.tslib.Timestamp type """

    impl = sqlalchemy.types.DateTime

    def process_bind_param(self, value, dialect):
        f = getattr(value,'to_datetime', None)
        if f is not None:
            return f()
        return value

And the use this in _sqlalchemy_type instead of DateTime (https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L787). I tested it with pymysql, and it works. Question remain:

@jreback There are both Timestamp.to_datetime() and Timestamp.to_pydatetime() methods (I used to_datetime above). Any reason to not use to_datetime (which is slightly faster than to_pydatetime)?

jreback commented 10 years ago

@jorisvandenbossche hmm, they are virtually the same (one is on the shadow class _Timestamp) but it doesn't matter you can call them similary). Not really sure why they both exist.

I think Timestamp.to_datetime (and Timedelta.to_timedelta) are pretty clear ways of doing this.

jorisvandenbossche commented 10 years ago

however, if we would pick one, I think the name to_pydatetime is more clear, and also in line with the DatetimeIndex method

jreback commented 10 years ago

ok, sure (and maybe thats why their are 2)!