mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.89k stars 562 forks source link

fractional seconds dropped when loading data to MySQL (Aurora) #1039

Closed ghost closed 2 years ago

ghost commented 2 years ago

Environment

Issue

The issues is similar to https://github.com/mkleehammer/pyodbc/issues/1023 , with exception that fast_executemany = False (not used at all).

Reproduce code:

mySQL table:

create table timestamp_table ( tm timestamp(6) null );

python.py

import datetime
import pyodbc

print('pyodbc version: %s' % pyodbc.version)

destination = "DRIVER={MySQL ODBC 8.0 Unicode Driver};SERVER=bi-poc.cluster-cnm7awpcawfe.eu-west-1.rds.amazonaws.com;PORT=3306;DATABASE=xbo;UID=dadmin;PWD=GYau7458"
conn = pyodbc.connect(destination, autocommit=True)
cur = conn.cursor()

single = (datetime.datetime(2021, 1, 1, 11, 21, 11, 311114),)

many = [(datetime.datetime(2022, 2, 3, 17, 28, 15, 344014),),
        (datetime.datetime(2022, 2, 4, 12, 48, 47, 309952),)
        ]

print(f"Single = {single}")
print(f"Many = {many}")

cur.execute("INSERT INTO timestamp_table VALUES (?)", single)
cur.executemany("INSERT INTO timestamp_table VALUES (?)", many)

cur.execute("SELECT tm FROM timestamp_table")

inserted = cur.fetchmany(10)
print(f"Inserted = {inserted}")
# pg_cur.execute("truncate table public.data_table")

conn.close()

Expected values:

Single = (datetime.datetime(2021, 1, 1, 11, 21, 11, 311114),) Many = [(datetime.datetime(2022, 2, 3, 17, 28, 15, 344014),), (datetime.datetime(2022, 2, 4, 12, 48, 47, 309952),)]

Actual values:

Inserted = [(datetime.datetime(2021, 1, 1, 11, 21, 11, 300000), ), (datetime.datetime(2022, 2, 3, 17, 28, 15, 300000), ), (datetime.datetime(2022, 2, 4, 12, 48, 47, 300000), )]

image

v-chojas commented 2 years ago

Could you post an ODBC trace? It could be driver bug. https://github.com/mkleehammer/pyodbc/issues/1023

gordthompson commented 2 years ago

Repro code:

import datetime

import pyodbc

cnxn = pyodbc.connect(
    "Driver=MySQL ODBC 8.0 Unicode Driver;"
    "Server=192.168.0.199;"
    "UID=scott;PWD=tiger;"
    "Database=test;"
)
print(cnxn.getinfo(pyodbc.SQL_DRIVER_VER))
# 08.00.0028

crsr = cnxn.cursor()
crsr.execute("drop table if exists timestamp_table")
crsr.execute("create table timestamp_table ( tm timestamp(6) null )")

single = (datetime.datetime(2021, 1, 1, 11, 21, 11, 311114),)
crsr.execute("INSERT INTO timestamp_table VALUES (?)", single)

print(crsr.execute("SELECT * FROM timestamp_table").fetchall())
# [(datetime.datetime(2021, 1, 1, 11, 21, 11, 300000), )]

ODBC trace:

odbctrace.log

v-chojas commented 2 years ago

This doesn't look right:

[SQLDescribeCol.c][504]
        Exit:[SQL_SUCCESS]                
            Column Name = [tm]                
            Data Type = 0x7ffc9006380a -> 93                
            Column Size = 0x7ffc90063810 -> 19                
            Decimal Digits = 0x7ffc9006380c -> 0     <-------****
            Nullable = 0x7ffc9006380e -> 1

For comparison here is what Microsoft SQL Server's driver returns for the fractional seconds precision: https://docs.microsoft.com/en-us/sql/relational-databases/native-client-odbc-api/sqldescribecol

I suspect a driver bug.

gordthompson commented 2 years ago

Hi @v-chojas . The calls to SQLDescribeCol and SQLDescribeColW are for the results of the final SELECT. The value is definitely being saved in the table as 2021-01-01 11:21:11.300000. Could this be the issue:

[ODBC][7998][1646702481.783069][SQLBindParameter.c][217]
        Entry:
            Statement = 0xd27cb0
            Param Number = 1
            Param Type = 1
            C Type = 11 SQL_C_TIMESTAMP
            SQL Type = 11 SQL_TIMESTAMP
            Col Def = 21
            Scale = 1    <--------- ?
            Rgb Value = 0xd28918
            Value Max = 0
            StrLen Or Ind = 0xd288e8
v-chojas commented 2 years ago

That comes from what SQLGetTypeInfo reports for the column size, less 20:

[ODBC][7998][1646702481.515269][SQLGetTypeInfo.c][168]
        Entry:
            Statement = 0xd27cb0
            Data Type = SQL_TYPE_TIMESTAMP
[ODBC][7998][1646702481.515319][SQLGetTypeInfo.c][321]
        Exit:[SQL_SUCCESS]
[ODBC][7998][1646702481.515344][SQLFetch.c][162]
        Entry:
            Statement = 0xd27cb0
[ODBC][7998][1646702481.515367][SQLFetch.c][352]
        Exit:[SQL_SUCCESS]
[ODBC][7998][1646702481.515392][SQLGetData.c][237]
        Entry:
            Statement = 0xd27cb0
            Column Number = 3
            Target Type = 4 SQL_INTEGER
            Buffer Length = 4
            Target Value = 0x7ffc900637ac
            StrLen Or Ind = (nil)
[ODBC][7998][1646702481.515417][SQLGetData.c][545]
        Exit:[SQL_SUCCESS]                
            Buffer = [21]                <---------- SQL_TYPE_TIMESTAMP only has column size of 21
            Strlen Or Ind = NULLPTR

The column size is defined in the documentation as https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgettypeinfo-function

"The maximum column size that the server supports for this data type. [...] For datetime data types, this is the length in characters of the string representation (assuming the maximum allowed precision of the fractional seconds component)."

The conclusion is still the same: driver bug.

gordthompson commented 2 years ago

Ah, so the driver reports that date/time values only have room for 21 characters and

>>> "2022-03-01 00:11:22.345678"[:21]
'2022-03-01 00:11:22.3'

so the value in the database becomes 2022-03-01 00:11:22.300000. Makes sense, thanks!

gordthompson commented 2 years ago

Workaround: Pass the str() of the datetime value. That is, instead of

dt = datetime.datetime(2021, 1, 1, 11, 21, 11, 311114)
crsr.execute("INSERT INTO timestamp_table VALUES (?)", dt)

print(crsr.execute("SELECT * FROM timestamp_table").fetchall())
# [(datetime.datetime(2021, 1, 1, 11, 21, 11, 300000), )]

use this

dt = datetime.datetime(2021, 1, 1, 11, 21, 11, 311114)
crsr.execute("INSERT INTO timestamp_table VALUES (?)", str(dt))

print(crsr.execute("SELECT * FROM timestamp_table").fetchall())
# [(datetime.datetime(2021, 1, 1, 11, 21, 11, 311114), )]
ghost commented 2 years ago

Just historical purposes. As mentioned by @gordthompson workaround looks ok, but in case you are writing custom ETL you might wanna use mysql.connector (or native posgreSQL drive) as write cursor.

Here are some ideas how to read from pyodbc and write to mysql.connector:

class DriveManager:
    def __init__(souce_driver, destination_driver, source, destination, query, table, chunksize):
        pass
        # init all objects and variables

    def transfer(self):
        for rows, cursor_description in self.souce_driver_object.read(...):
            self.destination_driver_object.write(rows, cursor_description, ...)

class Driver:
    @abstractmethod
    def read(self, query, chunksize):
        raise NotImplementedError

    @abstractmethod
    def write(self, rows, cursor_description, table):
        raise NotImplementedError

class PyODBC(Driver):
    def __init__(self, conn):
        pass
        # init conn and cursor

    def read(self, query, chunksize):
        self.cursor.execute(query)

        cursor_description = self.cursor.description

        while True:
            rows_tuple = self.cursor.fetchmany(chunksize)

            if not rows_tuple:
                break

            # mySQL can't work with pyodbc.Row, thus you have to list or dict it
            rows_list = [list(rows) for rows in rows_tuple]

            yield rows_list, cursor_description

    def write(self):
        pass

class MySQLConnector(Driver):
    def __init__(self, conn):
        pass

    def write(self, rows, cursor_description, table):
        pass
        # write data as per mySQL Python wiki

    def read(self):
        pass
        # yield rows

if __name__ == '__main__':
    dv = DriveManager(...)
    dv.transfer()