pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.76k stars 297 forks source link

The update_many method does not support MSSQL2008 #374

Closed Remalloc closed 3 years ago

Remalloc commented 3 years ago

Environment

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context self.dialect.do_executemany( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany cursor.executemany(statement, parameters) File "src\pymssql_pymssql.pyx", line 486, in pymssql._pymssql.Cursor.executemany File "src\pymssql_pymssql.pyx", line 478, in pymssql._pymssql.Cursor.execute pymssql._pymssql.OperationalError: (8102, b"Cannot update identity column 'sno'.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "C:/Users/vbover/PycharmProjects/cms_otc_stat/src/autofill/autofill.py", line 188, in main() File "C:/Users/vbover/PycharmProjects/cms_otc_stat/src/autofill/autofill.py", line 184, in main aw.write_all() File "C:/Users/vbover/PycharmProjects/cms_otc_stat/src/autofill/autofill.py", line 174, in write_all self.set_success() File "C:/Users/vbover/PycharmProjects/cms_otc_stat/src/autofill/autofill.py", line 96, in set_success DB.update_many( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\src\utils\database.py", line 372, in update_many return True if table.update_many(rows, keys) else False File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\dataset\table.py", line 246, in update_many self.db.executable.execute(stmt, chunk) File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1200, in execute return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS) File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\sql\elements.py", line 313, in _execute_on_connection return connection._execute_clauseelement( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1389, in _execute_clauseelement ret = self._execute_context( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1748, in _execute_context self._handle_dbapi_exception( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1929, in _handle_dbapiexception util.raise( File "C:\Users\vbover\PycharmProjects\cms_otcstat\venv\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise raise exception File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context self.dialect.do_executemany( File "C:\Users\vbover\PycharmProjects\cms_otc_stat\venv\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany cursor.executemany(statement, parameters) File "src\pymssql_pymssql.pyx", line 486, in pymssql._pymssql.Cursor.executemany File "src\pymssql_pymssql.pyx", line 478, in pymssql._pymssql.Cursor.execute sqlalchemy.exc.OperationalError: (pymssql._pymssql.OperationalError) (8102, b"Cannot update identity column 'sno'.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n") [SQL: UPDATE swap_certificate_record SET sno=%(sno)s, flow_status=%(flow_status)s WHERE swap_certificate_record.sno = %(_sno)s] [parameters: ({'sno': 2, 'flow_status': 'OA', '_sno': 2}, {'sno': 3, 'flow_status': 'OA', '_sno': 3})] (Background on this error at: http://sqlalche.me/e/14/e3q8)

### My Solution
I think this issue due to MSSQL does not support updating primary key when we use UPDATE statement. I have removed pk before updating, and it works now.
```diff
# table.py 215
def update_many(self, rows, keys, chunk_size=1000, ensure=None, types=None):
    """Update many rows in the table at a time.

    This is significantly faster than updating them one by one. Per default
    the rows are processed in chunks of 1000 per commit, unless you specify
    a different ``chunk_size``.

    See :py:meth:`update() <dataset.Table.update>` for details on
    the other parameters.
    """
    keys = ensure_list(keys)

    chunk = []
    columns = []
    for index, row in enumerate(rows):
-       chunk.append(row)
        for col in row.keys():
-           if col not in columns:
+           if (col not in columns) and (col not in keys):            
                columns.append(col)

        # bindparam requires names to not conflict (cannot be "id" for id)
        for key in keys:
            row["_%s" % key] = row[key]
+           row.pop(key)
+       chunk.append(row)
Remalloc commented 3 years ago

@pudo

pudo commented 3 years ago

I don't have access to MSSQL, so I can't be much help in verifying this change.

More generally, updating a primary column isn't always a forbidden scenario: for example if I have a table with file metadata and designate path to be its PK, changing it is totally legit. But we can definitely remove all the keys from the update values in that function. Want to make a PR?

Remalloc commented 3 years ago

Sure, I will do some unit tests and create a PR. You mentioned that situation probably truly existence, although updating PK is not recommended way. So, should I take some measures and make sure it is compatible with last version? Like, add a param for deciding whether to update PK. @pudo

pudo commented 3 years ago

I think there's two independent issues here: a) a user trying to update the column the database system thinks is the PK, and b) dataset currently including the update criteria in the SET clause of the update. The latter is just a dumb bit of redundancy, and I think your PR gets rid of it completely. The former, however, updating the PK, is totally legal in all database systems except perhaps MSSQL, from what I can tell. It's definitely legal in scenarios where the PR isn't auto-incremented. So I would want to keep that ability :)

Remalloc commented 3 years ago

I got it. Actually, there are many old systems still use MSSQL, I hope this method can be used in MSSQL. Therefore, I think we can create another method for adapting with MSSQL, at the same time, we keep this potential ability as well. Like:

def _update_many_mssql(self, rows, keys, chunk_size=1000, ensure=None, types=None):
    """For MSSQL"""

def update_many(self, rows, keys, chunk_size=1000, ensure=None, types=None):
    if self.db.is_mssql:
        return _update_many_mssql(self, rows, keys, chunk_size=1000, ensure=None, types=None)
    ...

How do you think it, @pudo ?