pudo / dataset

Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
https://dataset.readthedocs.org/
MIT License
4.78k stars 298 forks source link

Fix bug when UPSERTing a column named 'id' #284

Closed al42and closed 5 years ago

al42and commented 5 years ago

Hi!

I noticed an interesting bug with Table.upsert: the example from the documentation does not work.

The checks that should prevent double creation of index column are not quite working. If self._primary_id is set to None, then the index column with name id is created, yet later we try to create it again: https://github.com/pudo/dataset/blob/master/dataset/table.py#L246. This causes following exception (Python 3.6.7, SQLAlchemy 1.3.0b3):

Traceback (most recent call last):
  File "fail.py", line 7, in <module>
    table.upsert(data, ['id'])
  File "/.../dataset/dataset/table.py", line 176, in upsert
    row = self._sync_columns(row, ensure, types=types)
  File "/.../dataset/dataset/table.py", line 281, in _sync_columns
    self._sync_table(sync_columns)
  File "/.../dataset/dataset/table.py", line 247, in _sync_table
    self._table.append_column(column)
  File "/.../dataset/venv/lib/python3.6/site-packages/SQLAlchemy-1.3.0b3-py3.6-linux-x86_64.egg/sqlalchemy/sql/schema.py", line 752, in append_column
    column._set_parent_with_dispatch(self)
  File "/.../dataset/venv/lib/python3.6/site-packages/SQLAlchemy-1.3.0b3-py3.6-linux-x86_64.egg/sqlalchemy/sql/base.py", line 456, in _set_parent_with_dispatch
    self._set_parent(parent)
  File "/.../dataset/venv/lib/python3.6/site-packages/SQLAlchemy-1.3.0b3-py3.6-linux-x86_64.egg/sqlalchemy/sql/schema.py", line 1452, in _set_parent
    % (self.key, table.fullname)
sqlalchemy.exc.ArgumentError: Trying to redefine primary-key column 'id' as a non-primary-key column on table 'table'

I added the corresponding testcase.

I see several possible solutions. In this PR I set self._primary_id and self._primary_type to their default value in Table.__init__. This way, they will have valid and meaningful values from the very beginning and throughout the whole program.

Alternative approach is to change them when creating the table.

Or just make the check more robust:

if not column.name == self._primary_id and not (self._primary_id is None and column.name == self.PRIMARY_DEFAULT):
pudo commented 5 years ago

Thanks, makes sense!