zzzeek / test_sqlalchemy

0 stars 0 forks source link

implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.) #960

Open sqlalchemy-bot opened 16 years ago

sqlalchemy-bot commented 16 years ago

Issue created by jek (jek)


Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (values) 

Dialect support differs pretty widely. A quick & likely inaccurate poll:

sqlalchemy-bot commented 11 years ago

Michael Bayer (zzzeek) wrote:


sqlalchemy-bot commented 11 years ago

Michael Bayer (zzzeek) wrote:


session.merge(obj, hard=True)

?

sqlalchemy-bot commented 11 years ago

Michael Bayer (zzzeek) wrote:


possible ORM flow:

obj = MyObject(id=1, data='data')
session.replace(obj)   # session.server_merge()?  session.upsert() ?  session.add(obj, merge=True) ? session.add(obj, upsert=True)?

# 1. object must contain a full primary key.   error if not.
# 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
sqlalchemy-bot commented 14 years ago

Michael Bayer (zzzeek) wrote:


Here's a recipe I made for the SQLA tutorial:

#!python
from sqlalchemy.sql.expression import Insert

class replace(Insert):
    pass

# REPLACE is just like INSERT.   Below is the easy road out:

import re
@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):

    stmt = compiler.sql_compiler.visit_insert(replace)
    return re.sub(r'^INSERT', 'REPLACE', stmt)

# Or, take the hard road:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy import bindparam

@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):
    colspecs = {}

    for col in replace.table.c:  # table defaults
        if col.default is not None:
            colspecs[col](col) = col.default.arg

    if replace.parameters:   # statement parameters
        for key, value in replace.parameters.iteritems():
            col = replace.table.c[key](key)
            colspecs[col](col) = bindparam(key, value)

    for k in compiler.column_keys:  # names sent to execute()
        col = replace.table.c[k](k)
        colspecs.setdefault(col, bindparam(col.key))

    return "REPLACE INTO %s (%s) VALUES (%s)" % (
        replace.table.name,
        ",".join(c.name for c in colspecs),
        ",".join(compiler.process(v) for v in colspecs.values())
    )
sqlalchemy-bot commented 15 years ago

Michael Bayer (zzzeek) wrote:


sqlalchemy-bot commented 9 years ago

Pantelis Theodosiou (ypercube) wrote:


MySQL's REPLACE is not an appropriate equivalent for MERGE.

REPLACE does a DELETE+INSERT which has different semantics than MERGE.

It also breaks when foreign keys exist (if ON DELETE CASCADE is on it will have unwanted cascaded deletes in other tables and if ON DELETE RESTRICT is on, the REPLACE will fail.)

sqlalchemy-bot commented 9 years ago

Michael Bayer (zzzeek) wrote:


the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.

sqlalchemy-bot commented 9 years ago

Pantelis Theodosiou (ypercube) wrote:


I don't disagree with the approach, on the contrary.

Just mentioning that for MySQL there is no easy road - and REPLACE should probably not be used.

sqlalchemy-bot commented 9 years ago

Michael Bayer (zzzeek) wrote:


That's why this is probably the oldest issue still open :). I want nothing to do with it, really.

sqlalchemy-bot commented 9 years ago

Michael Bayer (zzzeek) wrote:


PG has added something for this, so we should attempt to support a limited UPSERT system.

PG has implemented upsert as an INSERT or UPDATE - e.g. for MySQL this is ON DUPLICATE KEY UPDATE. Unfortunately this leaves out SQLite has the only outlier that only has REPLACE.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.

We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.

I'm a little puzzled though as far as "upsert" vs. "merge". The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand new data in the VALUES clause. But everything I can find with MERGE does not use a VALUES clause, it can only MERGE from another select statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering "upsert" supports ad-hoc values.

sqlalchemy-bot commented 8 years ago

Michael Bayer (zzzeek) wrote:


Issue #3529 was marked as a duplicate of this issue.

sqlalchemy-bot commented 8 years ago

dunk010 () wrote:


Postgres 9.5 is now out, with ON CONFLICT support. It would be good to be able to easily use this via SqlAlchemy.

sqlalchemy-bot commented 8 years ago

Michael Bayer (zzzeek) wrote:


sure, have an API? none of the various merge/upsert options act the same across databases. use cases are unknown. I personally never use upsert so I don't have good judgment on it.

sqlalchemy-bot commented 8 years ago

Patrick (prdonahue) wrote:


This would be great. Working on a script now that could make use of this functionality.

sqlalchemy-bot commented 8 years ago

Robin T (robin900) wrote:


I'm wondering what the best approach to implementing Postgres 9.5 ON CONFLICT would be for the expression layer (not the ORM). The specifics of ON CONFLICT make it seem to me that this is best kept specific to the postgresql dialect module, rather than attempting a dialect-independent addition to Insert. What would be the best way to attach an postgresql dialect ON CONFLICT clause to an dialect-independent Insert?

Quick idea 1:

#!python
from sqlalchemy.dialects.postgresql import on_conflict_do_update, on_conflict_do_nothing

myvals = {'col1': 1, 'col2': 'foo'}
myinsert = mytable.insert().values(**myvals)
myupsert = on_conflict_do_update(myinsert, target=mytable.c.primary_key_col).values(**myvals)

Quick idea 2:

#!python
from sqlalchemy.dialects.postgresql import OnConflictDoUpdate, OnConflictDoNothing

myvals = {'col1': 1, 'col2': 'foo'}
on_conflict_clause = OnConflictDoUpdate(target=mytable.c.primary_key_col).values(**myvals)
myupsert = mytable.insert(postgresql_on_conflict=on_conflict_clause).values(**myvals)

http://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

Some notes:

sqlalchemy-bot commented 8 years ago

Robin T (robin900) wrote:


I created a working version of PG 9.5's INSERT ... ON CONFLICT on my fork:

https://bitbucket.org/robin900/sqlalchemy/branch/ticket_3529#diff

Since it's dialect-specific, I think maybe ticket #3529 should no longer be a duplicate of this ticket; let #3529 be for the PG-dialect-specific addon for ON CONFLICT, and leave this ticket #960 for any dialect-agnostic MERGE/upsert feature.

sqlalchemy-bot commented 8 years ago

Michael Bayer (zzzeek) wrote:


the theme of this issue is to implement the backend equivalents of MERGE including MySQL's "REPLACE" and PG's new feature. It is a completely safe assumption that the moment an API is introduced to support that of one backend, the entire planet will be breaking down the door to write code that works identically on the other. This is a very hard problem to solve and should be solved at least for MySQL / Postgresql before we go too far down any one road. The SQLAlchemy SQL API is extensible so recipes / examples / 3rd party packages on pypi that provide PG or MySQL's feature are all fine, but for inclusion in SQLA we need to at least make an attempt to address both backends. Adding a new issue just for PG's specific syntax IMO doesn't really help towards the goal of getting a good feature implemented in SQLA core (and even ORM).

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


Issue #3985 was marked as a duplicate of this issue.

sqlalchemy-bot commented 7 years ago

Михаил Доронин (warrior2031) wrote:


@zzzeek I've implemented ON DUPLICATE KEY UPDATE for mysql and tested it manually. I'm reading tests for on_conflict_do_update it seems to use some internal things like inheriting from fixture, @classmethod that defines tables etc. Do I have to use (and understand subsequently) all that or can I use my pytest to write tests? By the way - you can track progress and review the code in fork https://github.com/purpleP/sqlalchemy/tree/on_duplicate_key.

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


@warrior2031 it would be best if you could try emulating some of the existing tests. For help on running them see the README.unittests.rst file.

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


we support PG's ON CONFLICT REPLACE now in #3529, that's the one that had the most requests. I've never seen anyone request Oracle's MERGE ever. MySQL, the request here would be for "ON DUPLICATE KEY UPDATE", not "REPLACE" as mentioned previously. However, this would also be implemented as a MySQL-specific construct the way we did for Postgresql, that is, trying to make a generic MERGE out of this is not something I think we are doing. if folks still want ON DUPLICATE KEY UPDATE then please open another bug for that.

sqlalchemy-bot commented 7 years ago

Михаил Доронин (warrior2031) wrote:


@zzzeek If I inherit from UpdateBase I get the following error AttributeError: 'Merge' object has no attribute '_returning'. I've tried to inherit from Insert instead and it works, but feels like a hack.

#!python
class Merge(Insert):
    def __init__(self, table, values, keys=None):
        super(Merge, self).__init__(table, values)
        self.keys = keys
        self.table = table
        self.values = values

@compiles(Merge, 'postgresql')
def postgres_merge(merge_stmt, compiler, **kwargs):
    stmt = postgresql.insert(merge_stmt.table, merge_stmt.values)
    column_names = next(iter(stmt.parameters)).keys()
    stmt = stmt.on_conflict_do_update(
        index_elements=merge_stmt.keys or stmt.table.primary_key,
        set_={name: getattr(stmt.excluded, name) for name in column_names},
    )
    return compiler.visit_insert(stmt)

Can you explain how to do that properly? In the docs there is example with inheriting from UpdateBase, but there's nothing about _returning there.

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


@warrior2031 I suggest trying to emulate the Postgresql ON CONFLICT patch for general guidance on how to make this work (and yes they subclass Insert). The patch and lots of discussion about it as well as the history of how it was developed can be seen here: https://gerrit.sqlalchemy.org/#/c/54/

sqlalchemy-bot commented 7 years ago

Михаил Доронин (warrior2031) wrote:


@zzzeek I've added the test, but after thinking for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly. I don't know about you and other contributors, but in past if I had to test that my sqlalchemy queries work I always tested it by running against artificially created data. And this always seemed wrong to me, because before writing sqlalchemy query I'm always writing sql query first and testing it on real data. So what I was testing is just that sqlalchemy will produce the same sql. But because sqlalchemy can produce more than one isomorphic sql from it's query I couldn't test it properly (because for example VALUES and columns in different order)

But this time I've decided to try approach that seems more correct to me (we're testing compiler in the end, right? So why not test just that?). https://github.com/purpleP/sqlalchemy/blob/on_duplicate_key/test/dialect/mysql/test_on_duplicate.py

This test relies a bit on the order of iteration of items of dictionary passes to update parameter. So it kind of depends on implementation for now. But in case someone will broke that one can always parse the expected_sql a bit and test that things like bar = VALUES(bar), baz = VALUES(baz) is the same as baz = VALUES(baz), bar = VALUES(bar). This would take less code than creating all this artificial data and running queries against it and testing the results.

What do you think about it?

sqlalchemy-bot commented 7 years ago

Михаил Доронин (warrior2031) wrote:


@zzzeek Seems like I can't use mysql dialect insert inside of @compiles to provide custom Merge.

!#python
class Foo(Base):
    __tablename__ = 'foos'
    id = Column(Integer, primary_key=True)
    a = Column(String(10))
    b = Column(String(10))

class Merge(Insert):
    def __init__(self, table, values):
        super(Merge, self).__init__(table, values)
        self.table = table
        self.values = values

@compiles(Merge, 'mysql')
def mysql_merge(merge_stmt, compiler, **kwargs):
    stmt = mysql.insert(merge_stmt.table, merge_stmt.values)
    update = {name: getattr(stmt.vals, name) for name in stmt.parameters[0]}
    stmt = stmt.on_duplicate_key_update(update=update)
    return compiler.process(stmt, **kwargs)

And this falls in visit_on_duplicate_key_update where I try to get columns from table to which values would be inserted cols = self.statement.table.c but for some reason self.statement is Merge instead of Insert. Which is probably because at some point earlier compiler started compiling Merge and haven't done that yet, because it doesn't know that Merge is just a dummy statement.

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly.

I referred to https://gerrit.sqlalchemy.org/#/c/54 which has all the testing styles you need if you look at the code. Testing the compiled SQL is the primary kind of testing we do, in my link above, see that here: https://gerrit.sqlalchemy.org/#/c/54/30/test/dialect/postgresql/test_compiler.py

The tests that do round trips are also important. The round trip tests are important not because we're testing that the database works, but that all the statement execution mechanics around INSERT statements are exercised as not crashing or malfunctioning, including that all parameters given are correctly passed and processed, features like inserted_primary_keys don't crash, etc. For examples of round trip tests that had to be added after the fact to test bugs that occurred in the PG on conflict feature within the realm of parameter and execution handling (and not just "the right SQL"), see https://gerrit.sqlalchemy.org/#/c/370/ and https://gerrit.sqlalchemy.org/#/c/285/. you don't need to write all these tests as many of them won't apply but rudimental round trip tests are necessary.

sqlalchemy-bot commented 7 years ago

Михаил Доронин (warrior2031) wrote:


@zzzeek I've tried to rewrite tests using your framework.

from sqlalchemy.testing.assertions import eq_, assert_raises
from sqlalchemy.testing import fixtures
from sqlalchemy import testing
from sqlalchemy.dialects.mysql import insert

class OnDuplicateTest(fixtures.TablesTest):
    __only_on__ = 'mysql',
    __backend__ = True
    run_define_tables = 'each'

    @classmethod
    def define_tables(cls, metadata):
        Table(
            'foos', MetaData(),
            Column('id', Integer, primary_key=True),
            Column('bar', String(10)),
            Column('baz', String(10)),
        )

    def test_bad_args(self):
        assert_raises(
            ValueError,
            insert(self.tables.foos, values={}).on_duplicate_key_update
        )

    def test_on_duplicate_key_update(self):
        foos = self.tables.foos
        with testing.db.connect() as conn:
            conn.execute(insert(foos, dict(id=1, bar='b')))
            stmt = insert(foos, [dict(id=1, bar='ab'), dict(id=2, bar='b')])
            stmt = stmt.on_duplicate_key_update(bar=stmt.vals.bar)
            result = conn.execute(stmt)
            eq_(result.inserted_primary_key [1])
            eq_(
                conn.execute(foos.select().where(foos.c.id == 1).fetchall()),
                [(1, 'ab', None)]
            )

But they skipped for some reason.

est/dialect/mysql/test_on_duplicate.py::OnDuplicateTest::test_bad_args SKIPPED
============================================================================================================================ short test summary info =============================================================================================================================
SKIP [1] /home/michael/code/my_projects/sqlalchemy/test/../lib/sqlalchemy/testing/config.py:96: 'OnDuplicateTest' unsupported for implementation '('mysql',)'

I can't spot a difference from postgresql OnConflictTest and don't understand why are they skipped.

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


Please ensure that you've famliarized with:

https://bitbucket.org/zzzeek/sqlalchemy/src/ccbd68bb3ce09a013c998e4dcc26c0df7b03f205/README.unittests.rst?at=master&fileviewer=file-view-default

then please continue discussion re: writing tests and such on the SQLAlchemy mailing list at: https://groups.google.com/forum/#!forum/sqlalchemy

thanks!

sqlalchemy-bot commented 7 years ago

Michael Bayer (zzzeek) wrote:


Current plan is to emulate the PG approach:

  1. create a MySQL ON DUPLICATE KEY UPDATE variant of insert() - this is #4009

  2. create a SQLite INSERT OR REPLACE variant of insert() - this is #4010

  3. if someone really cares, make a MERGE() statement that only works on SQL Server, Oracle (but is part of main sql package since its SQL standard)

  4. people that want platform agnostic merge/upsert, they can combine these constructs into their own @compiles recipe:

#!python

class MyMerge(UpdateBase):
   def __init__(self, whatever, whaevr...)

@compiles(MyMerge, "postgresql")
def _pg_merge(element, compiler, **kw):
   construct = postgresql.insert(element.whtever).on_confict_do_update(whatever, ...)
   return compiler.process(construct, **kw)
sqlalchemy-bot commented 11 years ago

Changes by Michael Bayer (zzzeek): changed milestone from "0.9.0" to "0.x.xx"

sqlalchemy-bot commented 11 years ago

Changes by Michael Bayer (zzzeek): changed milestone from "blue sky" to "0.9.0"

sqlalchemy-bot commented 15 years ago

Changes by Michael Bayer (zzzeek): set milestone to "blue sky"

sqlalchemy-bot commented 9 years ago

Changes by Michael Bayer (zzzeek): changed milestone from "0.x.xx" to "1.x.xx"

sqlalchemy-bot commented 8 years ago

Changes by Michael Bayer (zzzeek): changed content from "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (valu" to "Implement generic MERGE, aka 'upsert'.  In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES ("; changed title from "implement MERGE" to "implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)"
sqlalchemy-bot commented 7 years ago

Changes by Michael Bayer (zzzeek): changed content from "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (" to "Implement generic MERGE, aka 'upsert'.  In ANSI, it looks like:

MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES ("; changed state from "new" to "wontfix"
sqlalchemy-bot commented 7 years ago

Changes by Michael Bayer (zzzeek): changed state from "wontfix" to "open"