Open sqlalchemy-bot opened 16 years ago
Michael Bayer (zzzeek) wrote:
Michael Bayer (zzzeek) wrote:
possible ORM flow:
obj = MyObject(id=1, data='data')
session.replace(obj) # session.server_merge()? session.upsert() ? session.add(obj, merge=True) ? session.add(obj, upsert=True)?
# 1. object must contain a full primary key. error if not.
# 2. persistence.py treats these as INSERTs, since we have to assume an INSERT will occur.
Michael Bayer (zzzeek) wrote:
Here's a recipe I made for the SQLA tutorial:
#!python
from sqlalchemy.sql.expression import Insert
class replace(Insert):
pass
# REPLACE is just like INSERT. Below is the easy road out:
import re
@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):
stmt = compiler.sql_compiler.visit_insert(replace)
return re.sub(r'^INSERT', 'REPLACE', stmt)
# Or, take the hard road:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy import bindparam
@compiles(replace, 'sqlite', 'mysql')
def compile_replace(replace, compiler, **kw):
colspecs = {}
for col in replace.table.c: # table defaults
if col.default is not None:
colspecs[col](col) = col.default.arg
if replace.parameters: # statement parameters
for key, value in replace.parameters.iteritems():
col = replace.table.c[key](key)
colspecs[col](col) = bindparam(key, value)
for k in compiler.column_keys: # names sent to execute()
col = replace.table.c[k](k)
colspecs.setdefault(col, bindparam(col.key))
return "REPLACE INTO %s (%s) VALUES (%s)" % (
replace.table.name,
",".join(c.name for c in colspecs),
",".join(compiler.process(v) for v in colspecs.values())
)
Michael Bayer (zzzeek) wrote:
Pantelis Theodosiou (ypercube) wrote:
MySQL's REPLACE is not an appropriate equivalent for MERGE.
REPLACE does a DELETE+INSERT which has different semantics than MERGE.
It also breaks when foreign keys exist (if ON DELETE CASCADE is on it will have unwanted cascaded deletes in other tables and if ON DELETE RESTRICT is on, the REPLACE will fail.)
Michael Bayer (zzzeek) wrote:
the idea is that MERGE would be implemented directly but some kind of splitting-the-difference layer would need to be supported in order to provide a cross-database upsert.
Pantelis Theodosiou (ypercube) wrote:
I don't disagree with the approach, on the contrary.
Just mentioning that for MySQL there is no easy road - and REPLACE should probably not be used.
Michael Bayer (zzzeek) wrote:
That's why this is probably the oldest issue still open :). I want nothing to do with it, really.
Michael Bayer (zzzeek) wrote:
PG has added something for this, so we should attempt to support a limited UPSERT system.
PG has implemented upsert as an INSERT or UPDATE - e.g. for MySQL this is ON DUPLICATE KEY UPDATE. Unfortunately this leaves out SQLite has the only outlier that only has REPLACE.
we should also try to get some wisdom from the HN discussion at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65.
We will need to deliver the "default" expressions as well as the "onupdate" values simultaneously to such a statement.
I'm a little puzzled though as far as "upsert" vs. "merge". The upserts we see in PG, MySQL and SQLite all work first and foremost with an INSERT that has brand new data in the VALUES clause. But everything I can find with MERGE does not use a VALUES clause, it can only MERGE from another select statement. Unless those backends support ad-hoc lists of values, I don't quite see at the moment how "merge" is even a superset of "upsert", considering "upsert" supports ad-hoc values.
dunk010 () wrote:
Postgres 9.5 is now out, with ON CONFLICT support. It would be good to be able to easily use this via SqlAlchemy.
Michael Bayer (zzzeek) wrote:
sure, have an API? none of the various merge/upsert options act the same across databases. use cases are unknown. I personally never use upsert so I don't have good judgment on it.
Patrick (prdonahue) wrote:
This would be great. Working on a script now that could make use of this functionality.
Robin T (robin900) wrote:
I'm wondering what the best approach to implementing Postgres 9.5 ON CONFLICT would be for the expression layer (not the ORM). The specifics of ON CONFLICT make it seem to me that this is best kept specific to the postgresql dialect module, rather than attempting a dialect-independent addition to Insert. What would be the best way to attach an postgresql dialect ON CONFLICT clause to an dialect-independent Insert?
Quick idea 1:
#!python
from sqlalchemy.dialects.postgresql import on_conflict_do_update, on_conflict_do_nothing
myvals = {'col1': 1, 'col2': 'foo'}
myinsert = mytable.insert().values(**myvals)
myupsert = on_conflict_do_update(myinsert, target=mytable.c.primary_key_col).values(**myvals)
Quick idea 2:
#!python
from sqlalchemy.dialects.postgresql import OnConflictDoUpdate, OnConflictDoNothing
myvals = {'col1': 1, 'col2': 'foo'}
on_conflict_clause = OnConflictDoUpdate(target=mytable.c.primary_key_col).values(**myvals)
myupsert = mytable.insert(postgresql_on_conflict=on_conflict_clause).values(**myvals)
http://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT
Some notes:
Robin T (robin900) wrote:
I created a working version of PG 9.5's INSERT ... ON CONFLICT on my fork:
https://bitbucket.org/robin900/sqlalchemy/branch/ticket_3529#diff
Since it's dialect-specific, I think maybe ticket #3529 should no longer be a duplicate of this ticket; let #3529 be for the PG-dialect-specific addon for ON CONFLICT, and leave this ticket #960 for any dialect-agnostic MERGE/upsert feature.
Michael Bayer (zzzeek) wrote:
the theme of this issue is to implement the backend equivalents of MERGE including MySQL's "REPLACE" and PG's new feature. It is a completely safe assumption that the moment an API is introduced to support that of one backend, the entire planet will be breaking down the door to write code that works identically on the other. This is a very hard problem to solve and should be solved at least for MySQL / Postgresql before we go too far down any one road. The SQLAlchemy SQL API is extensible so recipes / examples / 3rd party packages on pypi that provide PG or MySQL's feature are all fine, but for inclusion in SQLA we need to at least make an attempt to address both backends. Adding a new issue just for PG's specific syntax IMO doesn't really help towards the goal of getting a good feature implemented in SQLA core (and even ORM).
Михаил Доронин (warrior2031) wrote:
@zzzeek I've implemented ON DUPLICATE KEY UPDATE
for mysql and tested it manually. I'm reading tests for on_conflict_do_update
it seems to use some internal things like inheriting from fixture, @classmethod
that defines tables etc. Do I have to use (and understand subsequently) all that or can I use my pytest
to write tests?
By the way - you can track progress and review the code in fork https://github.com/purpleP/sqlalchemy/tree/on_duplicate_key.
Michael Bayer (zzzeek) wrote:
@warrior2031 it would be best if you could try emulating some of the existing tests. For help on running them see the README.unittests.rst file.
Michael Bayer (zzzeek) wrote:
we support PG's ON CONFLICT REPLACE now in #3529, that's the one that had the most requests. I've never seen anyone request Oracle's MERGE ever. MySQL, the request here would be for "ON DUPLICATE KEY UPDATE", not "REPLACE" as mentioned previously. However, this would also be implemented as a MySQL-specific construct the way we did for Postgresql, that is, trying to make a generic MERGE out of this is not something I think we are doing. if folks still want ON DUPLICATE KEY UPDATE then please open another bug for that.
Михаил Доронин (warrior2031) wrote:
@zzzeek If I inherit from UpdateBase
I get the following error AttributeError: 'Merge' object has no attribute '_returning'
. I've tried to inherit from Insert
instead and it works, but feels like a hack.
#!python
class Merge(Insert):
def __init__(self, table, values, keys=None):
super(Merge, self).__init__(table, values)
self.keys = keys
self.table = table
self.values = values
@compiles(Merge, 'postgresql')
def postgres_merge(merge_stmt, compiler, **kwargs):
stmt = postgresql.insert(merge_stmt.table, merge_stmt.values)
column_names = next(iter(stmt.parameters)).keys()
stmt = stmt.on_conflict_do_update(
index_elements=merge_stmt.keys or stmt.table.primary_key,
set_={name: getattr(stmt.excluded, name) for name in column_names},
)
return compiler.visit_insert(stmt)
Can you explain how to do that properly? In the docs there is example with inheriting from UpdateBase
, but there's nothing about _returning
there.
Michael Bayer (zzzeek) wrote:
@warrior2031 I suggest trying to emulate the Postgresql ON CONFLICT patch for general guidance on how to make this work (and yes they subclass Insert). The patch and lots of discussion about it as well as the history of how it was developed can be seen here: https://gerrit.sqlalchemy.org/#/c/54/
Михаил Доронин (warrior2031) wrote:
@zzzeek
I've added the test, but after thinking for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly.
I don't know about you and other contributors, but in past if I had to test that my sqlalchemy queries work I always tested it by running against artificially created data. And this always seemed wrong to me, because before writing sqlalchemy query I'm always writing sql query first and testing it on real data. So what I was testing is just that sqlalchemy will produce the same sql. But because sqlalchemy can produce more than one isomorphic sql from it's query I couldn't test it properly (because for example VALUES
and columns in different order)
But this time I've decided to try approach that seems more correct to me (we're testing compiler in the end, right? So why not test just that?). https://github.com/purpleP/sqlalchemy/blob/on_duplicate_key/test/dialect/mysql/test_on_duplicate.py
This test relies a bit on the order of iteration of items of dictionary passes to update
parameter. So it kind of depends on implementation for now. But in case someone will broke that one can always parse the expected_sql
a bit and test that things like bar = VALUES(bar), baz = VALUES(baz)
is the same as baz = VALUES(baz), bar = VALUES(bar)
.
This would take less code than creating all this artificial data and running queries against it and testing the results.
What do you think about it?
Михаил Доронин (warrior2031) wrote:
@zzzeek Seems like I can't use mysql dialect insert inside of @compiles
to provide custom Merge
.
!#python
class Foo(Base):
__tablename__ = 'foos'
id = Column(Integer, primary_key=True)
a = Column(String(10))
b = Column(String(10))
class Merge(Insert):
def __init__(self, table, values):
super(Merge, self).__init__(table, values)
self.table = table
self.values = values
@compiles(Merge, 'mysql')
def mysql_merge(merge_stmt, compiler, **kwargs):
stmt = mysql.insert(merge_stmt.table, merge_stmt.values)
update = {name: getattr(stmt.vals, name) for name in stmt.parameters[0]}
stmt = stmt.on_duplicate_key_update(update=update)
return compiler.process(stmt, **kwargs)
And this falls in visit_on_duplicate_key_update
where I try to get columns from table to which values would be inserted cols = self.statement.table.c
but for some reason self.statement
is Merge
instead of Insert
. Which is probably because at some point earlier compiler started compiling Merge
and haven't done that yet, because it doesn't know that Merge
is just a dummy statement.
Michael Bayer (zzzeek) wrote:
for a while I've came to conclusion that it would be easier (and it actually what really should be tested) to test that the statement is compiled correctly.
I referred to https://gerrit.sqlalchemy.org/#/c/54 which has all the testing styles you need if you look at the code. Testing the compiled SQL is the primary kind of testing we do, in my link above, see that here: https://gerrit.sqlalchemy.org/#/c/54/30/test/dialect/postgresql/test_compiler.py
The tests that do round trips are also important. The round trip tests are important not because we're testing that the database works, but that all the statement execution mechanics around INSERT statements are exercised as not crashing or malfunctioning, including that all parameters given are correctly passed and processed, features like inserted_primary_keys don't crash, etc. For examples of round trip tests that had to be added after the fact to test bugs that occurred in the PG on conflict feature within the realm of parameter and execution handling (and not just "the right SQL"), see https://gerrit.sqlalchemy.org/#/c/370/ and https://gerrit.sqlalchemy.org/#/c/285/. you don't need to write all these tests as many of them won't apply but rudimental round trip tests are necessary.
Михаил Доронин (warrior2031) wrote:
@zzzeek I've tried to rewrite tests using your framework.
from sqlalchemy.testing.assertions import eq_, assert_raises
from sqlalchemy.testing import fixtures
from sqlalchemy import testing
from sqlalchemy.dialects.mysql import insert
class OnDuplicateTest(fixtures.TablesTest):
__only_on__ = 'mysql',
__backend__ = True
run_define_tables = 'each'
@classmethod
def define_tables(cls, metadata):
Table(
'foos', MetaData(),
Column('id', Integer, primary_key=True),
Column('bar', String(10)),
Column('baz', String(10)),
)
def test_bad_args(self):
assert_raises(
ValueError,
insert(self.tables.foos, values={}).on_duplicate_key_update
)
def test_on_duplicate_key_update(self):
foos = self.tables.foos
with testing.db.connect() as conn:
conn.execute(insert(foos, dict(id=1, bar='b')))
stmt = insert(foos, [dict(id=1, bar='ab'), dict(id=2, bar='b')])
stmt = stmt.on_duplicate_key_update(bar=stmt.vals.bar)
result = conn.execute(stmt)
eq_(result.inserted_primary_key [1])
eq_(
conn.execute(foos.select().where(foos.c.id == 1).fetchall()),
[(1, 'ab', None)]
)
But they skipped for some reason.
est/dialect/mysql/test_on_duplicate.py::OnDuplicateTest::test_bad_args SKIPPED
============================================================================================================================ short test summary info =============================================================================================================================
SKIP [1] /home/michael/code/my_projects/sqlalchemy/test/../lib/sqlalchemy/testing/config.py:96: 'OnDuplicateTest' unsupported for implementation '('mysql',)'
I can't spot a difference from postgresql OnConflictTest
and don't understand why are they skipped.
Michael Bayer (zzzeek) wrote:
Please ensure that you've famliarized with:
then please continue discussion re: writing tests and such on the SQLAlchemy mailing list at: https://groups.google.com/forum/#!forum/sqlalchemy
thanks!
Michael Bayer (zzzeek) wrote:
Current plan is to emulate the PG approach:
create a MySQL ON DUPLICATE KEY UPDATE variant of insert() - this is #4009
create a SQLite INSERT OR REPLACE variant of insert() - this is #4010
if someone really cares, make a MERGE() statement that only works on SQL Server, Oracle (but is part of main sql package since its SQL standard)
people that want platform agnostic merge/upsert, they can combine these constructs into their own @compiles
recipe:
#!python
class MyMerge(UpdateBase):
def __init__(self, whatever, whaevr...)
@compiles(MyMerge, "postgresql")
def _pg_merge(element, compiler, **kw):
construct = postgresql.insert(element.whtever).on_confict_do_update(whatever, ...)
return compiler.process(construct, **kw)
Changes by Michael Bayer (zzzeek): changed milestone from "0.9.0" to "0.x.xx"
Changes by Michael Bayer (zzzeek): changed milestone from "blue sky" to "0.9.0"
Changes by Michael Bayer (zzzeek): set milestone to "blue sky"
Changes by Michael Bayer (zzzeek): changed milestone from "0.x.xx" to "1.x.xx"
Changes by Michael Bayer (zzzeek): changed content from "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (valu" to "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES ("; changed title from "implement MERGE" to "implement MERGE (and/or PG ON CONFLICT and/or MySQL REPLACE etc. etc.)"
Changes by Michael Bayer (zzzeek): changed content from "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES (" to "Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
MERGE INTO table_name1 USING table_name2 ON (condition)
WHEN MATCHED THEN UPDATE SET column1 = value1 [column2 = value2 ...](,)
WHEN NOT MATCHED THEN INSERT columns VALUES ("; changed state from "new" to "wontfix"
Changes by Michael Bayer (zzzeek): changed state from "wontfix" to "open"
Issue created by jek (jek)
Implement generic MERGE, aka 'upsert'. In ANSI, it looks like:
Dialect support differs pretty widely. A quick & likely inaccurate poll:
IF EXISTS(SELECT ...)
INSERT ... ON DUPLICATE KEY UPDATE
orREPLACE INTO
, INSERT being preferableINSERT .. ON CONFLICT REPLACE