Implement Upsert with single SQL statement

isolaol commented 10 years ago

Dear developer,

The current incarnation (2.3) of SqlFu has Upsert method in CrudHelpers.cs. The upsert (ie. UpdateOrInsert) is a great facility and used in real world applications. I like it!

The implementation seems to try to update the record first and then insert it if the update fails. This two-pass protocol is "chatty". The single statement upsert would be far more efficient due to reduced round-trips to the database.

There are two ways of implementing single statement upsert. The traditional method is a combination of both UPDATE and INSERT statements. The fast method takes advantage of MERGE statement and VALUES table constructor.

A demostration of upsert without autogenerated primary key.

-- create a sample table
CREATE TABLE dbo.tabX (
    key int NOT NULL PRIMARY KEY,
    value nvarchar(20) NULL
)

-- insert record {key=8, value="foo"}

-- traditional method, primary key generated by application
IF EXISTS (SELECT 1 FROM [tabX] WHERE [key] = 8) BEGIN
    UPDATE [tabX] SET [value] = 'foo' WHERE [key] = 8
END
ELSE BEGIN
    INSERT INTO [tabX]([key], [value]) VALUES(8, 'foo')
END

-- fast method, primary key generated by application
MERGE INTO [tabX] AS D
USING (VALUES (8, 'foo')) AS S([key], [value])
ON (S.[key] = D.[key])
WHEN MATCHED THEN UPDATE SET D.[value] = S.[value]
WHEN NOT MATCHED BY TARGET THEN INSERT ([key], [value]) VALUES (S.[key], S.[value])

A demostration of upsert when primary key value is generated by database. Please note the presence of OUTPUT clause. It allows the library the read back any value generated by the database.

-- create a sample sequence of numbers
CREATE SEQUENCE dbo.SampleSeq AS int START WITH 1 INCREMENT BY 1;

-- create a sample table, with sequence as default value in primary key column
CREATE TABLE dbo.tabY (
    key int NOT NULL PRIMARY KEY DEFAULT (NEXT VALUE FOR dbo.SampleSequence),
    value nvarchar(20) NULL
);

-- insert record {key=NULL, value="bar"}

-- traditional method, primary key generated by database for new records
-- please note that IF EXISTS can be omitted due to key=NULL
INSERT INTO [tabY]([value]) OUTPUT INSERTED.[key] VALUES('bar')

-- fast method, primary key generated by database for new records
MERGE INTO [tabY] AS D
USING (VALUES (NULL, 'bar')) AS S([key], [value])
ON (S.[key] = D.[key])
WHEN MATCHED THEN UPDATE SET D.[value] = S.[value]
WHEN NOT MATCHED BY TARGET THEN INSERT ([value]) VALUES (S.[value])
OUTPUT INSERTED.[key]

I prefer MERGE statement over combining IF EXISTS, UPDATE and INSERT. MERGE statement is easier to query optimizer and does its job in single pass. The added benefit of MERGE statement is atomicity. I do not need to wrap the call of upsert method inside transaction!

For more information about the implementation of the MERGE statement in Microsoft SQL Server, please visit http://msdn.microsoft.com/en-us/library/bb510625(v=sql.100).aspx.

MERGE statatement has been supported since SQL Server 2008.

I hope this helps you to develop the best micro-orm available to the community of the .NET developers!

Keep going!

Kind regards, Olli-Pekka Isola

sapiens commented 10 years ago

Thanks for your input, I'll take it into account

jeremycook commented 8 years ago

MySQL 5 also supports an "upsert" syntax as documented here:

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

This blog post provides a great example of using it:

http://mechanics.flite.com/blog/2013/09/30/how-to-do-an-upsert-in-mysql/

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | HOPKINS   | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.01 sec)

mysql> insert into sakila.actor (actor_id, first_name, last_name, last_update) 
    -> values (50,'NATALIE','SMITH','2013-09-27 12:34:56')
    -> on duplicate key update
    ->   first_name = values(first_name),
    ->   last_name = values(last_name),
    ->   last_update = values(last_update);
Query OK, 2 rows affected (0.00 sec)

mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|       50 | NATALIE    | SMITH     | 2013-09-27 12:34:56 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)

Some key takeaways from the docs:

With on duplicate key update, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

If a table contains an AUTO_INCREMENT column and INSERT ... UPDATE inserts a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. If the statement updates a row instead [a.k.a. the number of affected-rows is 2], LAST_INSERT_ID() is not meaningful.

sapiens commented 8 years ago

Cool, thank you!

dario-l commented 4 years ago

Be careful with MERGE on MS SQL without transaction. https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

sapiens / SqlFu

Implement Upsert with single SQL statement #34