jeremyevans / sequel

Sequel: The Database Toolkit for Ruby
http://sequel.jeremyevans.net
Other
4.99k stars 1.07k forks source link

batch inserts do not return meaningful return values #408

Closed olek closed 12 years ago

olek commented 12 years ago

As I started to use sequel's functionality to perform batch inserts (bravo for including it!), I stumbled on one annoyance - while either insert_multiple and multi_insert work fine, neither is returning meaningful return value. Former returns back array of hashes, and latter returns nil.

It would help a lot if list of ids for created records could be returned from them.

For insert_multiple fix is trivial, simply replace 'each' with 'map' in the DataSet.insert_multiple method.

multi_insert is not so easy, still struggling with it, but I know that this is possible, at least in Postgres, which will happily accept query like "insert into foo (bar, baz) values (1,2), (3,4) returning id".

I understand that my 'issue' is borderline with 'new feature request', but I am desperate to get it working. It is frequently the case that 2 or more sets of batch data are inserted, and then couple more batches of data are added that use ids from previous batches.

Many thanks for your time and attention!

jeremyevans commented 12 years ago

This is definitely a feature request (the line was crossed :) ), and without a patch it should go to the mailing list (only bugs/pull requests should be posted on the issue tracker).

For insert_multiple, you can just call map{|r| r[:column]} on the returned value to get the column value, so I don't think anything special needs to be done there. Note that insert_multiple is not a batch insert method, it does individual insert statements for each row.

For the batch methods (multi_insert and import), I'm not sure this is something Sequel should handle by default. In most cases where large numbers of records are being inserted, you don't care about the returned value. The user would have to provide the column to return in most cases, as datasets don't generally know their own primary keys. In many cases, if you do want values, you are going to want the values determined by the database, but not all databases support that. I think before making any decisions, this is something that should be discussed with the community. Please post this feature request on the sequel-talk Google Group.

olek commented 12 years ago

Hi Jeremy

I understand you stance on the issue/feature front, it is up to you to draw a line in the sand :)

Nevertheless, even for insert_multiple, returned hash contains only the very same values that were provided as input to the method, therefore it is not possible to obtain newly generated primary keys from there. That behavior is meaningless and in my opinion qualifies as an issue.

I was able to implement/fix the feature for both insert_multiple and multi_insert, code may be not quite perfect since I do not comprehend architecture of Sequel, but I think it is not horrible either. Please have a look at it and tell me what you think.

Also, let me know if you think this should be instead posted in Google Groups.

Code provided in a form of a monkey-patch, the way I use it now.

Thank you for your time.

module Sequel
  class Dataset
    def insert_multiple(array, &block)
      if block
        array.map {|i| insert(block[i])}
      else
        array.map {|i| insert(i)}
      end
    end

    def import(columns, values, opts={})
      return @db.transaction{insert(columns, values)} if values.is_a?(Dataset)

      return if values.empty?
      raise(Error, IMPORT_ERROR_MSG) if columns.empty?

      rows = []
      if slice_size = opts[:commit_every] || opts[:slice]
        offset = 0
        loop do
          @db.transaction(opts) do
            multi_insert_sql(columns, values[offset, slice_size]).map do |st|
              returning_fetch_rows(st + ' ' + RETURNING + ' ' + insert_pk.to_s(self)) do |r|
                rows << (r.size == 1 ? r.values.first : r)
              end
            end
          end
          offset += slice_size
          break if offset >= values.length
        end
      else
        statements = multi_insert_sql(columns, values)
        @db.transaction do
          statements.map do |st|
            returning_fetch_rows(st + ' ' + RETURNING + ' ' + insert_pk.to_s(self)) do |r|
              rows << (r.size == 1 ? r.values.first : r)
            end
          end
        end
      end

      rows
    end
  end
end
jeremyevans commented 12 years ago

I think your insert_multiple change makes sense, so I'm reopening this issue for that.

The import/multi_insert change you are proposing will only work on PostgreSQL (and possibly Firebird), since they are the only databases supporting RETURNING. So that change can't made to the generic Sequel code. It's possible some refactoring could be made so that it would be supported on PostgreSQL, though. But for that you'll need to email the Google Group, explain your use case, and see if other users think it would be a useful feature (and hopefully offer advice for implementations on other supported databases).

olek commented 12 years ago

Thank you Jeremy.

Of course, you are right, my code is very Postgres-centric, and will need to be refactored to make it fit with other databases as well. Silly me :)

I will post to Google Groups as you recommended.