Closed chrisle closed 12 years ago
It looks like ActiveRecord will automatically fill in created_at and updated_at columns if the table has them. Using active_record-inline_schema, you can define:
class DataMine < ActiveRecord::Base
col :foo
col :bar
col :created_at, :type => :datetime
col :updated_at, :type => :datetime
data_miner do
import 'my data' do
#...
end
end
end
Here's where we are now. updated_at
and created_at
are ignored (as in they are treated like all other columns) by data_miner
because we can't guarantee they would be accurate – records might be touched by a process
step even if they weren't touched by an import
step:
class MyModel < ActiveRecord::Base
self.primary_key = 'foo'
col :foo
col :bar
col :baz
col :updated_at, :type => :datetime
data_miner do
import 'lots of stuff' do
key :foo
store :bar
# here's where you might expect that :updated_at would be updated every time we insert or upsert
end
process do
# but we don't know what a user does inside of a process block...
update_all :baz => 111
end
end
end
You could do this:
class MyModel < ActiveRecord::Base
# [...]
RIGHT_NOW = lambda { Time.now.utc }
data_miner do
import 'lots of stuff' do
# [...]
store :updated_at, :synthesize => RIGHT_NOW
But consider that this will force an update an every row, every time, even if no other column has changed.
Or we could do this:
import 'lots of stuff', :timestamp => true do
but implementing that correctly would be complicated - for every row, would we have to load the current data, compare it to the new data, and then only set updated_at if there's a difference?
So @chrisle what are you thinking?
Thanks for replying back to me, as well as open sourcing a really useful gem.
The particular expectation of updating created_at
and updated_at
is only one part of a longer gotcha: ActiveRecord isn't used when inserting records. As a result, I lost a lot of ActiveRecord's functionality.
(This isn't the code I used in my app the ideas are exactly the same.)
# app/model/user.rb
class User < ActiveRecord::Base
self.primary_key = :id
col :id, :type => :integer
col :name
col :created_at, :type => :datetime
col :updated_at, :type => :datetime
validates_uniqueness_of :name, :message => "That user name was already taken."
data_miner do
import 'stuff' do
key :id
store :user_name
end
end
end
# app/model/user_observer.rb
class UserObserver < ActiveRecord::Observer
def after_create(record)
email_admin("A new user #{record.name} was created!")
end
end
These tests will probably fail if you ran them.
# spec/model/user_spec.rb
require 'spec_helper'
describe User do
# Gotcha #1: validations don't validate as expected
it 'should not allow me to create the same user twice' do
User.create!(:name => 'Chris Le')
lambda { User.create!(:name => 'Chris Le') }.should raise_exception
end
# Gotcha #2: created_at / updated_at won't populate as expected
it 'should populate created_at and updated_at' do
new_user = User.create!(:name => 'Chris Le')
new_user.created_at.should be_a(DateTime)
new_user.updated_at.should be_a(DateTime)
end
# Gotcha #3: Observers won't fire as expected
it 'should have fired UserObserver#after_create' do
UserObserver.instance.should_receieve(:after_create)
User.create!(:name => 'Chris Le')
end
end
While that's not the code I'm using but all three happened to me. The reason is because data_miner uses upsert to handle the inserts into the database if the table has an auto_incrementing key or you're storing the primary_key. Import.rb line 88
So, my current solution is a monkey patch.
class DataMiner
# Monkey patch data_miner force the use of ActiveRecord instead of Upsert
class Step
class Import < Step
alias_method :__start__, :start
def start
# Removed code for upserting here.
table.each do |row|
record = model.send "find_or_initialize_by_#{@key}", attributes[@key].read(row)
attributes.each { |_, attr| attr.set_from_row record, row }
record.save! # The bang will force an exception to be raised if errors occur
end
refresh
nil
end
end
end
end
# Monkey patch Upsert gem to throw an exception if anything tries to use Upsert
# See lib/monkey_patches/data_miner_patch.rb
class Upsert
alias_method :__row__, :row
def row(selector, document = {})
Rails.logger.warn "Upsert was called! Selector: #{selector.inspect}, document: #{document.inspect}"
raise "Upsert needs to be avoided!"
nil
end
end
This wasn't the most elegant solution. It does work. Data_miner is now forced to use ActiveRecord to handle writing to the database. The solution is a slower and adds a monkey patch but the benefits me because I can use both ActiveRecord and data_miner's DSL.
It would be nice to add a toggle for this, but I wasn't sure that my use case was common enough and that my monkey patch didn't break something in the data_miner gem. I'm still testing.
hi @chrisle I think setting :validate => true
will do the trick...
How do I re-enable created_at and updated_at? I wasn't able to find the right place in this Gem to write it myself. Can you point me in the right direction?
My use case:
I have a folder that receives CSV files from another application. I'm importing / upserting them to my Rails app using this Gem. I would created_at and updated_at to be populated so that I know when the record was first created and when it was updated by this Gem.