Closed gee-forr closed 6 years ago
Hi Gabriel! Glad you find Kiba useful!
Relying just on .valid?
is not something I would recommend, for the reason you mention.
You can use either a transaction or a temporary table of some sort.
kiba
command lineI'd first try using a transaction, but then you'll have to verify if the total quantity of data that you are pushing to your database is not rejected (if it's too big).
You can not currently "wrap" a transaction around a destination
DSL call currently (although I will be thinking about how to achieve that later).
If you are using the kiba
command line, then you could use something like:
class MyActiveRecordDestination
attr_reader :model
def initialize(model:)
@model = model
end
def write(row)
@transaction_started ||= model.begin_db_transaction
# write your row
end
def close
model.commit_db_transaction if @transaction_started
end
end
(see ActiveRecord documentation).
If the kiba
process crashes, the transaction should be rollbacked (but you'll have to double-check that behaviour to be sure).
A better way to achieve this, though, would be to use the programmatic API.
This way you can write:
Model.transaction do
job = Kiba.parse do
# your job declaration
end
Kiba.run(job)
end
This is much more elegant, because you get automatic rollback in case of error.
For more advanced needs, you could also rely on a temporary table.
You could also have a non temporary "staging" table, or use a version
column in the target table, to achieve something somewhat similar.
I would first try to go with the simpler transaction
solution, though!
Hope this helps - I'll let you close the issue if you have enough information!
This helps a lot @thbar :) Thanks!!! I've updated the wiki with the outcomes of this issue. https://github.com/thbar/kiba/wiki/Can-Kiba-handle-an-%22all-or-nothing%22-ETL-operation%3F
Thanks again :)
Hi there...
I'm really excited to start using this gem and have a few questions...
I have a set of CSV's and or XLSX docs that I'll be turning into AR objects and then writing to a DB. I was wondering if it's possible to wrap the call to
destination MyActiveRecordDestination
in a transaction block, as I only want to write to the destination if all records are successfully transformed and loaded.In your opinion, would you do it this way via a transaction, or is there a better way to have an all-or-nothing pipeline?
I was thinking a second option would be to make sure each row is
.valid?
during the transform phase of the pipeline, and aborting if there's errors there, but in my case, that won't catch DB level integrity failures.I'm happy to contribute to the wiki with the outcomes of this issue.
Thank you so much for the great work on Kiba - I wish I knew about this project 6 months ago, but I'm glad I do now.