thbar / kiba

Data processing & ETL framework for Ruby
https://www.kiba-etl.org
Other
1.75k stars 87 forks source link

Question: All-Or-Nothing pipelines #52

Closed gee-forr closed 6 years ago

gee-forr commented 6 years ago

Hi there...

I'm really excited to start using this gem and have a few questions...

I have a set of CSV's and or XLSX docs that I'll be turning into AR objects and then writing to a DB. I was wondering if it's possible to wrap the call to destination MyActiveRecordDestination in a transaction block, as I only want to write to the destination if all records are successfully transformed and loaded.

In your opinion, would you do it this way via a transaction, or is there a better way to have an all-or-nothing pipeline?

I was thinking a second option would be to make sure each row is .valid? during the transform phase of the pipeline, and aborting if there's errors there, but in my case, that won't catch DB level integrity failures.

I'm happy to contribute to the wiki with the outcomes of this issue.

Thank you so much for the great work on Kiba - I wish I knew about this project 6 months ago, but I'm glad I do now.

thbar commented 6 years ago

Hi Gabriel! Glad you find Kiba useful!

Relying just on .valid? is not something I would recommend, for the reason you mention.

You can use either a transaction or a temporary table of some sort.

Using a transaction

With the kiba command line

I'd first try using a transaction, but then you'll have to verify if the total quantity of data that you are pushing to your database is not rejected (if it's too big).

You can not currently "wrap" a transaction around a destination DSL call currently (although I will be thinking about how to achieve that later).

If you are using the kiba command line, then you could use something like:

class MyActiveRecordDestination
  attr_reader :model

  def initialize(model:)
    @model = model
  end

  def write(row)
    @transaction_started ||= model.begin_db_transaction
    # write your row
  end

  def close
    model.commit_db_transaction if @transaction_started
  end
end

(see ActiveRecord documentation).

If the kiba process crashes, the transaction should be rollbacked (but you'll have to double-check that behaviour to be sure).

With a custom call

A better way to achieve this, though, would be to use the programmatic API.

This way you can write:

Model.transaction do
  job = Kiba.parse do
    # your job declaration
  end
  Kiba.run(job)
end

This is much more elegant, because you get automatic rollback in case of error.

Using some form of temporary table

For more advanced needs, you could also rely on a temporary table.

You could also have a non temporary "staging" table, or use a version column in the target table, to achieve something somewhat similar.

I would first try to go with the simpler transaction solution, though!

Hope this helps - I'll let you close the issue if you have enough information!

gee-forr commented 6 years ago

This helps a lot @thbar :) Thanks!!! I've updated the wiki with the outcomes of this issue. https://github.com/thbar/kiba/wiki/Can-Kiba-handle-an-%22all-or-nothing%22-ETL-operation%3F

Thanks again :)