seamusabshere / data_miner

Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.
MIT License

Is there a way to run a specific step? #21

Open towerhe opened 11 years ago

towerhe commented 11 years ago

I'm trying to use data_miner for my routine import jobs. In my case, I upload an XLS file to my system and need to import the data from it.

I have a lot of XLS files with different header-to-column mappings, and I have defined an import step for each type of mapping. So after I upload an XLS file, I need to run one specific import step. Is there a way to do that?

seamusabshere commented 11 years ago

hi @towerhe you may be able to hack it with:

Car.data_miner_script.steps[9].start

it's a known problem with data_miner that this is hard to do - please let me know if you have suggestions!
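
A numeric index like steps[9] breaks as soon as steps are reordered, so one option is to look a step up by its description instead. The sketch below is self-contained and only illustrates the lookup: FakeStep and find_step are hypothetical names, and it assumes (not confirmed in this thread) that each real step object exposes a description and a start method.

```ruby
# Hypothetical stand-in for a data_miner step; the real step classes are not used here.
FakeStep = Struct.new(:description) do
  attr_reader :started

  # Mimics the step's start call by recording that it ran.
  def start
    @started = true
    nil
  end
end

# Return the first step whose description matches, or raise if none does.
def find_step(steps, description)
  step = steps.find { |s| s.description == description }
  raise ArgumentError, "no step described as #{description.inspect}" unless step
  step
end

steps = [
  FakeStep.new("Import dealers from dealers.xls"),
  FakeStep.new("Import cars from cars.xls")
]

# Run only the step that matches the uploaded file's mapping.
find_step(steps, "Import cars from cars.xls").start
```

If the assumption holds, the same find_step call could be pointed at Car.data_miner_script.steps in place of the stand-in array.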

towerhe commented 11 years ago

An import step needs a static URL that points to a resource. In my case, the URL is dynamic. So to solve my problem, I need to add new features to data_miner. But I'm having problems running the specs.

I have downgraded earth to 0.11.7, minitest to 3.5.0, and minitest-reporters to 0.9.0, but the specs still fail.

Could you please help me get the specs to pass?

seamusabshere commented 11 years ago

i hate to say it, but the tests have been neglected for years - they need to be cleaned up.

towerhe commented 11 years ago

Yeah, got it. I will try to improve them, but I don't have any experience with minitest.

BTW, IMO the key should not be required for an import step. If a key is defined, the records matching the provided keys are updated; when no key is defined, data_miner should create new records instead.

seamusabshere commented 11 years ago

> If the key is defined, the records with the provided keys will only be updated. when there is no key defined, data_miner should create new records instead.

that should happen already - data_miner uses upsert internally - is that what you needed?

towerhe commented 11 years ago

But I found the following code:

def start
  if not validate? and (storing_primary_key? or table_has_autoincrementing_primary_key?)
    c = ActiveRecord::Base.connection_pool.checkout
    Upsert.stream(c, model.table_name) do |upsert|
      table.each do |row|
        selector = { @key => attributes[@key].read(row) }
        document = attributes.except(@key).inject({}) do |memo, (_, attr)|
          memo.merge! attr.updates(row)
          memo
        end
        upsert.row selector, document
      end
    end
    ActiveRecord::Base.connection_pool.checkin c
  else
    table.each do |row|
      record = model.send "find_or_initialize_by_#{@key}", attributes[@key].read(row)
      attributes.each { |_, attr| attr.set_from_row record, row }
      record.save!
    end
  end
  refresh
  nil
end

Both the if branch and the else branch need @key, which means we have to define a key for our models.

seamusabshere commented 11 years ago

ok, i see what you mean - correct, data_miner assumes that it is always in upsert mode.

would your problem be solved if you could just leave out key and have it always insert?
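
The behavior being proposed here (upsert when a key is given, plain insert when it is left out) can be sketched without data_miner at all. This is a minimal illustration under stated assumptions, not how data_miner implements it: import_rows is a hypothetical helper, rows are plain hashes, and an in-memory array stands in for the database table.

```ruby
# Hypothetical importer: rows are hashes and the "table" is an in-memory array.
def import_rows(store, rows, key: nil)
  rows.each do |row|
    # Only look for an existing record when a key was provided.
    existing = key && store.find { |record| record[key] == row[key] }
    if existing
      existing.merge!(row) # key given and matched: update in place
    else
      store << row.dup     # no key, or no match: insert a new record
    end
  end
  store
end

store = [{ id: 1, name: "Civic" }]

# With a key: id 1 is updated, id 2 is inserted.
import_rows(store, [{ id: 1, name: "Civic LX" }, { id: 2, name: "Fit" }], key: :id)

# Without a key: the row is always inserted, even though id 2 already exists.
import_rows(store, [{ id: 2, name: "Fit" }])
```

Making the key optional this way keeps existing keyed imports unchanged, while keyless imports can produce duplicate rows by design.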

towerhe commented 11 years ago

I'm now working hard to fix the tests. Once all the tests pass, I will try to introduce a way to ignore the key.

seamusabshere commented 11 years ago

@towerhe do you need a gem release before you can close this?

towerhe commented 11 years ago

I haven't found the right way to implement this yet.