openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0
461 stars 74 forks source link

Cannot delete scraper due to foreign key constraint failing #1084

Closed henare closed 7 years ago

henare commented 8 years ago
[Morph/production] ActiveRecord::StatementInvalid: Mysql2::Error: Cannot delete or update a parent row: a foreign key constraint fails (`morph`.`metrics`, CONSTRAINT `fk_rails_14230a8cc7` FOREIGN KEY (`run_id`) REFERENCES `runs` (`id`)): DELETE FROM `runs` WHERE `runs`.`id` = 168069

Backtrace

line 133 of [PROJECT_ROOT]/app/controllers/scrapers_controller.rb: destroy

View full backtrace and more info at honeybadger.io

tmtmtmtm commented 8 years ago

I've also noticed that I can't delete scrapers that have Webhooks defined. Once I delete those, I can then delete the scraper.

equivalentideas commented 7 years ago

I think the issue here might be that we're trying to delete the Run(s) associated with the scraper, but there are Metric(s) that reference those runs.

Here's a related StackOverflow http://stackoverflow.com/questions/34229264/cannot-delete-or-update-a-parent-row-a-foreign-key-constraint-fails-deleting

chrismytton commented 7 years ago

I've just been looking at this locally.

It seems that when a run has multiple metrics associated with it, trying to destroy the run causes an error. This is because Run is supposed to have_one Metric, but this isn't being enforced at the database level, so there are certain runs that have more than one metric associated with them.

I've opened https://github.com/openaustralia/morph/pull/1145 with a failing test to demonstrate this problem.

equivalentideas commented 7 years ago

@chrismytton nice sloothing :)

certain runs that have more than one metric associated with them

Any idea how a Run gets into this state?

chrismytton commented 7 years ago

Any idea how a Run gets into this state?

Good question! I've just had a quick grep around and it looks like these lines in Morph::Runner, which create a metric for a run, are the primary suspect. Working back up the stack leads me to this line in RunWorker, which finds a run and then passes it to Morph::Worker.

It seems that if a RunWorker job get retried by Sidekiq, it will use the existing run, but Morph::Runner will create a new Metric for that run on each retry.

So I think we probably want to do some kind of find_or_create type dance in Morph::Runner to check if a metric already exists for the run, and if it does then just update that rather than creating a new one?

chrismytton commented 7 years ago

I've opened a pull request that should fix this issue: https://github.com/openaustralia/morph/pull/1145