Closed tmaier closed 10 years ago
I'm more prone to let the client to handle page download failures rather than apply a re-enqueue strategy blindly A built in naive failure retry strategy for temporary network failures is already in place (https://github.com/taganaka/polipus/blob/master/lib/polipus/http.rb#L170-L179)
A client can easily implement its own error handling workflow for more fine grain control:
polipus.on_page_downloaded do |page|
add_to_queue page if page.error
end
I can add a new block handler like on_page_error {|page|...}
so that the error handling logic can be encapsulated there
on_page_downloaded
blocks is always called, no matter what.
Introducing on_page_error
handler should then skip on_page_downloaded
call since the page has not been downloaded
I think Page#storable?
should return false if there was an error.
Drawback is that this would hinder us to investigate an error
Advantage would be that if we queue the url again, it would skip the url later
I like on_page_error
as this would encapsulate the error handling on one place.
And one of the examples should show
polipus.on_page_error do |page|
polipus.add_to_queue(page) if page.error
end
I was wrong. The drawback mentioned does not exist, as we do not store the error message in the database.
storing the page with the error associated into the DB could be useful for better offline debug.
Having on_page_error
blocks executed before[1] page.storable?
is evaluated, users could easily modify the flow:
polipus.on_page_error do |page|
page.storable = false
polipus.add_to_queue(page)
end
However to store the error message into the DB, error attribute needs to be serialized and exposed into page.to_hash
method
[1]https://github.com/taganaka/polipus/blob/on_page_error/lib/polipus.rb#L207
implemented here: https://github.com/taganaka/polipus/pull/29
Addressed into v 0.3.0
Today, I lost the internet connection. I got the following error message in my logs:
Polipus should not proceed processing a page in this case. It should requeue the page and proceed with the next page.