Open · tmaier opened this issue 10 years ago
When you try to resolve a domain that does not exist, polipus creates an error page whose error is a `SocketError`.
In effect, the page no longer exists, so this is like a 404 error, just at the DNS level.
However, `SocketError` is also raised if the internet connection is lost for any reason.
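One way to see the ambiguity is to resolve a name that can never exist, such as one under the reserved `.invalid` top-level domain. The sketch below uses only Ruby's standard `socket` library; `resolution_error` is a hypothetical helper name, not part of polipus, and it returns the exception instead of raising it. On a machine with working DNS the lookup fails because the name does not exist; on an offline machine it fails because no resolver can be reached. In both cases the result is a `SocketError`, and only the platform-specific message text differs.

```ruby
require 'socket'

# Hypothetical helper (not part of polipus): try to resolve +host+ the way an
# HTTP client would before connecting, and return the resolution error, if any.
def resolution_error(host)
  Addrinfo.getaddrinfo(host, 80, nil, :STREAM)
  nil # the name resolved fine
rescue SocketError => e
  # Raised both for a nonexistent name and when no resolver is reachable
  # (e.g. the connection is down), so the exception class alone is ambiguous.
  e
end

err = resolution_error('no-such-host.invalid')
puts "#{err.class}: #{err.message}" if err
```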
So, to be sure the site is really gone, we need a method like this:
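```ruby
require 'excon'

# Returns true if the internet connection works, in which case a SocketError
# on the crawled domain means the site itself is gone.
# `logger` is assumed to be provided by the surrounding code.
def internet_connection_available?
  Excon.head('http://www.google.com')
  logger.debug { 'Webpage not available anymore' }
  true
rescue Excon::Errors::SocketError
  logger.error { 'Internet connection lost' }
  false
end
```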
Or maybe even better, something like this: http://stackoverflow.com/questions/2385186/check-if-internet-connection-exists-with-ruby/22837368#22837368
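The `Excon.head` check depends on a single external site answering an HTTP request. A minimal stdlib-only alternative, sketched here as an assumption and not necessarily what the linked answer does, is to attempt a plain TCP connection to a few well-known endpoints with a short timeout and treat the connection as up if any of them accepts. The endpoint list (public DNS resolvers on port 53) and the `ConnectivityProbe` name are illustrative choices, not polipus API. Using IP addresses keeps the probe independent of DNS, so a broken local resolver on an otherwise working network would still be read as "site gone". The connector is injectable so the logic can be exercised without a network.

```ruby
require 'socket'

# Hypothetical connectivity probe (not part of polipus).
module ConnectivityProbe
  # Illustrative endpoints: public DNS resolvers, addressed by IP to skip DNS.
  DEFAULT_ENDPOINTS = [['1.1.1.1', 53], ['8.8.8.8', 53]].freeze

  # Opens a TCP connection and closes it again; Socket.tcp closes the
  # socket itself once the (empty) block returns.
  DEFAULT_CONNECTOR = lambda do |host, port, timeout|
    Socket.tcp(host, port, connect_timeout: timeout) { |_sock| }
  end

  # True if at least one endpoint accepts a connection within +timeout+ seconds.
  def self.online?(endpoints: DEFAULT_ENDPOINTS, timeout: 2, connector: DEFAULT_CONNECTOR)
    endpoints.any? do |host, port|
      begin
        connector.call(host, port, timeout)
        true
      rescue SocketError, SystemCallError, IOError
        # DNS failures, refused/unreachable/timed-out connects, I/O timeouts
        false
      end
    end
  end
end
```

With this in place, the body of `internet_connection_available?` could simply call `ConnectivityProbe.online?` and keep its logging.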
I use it like this:
```ruby
crawler.on_page_error do |page|
  page.storable = false
  webpage_gone = page.error.is_a?(SocketError) && internet_connection_available?
  crawler.add_to_queue(page) unless page.not_found? || webpage_gone
end
```
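The decision inside the `on_page_error` block can be pulled out into a small pure function, which makes the three cases explicit and easy to test. This is a sketch, not polipus API: `requeue?` and its keyword arguments are hypothetical names. It takes the connectivity check as a callable so the (slow) check only runs when the error is a `SocketError` on a page that is not already a 404. Checking `not_found` first means it never probes for a 404 page; otherwise the result matches the original expression.

```ruby
require 'socket' # defines SocketError

# Hypothetical helper (not part of polipus): should a failed page be re-queued?
#   error     - the exception stored on the page (may be nil)
#   not_found - true if the page answered with a 404
#   online    - callable returning true if the internet connection is up;
#               only invoked when the answer depends on it
def requeue?(error:, not_found:, online:)
  return false if not_found                               # HTTP-level "gone"
  return false if error.is_a?(SocketError) && online.call # DNS-level "gone"
  true # transient failure, including a SocketError while offline
end

# Possible use inside the handler from the issue:
# crawler.on_page_error do |page|
#   page.storable = false
#   online = -> { internet_connection_available? }
#   crawler.add_to_queue(page) if requeue?(error: page.error, not_found: page.not_found?, online: online)
# end
```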
Shall we add something for this case directly to polipus?