sparklemotion / mechanize

Mechanize is a ruby library that makes automated web interaction easy.
https://www.rubydoc.info/gems/mechanize/
MIT License
4.39k stars 473 forks source link

Mechanize doesn't raise Mechanize::ResponseCodeError if error happened before getting the response #617

Closed adrianodennanni closed 1 year ago

adrianodennanni commented 1 year ago

I was looking trough my logs of requests made with Mechanize and noticed that sometimes, a request might raise the default Mechanize::ResponseCodeError exception, but it would also raise Net::HTTPFatalError (500, 501, 502, 503) in the same line of code, at different times.

This last error (Net::HTTPFatalError) is first generated during the connection.request in the following block inside mechanize/http/agent.rb.rb:

# Send the request
begin
  response = connection.request(uri, request) { |res|
    response_log res

    response_body_io = response_read res, request, uri

    res
  }
rescue Mechanize::ChunkedTerminationError => e
  raise unless @ignore_bad_chunking

  response = e.response
  response_body_io = e.body_io
end

Since it happens before getting the response, raise Mechanize::ResponseCodeError.new(page, 'unhandled response') is never called, ending up with Mechanize raising the Net::HTTPFatalError exception.

I think it would be great if the only exception raised in this case was Mechanize::ResponseCodeError, since it should handle the errors that happened in the connection with the HTTP server, and not only handling the errors if the response exists.

flavorjones commented 1 year ago

@adrianodennanni Thanks for opening this issue, this is a good conversation to have.

The exception class Mechanize::ResponseCodeError is intended to wrap up a successful response from the server that has an HTTP status code other than 200, 301, or 302.

In the case you're describing, are you saying that you received a response from the server that was a 50x status code? Or was this a networking error? I'm a little confused by the string "(500, 501, 502, 503)" in the description.

adrianodennanni commented 1 year ago

I will paste here an example of this happening.

The exception message for the Net::HTTPFatalError was 503 "Service Unavailable".

A simple get, such as agent.get 'https://example.com' raised me Net::HTTPFatalError, with the following trace:

503 "Service Unavailable" (Net::HTTPFatalError)
/usr/local/lib/ruby/3.2.0/net/http/response.rb:258:in `error!'
/usr/local/lib/ruby/3.2.0/net/http/response.rb:267:in `value'
/usr/local/lib/ruby/3.2.0/net/http.rb:1293:in `connect'
/usr/local/lib/ruby/3.2.0/net/http.rb:1248:in `do_start'
/usr/local/lib/ruby/3.2.0/net/http.rb:1243:in `start'
/usr/local/bundle/gems/net-http-persistent-4.0.1/lib/net/http/persistent.rb:655:in `start'
/usr/local/bundle/gems/net-http-persistent-4.0.1/lib/net/http/persistent.rb:595:in `connection_for'
/usr/local/bundle/gems/net-http-persistent-4.0.1/lib/net/http/persistent.rb:885:in `request'
/usr/local/bundle/gems/mechanize-2.8.5/lib/mechanize/http/agent.rb:284:in `fetch'
/usr/local/bundle/gems/mechanize-2.8.5/lib/mechanize.rb:465:in `get'
/app/domain/my_example_script.rb:35:in `get_page'

In other occasions, the same agent.get line raised me a normal Mechanize::ResponseCodeError. I believe this case happened when response was completed, while the the previously mentioned exception happened during the connection phase of the fetch.

flavorjones commented 1 year ago

@adrianodennanni Again, ResponseCodeError is only intended to be used when a response has been successfully read, and the status code is surprising.

An exception related to networking should not be rescued by Mechanize -- in this case, the exception raised by Net::HTTP is what is raised. I think this is behaving as intended.

I'm happy to continue the conversation if you like -- I'm going to convert it to a Discussion.