Net::HTTP::Persistent::Error: too many connection resets (due to Connection reset by peer - Errno::ECONNRESET) after 2 requests on 14759220

bjoseph commented 13 years ago

I am trying to do some simple screen scraping on etrade's website but am getting an issue similar to one that someone reported earlier. I read through that message thread, but it doesn't look like it was ever resolved.

Here is the error I am getting back:

Net::HTTP::Persistent::Error: too many connection resets (due to Connection reset by peer - Errno::ECONNRESET) after 2 requests on 14759220
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/net-http-persistent-1.8/lib/net/http/persistent.rb:446:in `rescue in request'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/net-http-persistent-1.8/lib/net/http/persistent.rb:422:in `request'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/mechanize-2.0.1/lib/mechanize/http/agent.rb:204:in `fetch'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/mechanize-2.0.1/lib/mechanize.rb:628:in `post_form'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/mechanize-2.0.1/lib/mechanize.rb:520:in `submit'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/mechanize-2.0.1/lib/mechanize/form.rb:167:in `submit'
    from (irb):74
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/railties-3.0.9/lib/rails/commands/console.rb:44:in `start'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/railties-3.0.9/lib/rails/commands/console.rb:8:in `start'
    from /Users/benny/.rvm/gems/ruby-1.9.2-p180@taxhaven/gems/railties-3.0.9/lib/rails/commands.rb:23:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

Here is my code:

require 'rubygems'
require 'mechanize'

agent = Mechanize.new
login_page = agent.get("https://www.etrade.com")
form = login_page.form_with(:action => '/login.fcc') 
form.USER     = "test"
form.PASSWORD = "test12"
form.submit

Any ideas?

Thanks

chip commented 13 years ago

I am getting the same error when attempting an http POST, but it only occurs intermittently.

shaiguitar commented 13 years ago

Same here.

lankz commented 13 years ago

I encountered this too after upgrading from 1.0.0, along with this one in a few other places:

too many connection resets (due to end of file reached - EOFError) after 3 requests on 70046648458580

Easily resolved by sticking with 1.0.0, which seems much more stable compared to 2.x at the moment.

styx commented 13 years ago

I'll try to find out when the regression came up.

styx commented 13 years ago

Net::HTTP::Persistent was introduced in 4d074f4ddcd005511e72881cdc0797b0d5dfd98b

Some stuff from debug log:

I, [2011-08-01T11:01:20.015280 #7916]  INFO -- : Net::HTTP::Post: /login.fcc
D, [2011-08-01T11:01:20.015352 #7916] DEBUG -- : request-header: accept => */*
D, [2011-08-01T11:01:20.015384 #7916] DEBUG -- : request-header: user-agent => WWW-Mechanize/1.0.0 (http://rubyforge.org/projects/mechanize/)
D, [2011-08-01T11:01:20.015415 #7916] DEBUG -- : request-header: connection => keep-alive
D, [2011-08-01T11:01:20.015449 #7916] DEBUG -- : request-header: keep-alive => 300
D, [2011-08-01T11:01:20.015479 #7916] DEBUG -- : request-header: accept-encoding => gzip,identity
D, [2011-08-01T11:01:20.015509 #7916] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
D, [2011-08-01T11:01:20.015540 #7916] DEBUG -- : request-header: host => us.etrade.com
D, [2011-08-01T11:01:20.015570 #7916] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
D, [2011-08-01T11:01:20.015600 #7916] DEBUG -- : request-header: cookie => WRC_ID=93.125.111.29-1312185679759; TB=8785
D, [2011-08-01T11:01:20.015630 #7916] DEBUG -- : request-header: referer => https://us.etrade.com/e/t/home
D, [2011-08-01T11:01:20.015665 #7916] DEBUG -- : request-header: content-type => application/x-www-form-urlencoded
D, [2011-08-01T11:01:20.015695 #7916] DEBUG -- : request-header: content-length => 54
E, [2011-08-01T11:01:20.306272 #7916] ERROR -- : Rescuing EOF error
E, [2011-08-01T11:01:21.374473 #7916] ERROR -- : Rescuing EOF error
E, [2011-08-01T11:01:22.435259 #7916] ERROR -- : Rescuing EOF error

I also checked the headers with the Firefox plugin Tamper Data. It shows that the size of the first 3 consecutive requests (POST, GET, GET) is -1.

bhochhi commented 13 years ago

Any solution to this issue? I am getting a similar error, and increasing keep_alive_time or open_time has no effect: too many connection resets (due to end of file reached - EOFError) after 3 requests on 3084924 Net::HTTP::Persistent::Error

simonmd commented 13 years ago

Same here, seems to fluctuate between EOF and ECONNRESET

jg commented 13 years ago

Confirmed with Mechanize 2.0.1. Please fix this issue.

GBH commented 13 years ago

+1 I'm seeing the same thing

knu commented 13 years ago

Same here. It occurs regardless of https or http.

If you know re-posting is harmless (as in a login form) you can temporarily set:

  agent.agent.http.retry_change_requests = true

to force a re-post. It seems to work for me.

knu commented 13 years ago

Another ugly workaround is to manually reset the connection before posting.

  #...
  agent.agent.http.tap { |http|
    http.reset http.connection_for(login_page.uri + form.action)
  }
  form.submit

matthewbjones commented 13 years ago

I'm also seeing this quite frequently on 2.0.1, going to look into downgrading to 1.0.0 as a short-term solution to the problem.

jinschoi commented 13 years ago

Here is what looks to be going on:

I see this error when I run a request, then after a brief pause, run another request, using SSL. It does not happen when I run the requests back to back. I got a full backtrace from line 460 of persistent.rb, and that showed that the IOError was being raised from buffering.rb:145 in read_nonblock at a call to sysread_nonblock, called from net/protocol.rb:135 in rbuf_fill.

Digging down, it looks like sysread_nonblock is implemented (in 1.9.2) in the core file ossl_ssl.c, and only returns an IOError if one of two errors occurred, SSL_ERROR_ZERO_RETURN or SSL_ERROR_SYSCALL. The documentation here: http://www.openssl.org/docs/ssl/SSL_get_error.html indicates that either the connection has been closed, or an EOF was read that violates the SSL protocol. I'm guessing it is the first case because of the behavior with delays. So it looks like the problem is that regardless of the Connection: Keep-Alive setting, the SSL connection can go down and you get an EOFError. In the non-SSL situation, there is a similar problem with IO.read_nonblock() also having the possibility of returning EOFError.

The problem is going to be figuring out when that is the case so you can retry safely for non-idempotent queries. knu's workaround will work if you know it is safe to do so. You can't just always retry on EOFError because other levels can throw EOFError for different reasons.

dgmdan commented 13 years ago

I'm getting a similar error using 2.0.1. It happens on a POST request in which the server takes about 2-3 min to respond. The error I get is a little different though: "too many connection resets (due to Resource temporarily unavailable - Timeout::Error)" Fixed it temporarily by reverting back to 1.0.0 and setting high agent.read_timeout and agent.open_timeout settings.

lxcid commented 13 years ago

We are facing a similar problem. I think we will choose the downgrade path and test 2.0.2 when it's released.

mohamedhafez commented 13 years ago

Is anybody also getting occasional SocketErrors about getaddrinfo failing because it couldn't find the remote host, in addition to the ECONNRESET errors from this bug? Or is this a different problem that I'm having?

woto commented 13 years ago

Having same problem :(

madsheep commented 13 years ago

any updates?

ghost commented 13 years ago

Hey any updates... same here...

bhochhi commented 13 years ago

I started using selenium web driver.

cantonic commented 13 years ago

Same here... this issue is 4 months old now. Can we expect any updates?

cantonic commented 13 years ago

I don't know what happened, but it is working for me now...

Things I have done:

installed watir-webdriver
installed mechanize 1.0.0
uninstalled mechanize 1.0.0 and installed 2.0.1 again

drbrain commented 13 years ago

The issue is related to:

The server you are connecting to
The types of requests you are making (idempotent GET requests vs non-idempotent POST requests)

Without example scripts to illustrate and reproduce your specific problem it is difficult to find a "fix" for your specific program.

jinschoi commented 13 years ago

Here is one example of a server I've come across that triggers this behavior:

a = Mechanize.new
result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end

It works without the sleep, which makes me think it has something to do with the keep alive behavior of the server.

nahi commented 13 years ago

I get the same error with @jinschoi's script.

And here's a similar trace from httpclient. Relevant part is 'KeepAliveDisconnected'. HTTPClient tries to re-post under some condition since we might not be able to detect a socket disconnection by peer.

EDIT: I forgot to add this URL: https://gist.github.com/1280318

drbrain commented 13 years ago

I've released net-http-persistent 2.2 which will reset connections that have been idle for 5 seconds. Can some of you try your scripts with master @1fd7c77 or newer?

nahi commented 13 years ago

From my investigation at that time, the server for 'https://junecloud.com/sync/deliveries/' seems to have 1 or 2 sec as KeepAliveTimeout.

I'm writing this because I thought that a second access after 1~4 sec might still raise the same error as before.
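To make that window concrete, here is a minimal sketch of the reasoning. The helper name and the exact boundary comparisons are hypothetical; only the 5 second client default and the roughly 1 second server timeout come from this thread.

```ruby
# Hypothetical helper: classifies what happens to the second request.
# gap              - seconds the connection sat idle between two requests
# server_keepalive - seconds after which the server drops an idle connection
# client_idle      - client-side idle timeout (5 is the default mentioned here)
def second_request_outcome(gap, server_keepalive:, client_idle: 5)
  if gap >= client_idle
    :fresh_connection    # the client resets the socket before reusing it
  elsif gap >= server_keepalive
    :stale_connection    # the client reuses a socket the server already closed
  else
    :reused_connection   # both sides still consider the socket alive
  end
end

# Assuming the server times out after about 1 second:
p second_request_outcome(10, server_keepalive: 1)
p second_request_outcome(3, server_keepalive: 1)
p second_request_outcome(0.5, server_keepalive: 1)
```

Only the middle case should be a problem: the gap is long enough for the server to close the connection but too short for the client to notice it is idle.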

drbrain commented 13 years ago

I picked 5s as the default because Apache uses it. If there's a better default I can change it, but I need feedback first.

I can have net-http-persistent display the idle time for a socket that needs to be reset. That might help.

Right now users can adjust the timeout through Mechanize#idle_timeout=
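For anyone curious how an idle timeout like this can work, here is a rough sketch of the bookkeeping. It is not the net-http-persistent implementation; the class, its methods, and the injectable clock are all made up for illustration.

```ruby
# Hypothetical stand-in for a pooled connection; not the net-http-persistent API.
class IdleAwareConnection
  attr_reader :resets, :last_idle

  def initialize(idle_timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @idle_timeout = idle_timeout
    @clock = clock       # injectable so the behavior can be checked without sleeping
    @last_used = nil
    @last_idle = nil
    @resets = 0
  end

  # Call before each request: records how long the socket sat idle and
  # counts a reset when that exceeds the idle timeout.
  def checkout
    now = @clock.call
    if @last_used
      @last_idle = now - @last_used
      @resets += 1 if @last_idle > @idle_timeout  # a real client would reopen the socket here
    end
    @last_used = now
    self
  end
end
```

Recording the idle time alongside the reset is what would make it possible to display how long a socket sat unused before it was dropped.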

nahi commented 13 years ago

It's good to be able to configure.

But I thought that net-http-persistent might want to reopen the transport connection and retransmit the aborted sequence of requests when a peer disconnected the connection, according to

8.1.4 Practical Considerations
9.1.2 Idempotent Methods

of RFC2616.

httpclient always does that without user interaction, even if the request is not idempotent. I don't know what browsers do. Mechanize might want to behave more like browsers.

drbrain commented 13 years ago

On Oct 25, 2011, at 9:47 AM, alcalaerick86 wrote:

I used the latest net-http-persistent , and still get same error, after many times trying some of the times login does work, some do not.

Can you post a script that I can use to reproduce?

drbrain commented 13 years ago

@nahi net-http-persistent implements 8.1.4 and 9.1.2 and allows overriding per paragraph 4 via #retry_change_requests which is just now exposed in mechanize.

Browsers usually display a dialog box like "Do you want to resubmit this form?" when a POST needs to be resubmitted.

drbrain commented 13 years ago

Modifying @jinschoi's script like so:

require 'mechanize'

a = Mechanize.new
a.agent.set_http
a.agent.http.debug_output = $stderr

result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end

From the debug output from Net::HTTP I see:

$ ruby19 -Ilib t.rb 
opening connection to junecloud.com...
opened
<- "GET /sync/deliveries/ HTTP/1.1[…]"
-> "HTTP/1.1 200 OK\r\n"
[…]
read 5403 bytes
Conn keep-alive
[pause for 10 seconds]
opening connection to junecloud.com… [new connection created here due to idle timeout]
opened
<- "POST /sync/deliveries/ HTTP/1.1[…]"
<- "cmd=login&type=web&email=&password=&newpassword=&confirmpass=&name="
-> "HTTP/1.1 200 OK\r\n"
read 5435 bytes
Conn keep-alive

With the sleep value reduced below 5s (the default idle timeout), the script fails with "too many connection resets" until the idle timeout is reduced below 1s (0.9s works):

require 'mechanize'

a = Mechanize.new
a.idle_timeout = 0.9

result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 1
  page.form_with(:action => "./").submit
end

If you've commented on this issue can you report a value of idle_timeout that works for your application?

alcalaerick86 commented 13 years ago

require 'mechanize'
require 'logger'
agent = Mechanize.new{|a| a.log = Logger.new(STDERR) }
agent.read_timeout = 60
def add_cookie(agent, uri, cookie)
  uri = URI.parse(uri)
  Mechanize::Cookie.parse(uri, cookie) do |cookie|
    agent.cookie_jar.add(uri, cookie)
  end
end
page = agent.get "http://www.sistemasaplicados.com.mx"
form = page.forms.first
form.correo_ingresar = "ing.alcala@ofixcomp.com"
form.password = "ofixcomp"
page = agent.submit form

It worked. I had to do a gem clean to wipe out the old net-http-persistent 1.9. I don't know if it has anything to do with this, but it does not forward my mechanize page to the one it's supposed to.

drbrain commented 13 years ago

@alcalaerick86 that's good. It also looks like www.sistemasaplicados.com.mx has a keep-alive timeout of at least 10 seconds (but less than 15).

nahi commented 13 years ago

@drbrain Ah, I understood. Sorry for the noise.

Browsers usually display a dialog box like "Do you want to resubmit this form?" when a POST needs to be resubmitted.

Yes, but have you ever seen it because the server disconnected a connection due to KeepAliveTimeout? It would look like a dialog popping up just after pushing the [submit] button...

jinschoi commented 13 years ago

The idle timeout appears to work, and is a fine workaround, but it is very dependent on the actual timeouts involved. A better solution might be to modify net/http/persistent.rb to reopen a connection when an EOFError is thrown due to OpenSSL. The snippet included by @nahi on Oct 12 seems to suggest that httpclient does exactly this.

drbrain commented 13 years ago

@jinschoi The HTTP spec doesn't allow mechanize to do that by default. Read RFC 2616 section 8.1.4 paragraph 4:

This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails.

So for a GET it is OK to retry once, but not for a POST. This is to prevent duplicate records from being changed or modified in an unintended way.

Without modifying net/http I don't think there's a way to detect the socket close without attempting to make a request, and the error doesn't occur until after the request body has been sent, so I can't tell if the request was received or not.

You may set agent.retry_change_requests = true (per "semantic understanding of the application") if you know this won't cause problems for your application (like a login or search form).
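The retry rule from that paragraph can be sketched roughly as follows. This is an illustration only, not the mechanize or net-http-persistent code: request_with_retry and its keyword argument are hypothetical names, and the method list is the set RFC 2616 section 9.1.2 describes as idempotent.

```ruby
# Methods RFC 2616 section 9.1.2 treats as idempotent.
IDEMPOTENT_METHODS = %w[GET HEAD PUT DELETE OPTIONS TRACE].freeze

# Hypothetical helper, not part of mechanize or net-http-persistent.
# Runs the block and retries exactly once after a dropped connection, but only
# when the method is idempotent or the caller opts in via retry_change_requests.
def request_with_retry(http_method, retry_change_requests: false)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue EOFError, Errno::ECONNRESET
    can_retry = IDEMPOTENT_METHODS.include?(http_method.to_s.upcase) || retry_change_requests
    raise unless can_retry && attempts < 2  # never retry more than once
    retry
  end
end
```

A POST therefore fails on the first dropped connection unless the caller has said a repeat is harmless, which is the "semantic understanding of the application" case.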

nahi commented 13 years ago

@drbrain is right. So the resolution should be: non-idempotent requests must be made on a fresh (non keep-alive) connection. Though I still haven't checked browsers.

drbrain commented 13 years ago

@nahi I'm wondering if the browsers have a better way of detecting a server-closed socket than net/http does. Would SO_KEEPALIVE help? Some other socket option I don't know about?

I tried running dtruss on Safari, but it didn't reveal any use of the BSD socket API. I haven't pulled the firefox code to dig through, either.

nahi commented 13 years ago

@drbrain You can detect a TCP connection close by doing IO multiplexing (rewriting net/http with IO.select), but I don't think that helps here. We cannot detect it without actually sending a packet.
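As a rough illustration of the IO.select idea, the sketch below checks whether a close from the peer has already arrived on an idle plain socket. The helper name is hypothetical, it ignores SSL entirely, and it cannot help with a close that arrives after the check but before the request is sent.

```ruby
require 'socket'

# Hypothetical helper: true if the peer's close has already been received.
# Assumes an idle connection where no unread response data is expected.
def peer_closed?(sock)
  readable, = IO.select([sock], nil, nil, 0)   # poll without blocking
  return false unless readable                 # nothing pending: still looks open

  # Peek so that any real data stays in the buffer. At end of file this
  # returns an empty string on older Rubies and nil on Ruby 3.3 and later.
  data = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
  data.nil? || data == ''
end

client, server = UNIXSocket.pair
puts peer_closed?(client)   # the server side is still open
server.close
puts peer_closed?(client)   # the close has been delivered
```

The remaining race is why this kind of check can only narrow the failure window, not remove it.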

jinschoi commented 13 years ago

Okay. How about some way to always ignore keep alive and always use a new connection? Will setting the timeout to 0 accomplish that?

drbrain commented 13 years ago

@jinschoi I will make @mechanize.keep_alive = false work again tomorrow, but setting the idle timeout to 0 (or -1) will also accomplish that.

drbrain commented 13 years ago

@jinschoi can you try @839c008:

require 'mechanize'
a = Mechanize.new
a.keep_alive = false

result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end

drbrain commented 13 years ago

@bjoseph I can't get your etrade example to work at all, I fear that they have decided I am a hacker ☹

drbrain commented 13 years ago

I think it is safe to close this issue now. If you have issues with mechanize trunk and neither the new idle_timeout setting nor disabling keep_alive fix the issue please comment!

manuelmeurer commented 13 years ago

How can I use trunk? https://github.com/tenderlove/mechanize/issues/101

drbrain commented 13 years ago

gem install hoe
git clone git://github.com/tenderlove/mechanize.git
cd mechanize
rake package
gem install pkg/mechanize-2.1.gem

manuelmeurer commented 13 years ago

That returns an error: https://gist.github.com/1340608

After executing, there is a mechanize-2.1.gem in /pkg. Is that usable?

drbrain commented 13 years ago

Ugh, check_extra_deps depends on a rubygems feature that doesn't exist yet.

Yes, you can run gem install pkg/mechanize-2.1.gem and it will work.

I've updated my comment above to have working instructions.

Also, I will release a prerelease version of mechanize 2.1 on Monday.

manuelmeurer commented 13 years ago

Alright, I think I'll wait till Monday then. :)

sparklemotion / mechanize

Net::HTTP::Persistent::Error: too many connection resets (due to Connection reset by peer - Errno::ECONNRESET) after 2 requests on 14759220 #123