Closed bjoseph closed 10 years ago
I am getting the same error when attempting an http POST, but it only occurs intermittently.
Same here.
I encountered this too after upgrading from 1.0.0, along with this one in a few other places:
too many connection resets (due to end of file reached - EOFError) after 3 requests on 70046648458580
Easily resolved by sticking with 1.0.0, which seems much more stable compared to 2.x at the moment.
I'll try to find out when the regression came up.
Net::HTTP::Persistent was introduced in 4d074f4ddcd005511e72881cdc0797b0d5dfd98b.
Some stuff from debug log:
I, [2011-08-01T11:01:20.015280 #7916] INFO -- : Net::HTTP::Post: /login.fcc
D, [2011-08-01T11:01:20.015352 #7916] DEBUG -- : request-header: accept => */*
D, [2011-08-01T11:01:20.015384 #7916] DEBUG -- : request-header: user-agent => WWW-Mechanize/1.0.0 (http://rubyforge.org/projects/mechanize/)
D, [2011-08-01T11:01:20.015415 #7916] DEBUG -- : request-header: connection => keep-alive
D, [2011-08-01T11:01:20.015449 #7916] DEBUG -- : request-header: keep-alive => 300
D, [2011-08-01T11:01:20.015479 #7916] DEBUG -- : request-header: accept-encoding => gzip,identity
D, [2011-08-01T11:01:20.015509 #7916] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
D, [2011-08-01T11:01:20.015540 #7916] DEBUG -- : request-header: host => us.etrade.com
D, [2011-08-01T11:01:20.015570 #7916] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
D, [2011-08-01T11:01:20.015600 #7916] DEBUG -- : request-header: cookie => WRC_ID=93.125.111.29-1312185679759; TB=8785
D, [2011-08-01T11:01:20.015630 #7916] DEBUG -- : request-header: referer => https://us.etrade.com/e/t/home
D, [2011-08-01T11:01:20.015665 #7916] DEBUG -- : request-header: content-type => application/x-www-form-urlencoded
D, [2011-08-01T11:01:20.015695 #7916] DEBUG -- : request-header: content-length => 54
E, [2011-08-01T11:01:20.306272 #7916] ERROR -- : Rescuing EOF error
E, [2011-08-01T11:01:21.374473 #7916] ERROR -- : Rescuing EOF error
E, [2011-08-01T11:01:22.435259 #7916] ERROR -- : Rescuing EOF error
I also checked the headers with the Firefox plugin Tamper Data. It shows that the size of the first 3 consecutive requests (POST, GET, GET) is -1.
Any solution to this issue? I am getting a similar error, and increasing keep_alive_time or open_time has no effect: too many connection resets (due to end of file reached - EOFError) after 3 requests on 3084924 Net::HTTP::Persistent::Error
Same here, seems to fluctuate between EOF and ECONNRESET
Confirmed with Mechanize 2.0.1. Please fix this issue.
+1 I'm seeing the same thing
Same here. It occurs regardless of https or http.
If you know re-posting is harmless (as in a login form) you can temporarily set:
agent.agent.http.retry_change_requests = true
to force a re-post. It seems to work for me.
Another ugly workaround is to manually reset the connection before posting.
#...
agent.agent.http.tap { |http|
  http.reset http.connection_for(login_page.uri + form.action)
}
form.submit
I'm also seeing this quite frequently on 2.0.1, going to look into downgrading to 1.0.0 as a short-term solution to the problem.
Here is what looks to be going on:
I see this error when I run a request, then after a brief pause, run another request, using SSL. It does not happen when I run the requests back to back. I got a full backtrace from line 460 of persistent.rb, and that showed that the IOError was being raised from buffering.rb:145 in read_nonblock at a call to sysread_nonblock, called from net/protocol.rb:135 in rbuf_fill.
Digging down, it looks like sysread_nonblock is implemented (in 1.9.2) in the core file ossl_ssl.c, and only returns an IOError if one of two errors occurred, SSL_ERROR_ZERO_RETURN or SSL_ERROR_SYSCALL. The documentation here: http://www.openssl.org/docs/ssl/SSL_get_error.html indicates that either the connection has been closed, or an EOF was read that violates the SSL protocol. I'm guessing it is the first case because of the behavior with delays. So it looks like the problem is that regardless of the Connection: Keep-Alive setting, the SSL connection can go down and you get an EOFError. In the non-SSL situation, there is a similar problem with IO.read_nonblock() also having the possibility of returning EOFError.
The problem is going to be figuring out when that is the case so you can retry safely for non-idempotent queries. knu's workaround will work if you know it is safe to do so. You can't just always retry on EOFError because other levels can throw EOFError for different reasons.
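For requests that are known to be safe to repeat, the retry idea can be sketched as a small wrapper. This is only an illustration, not part of Mechanize or net-http-persistent: `with_connection_retry`, `errors` and `max_attempts` are hypothetical names, and the block is assumed to be a request that is harmless to send twice (a login POST, for example). In this thread the failure surfaces as Net::HTTP::Persistent::Error, so that class would probably need to be passed in `errors`.

```ruby
# Hypothetical helper (not part of Mechanize): re-runs the block when one of
# the listed connection errors is raised. Only wrap requests that are safe
# to send more than once.
def with_connection_retry(errors: [EOFError, Errno::ECONNRESET], max_attempts: 2)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *errors
    # Try again until the attempt budget is used up, then re-raise.
    retry if attempts < max_attempts
    raise
  end
end

# Possible usage, assuming `form` is a login form that is safe to re-post:
#   page = with_connection_retry { form.submit }
```

Keeping the retry explicit at the call site leaves the decision about which requests are safe to repeat with the application, which is the hard part described above.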
I'm getting a similar error using 2.0.1. It happens on a POST request in which the server takes about 2-3 min to respond. The error I get is a little different though: "too many connection resets (due to Resource temporarily unavailable - Timeout::Error)". I fixed it temporarily by reverting to 1.0.0 and setting high agent.read_timeout and agent.open_timeout values.
We are facing a similar problem. I think we will choose the downgrade path and test 2.0.2 when it's released.
Is anybody also getting occasional SocketErrors about getaddrinfo failing because it couldn't find the remote host, in addition to the ECONNRESET errors from this bug? Or is this a different problem that I'm having?
Having same problem :(
any updates?
Hey any updates... same here...
I started using selenium web driver.
Same here... this issue is 4 months old now. Can we expect any updates?
I don't know what happened, but it is working for me now...
Things I have done: installed watir-webdriver, installed mechanize 1.0.0, uninstalled mechanize 1.0.0, and installed 2.0.1 again.
The issue is related to:
Without example scripts to illustrate and reproduce your specific problem it is difficult to find a "fix" for your specific program.
Here is one example of a server I've come across that triggers this behavior:
a = Mechanize.new
result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end
It works without the sleep, which makes me think it has something to do with the keep alive behavior of the server.
I get the same error with @jinschoi's script.
And here's a similar trace from httpclient. Relevant part is 'KeepAliveDisconnected'. HTTPClient tries to re-post under some condition since we might not be able to detect a socket disconnection by peer.
EDIT: I forgot to add this URL: https://gist.github.com/1280318
I've released net-http-persistent 2.2 which will reset connections that have been idle for 5 seconds. Can some of you try your scripts with master @1fd7c77 or newer?
From my investigation at that time, the server for 'https://junecloud.com/sync/deliveries/' seems to have 1 or 2 sec as KeepAliveTimeout.
I mention this because a second request made 1 to 4 seconds after the first might still raise the same error as before.
I picked 5s as the default because Apache uses it. If there's a better default I can change it, but I need feedback first.
I can have net-http-persistent display the idle time for a socket that needs to be reset. That might help.
Right now users can adjust the timeout through Mechanize#idle_timeout=
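The idle-timeout idea can be sketched roughly as follows. This is not net-http-persistent's actual code: `IdleTracker`, `touch` and `expired?` are hypothetical names, and the clock is injected so the behavior can be checked without sleeping.

```ruby
# Hypothetical sketch of an idle-timeout check (not net-http-persistent's code).
class IdleTracker
  def initialize(idle_timeout: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @idle_timeout = idle_timeout
    @clock = clock
    @last_use = nil
  end

  # Record that the connection was just used for a request.
  def touch
    @last_use = @clock.call
  end

  # True when the connection has been idle longer than the timeout and
  # should be reopened before the next request is sent.
  def expired?
    return false if @last_use.nil?
    @clock.call - @last_use > @idle_timeout
  end
end
```

The check only helps when the client's timeout is shorter than the server's keep-alive timeout, which is why the right value depends on the server.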
It's good to be able to configure.
But I thought that net-http-persistent might want to reopen the transport connection and retransmit the aborted sequence of requests when a peer closes the connection, according to sections 8.1.4 (Practical Considerations) and 9.1.2 (Idempotent Methods) of RFC 2616.
httpclient always does that without user interaction, even if the request is not idempotent. I don't know what browsers do. Mechanize might want to behave more like browsers.
On Oct 25, 2011, at 9:47 AM, alcalaerick86 wrote:
I used the latest net-http-persistent and still get the same error. After many attempts, login sometimes works and sometimes does not.
Can you post a script that I can use to reproduce?
@nahi net-http-persistent implements 8.1.4 and 9.1.2 and allows overriding per paragraph 4 via #retry_change_requests which is just now exposed in mechanize.
Browsers usually display a dialog box like "Do you want to resubmit this form?" when a POST needs to be resubmitted.
Modifying @jinschoi's script like so:
require 'mechanize'
a = Mechanize.new
a.agent.set_http
a.agent.http.debug_output = $stderr
result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end
From the debug output from Net::HTTP I see:
$ ruby19 -Ilib t.rb
opening connection to junecloud.com...
opened
<- "GET /sync/deliveries/ HTTP/1.1[…]"
-> "HTTP/1.1 200 OK\r\n"
[…]
read 5403 bytes
Conn keep-alive
[pause for 10 seconds]
opening connection to junecloud.com… [new connection created here due to idle timeout]
opened
<- "POST /sync/deliveries/ HTTP/1.1[…]"
<- "cmd=login&type=web&email=&password=&newpassword=&confirmpass=&name="
-> "HTTP/1.1 200 OK\r\n"
read 5435 bytes
Conn keep-alive
Reducing the sleep value below 5s (the default idle timeout) the script fails with "too many connection resets" until it is reduced below 1s (0.9s works):
require 'mechanize'
a = Mechanize.new
a.idle_timeout = 0.9
result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 1
  page.form_with(:action => "./").submit
end
If you've commented on this issue can you report a value of idle_timeout that works for your application?
require 'mechanize'
require 'logger'
agent = Mechanize.new{|a| a.log = Logger.new(STDERR) }
agent.read_timeout = 60
def add_cookie(agent, uri, cookie)
  uri = URI.parse(uri)
  Mechanize::Cookie.parse(uri, cookie) do |parsed_cookie|
    agent.cookie_jar.add(uri, parsed_cookie)
  end
end
page = agent.get "http://www.sistemasaplicados.com.mx"
form = page.forms.first
form.correo_ingresar = "ing.alcala@ofixcomp.com"
form.password = "ofixcomp"
page = agent.submit form
It worked. I had to do a gem clean to wipe out the old 1.9 net-http-persistent. I don't know if it has anything to do with this, but it does not forward my Mechanize page to the one it's supposed to.
@alcalaerick86 that's good. It also looks like www.sistemasaplicados.com.mx has a keep-alive timeout of at least 10 seconds (but less than 15).
@drbrain Ah, I understood. Sorry for the noise.
Browsers usually display a dialog box like "Do you want to resubmit this form?" when a POST needs to be resubmitted.
Yes, but have you ever seen it because the server closed a connection after its KeepAliveTimeout? It would look like a dialog popping up just after you push the [submit] button...
The idle timeout appears to work, and is a fine workaround, but it is very dependent on the actual timeouts involved. A better solution might be to modify net/http/persistent.rb to reopen a connection when an EOFError is thrown due to OpenSSL. The snippet included by @nahi on Oct 12 seems to suggest that httpclient does exactly this.
@jinschoi The HTTP spec doesn't allow mechanize to do that by default. Read RFC 2616 section 8.1.4 paragraph 4:
This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails.
So for a GET it is OK to retry once, but not for a POST. This is to prevent duplicate records from being changed or modified in an unintended way.
Without modifying net/http I don't think there's a way to detect the socket close without attempting to make a request, and the error doesn't occur until after the request body has been sent, so I can't tell if the request was received or not.
You may set agent.retry_change_requests = true (per "semantic understanding of the application") if you know this won't cause problems for your application (like a login or search form).
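The rule from the RFC paragraph quoted above can be sketched as a small predicate. This is an illustration under stated assumptions, not how net-http-persistent decides internally: `automatic_retry_allowed?` is a hypothetical helper, and its `retry_change_requests` keyword only models the opt-in setting of the same name discussed in this thread.

```ruby
# Methods that RFC 2616 section 9.1.2 describes as idempotent.
IDEMPOTENT_METHODS = %w[GET HEAD PUT DELETE OPTIONS TRACE].freeze

# Hypothetical helper: may a request be retried automatically after the
# connection was closed? Non-idempotent methods (such as POST) are only
# retried when the caller has explicitly opted in.
def automatic_retry_allowed?(http_method, retry_change_requests: false)
  retry_change_requests || IDEMPOTENT_METHODS.include?(http_method.to_s.upcase)
end
```

With the defaults, a GET may be retried and a POST may not; passing `retry_change_requests: true` stands in for the application's own knowledge that a repeat is harmless.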
@drbrain is right. So the resolution should be: non-idempotent requests must be made on a fresh (non-keep-alive) connection. I still haven't checked what browsers do, though.
@nahi I'm wondering if the browsers have a better way of detecting a server-closed socket than net/http does. Would SO_KEEPALIVE help? Some other socket option I don't know about?
I tried running dtruss on Safari, but it didn't reveal any use of the BSD socket API. I haven't pulled the firefox code to dig through, either.
@drbrain You can detect a TCP connection close by doing IO multiplexing (rewriting net/http with IO.select), but I don't think that helps in this case. We cannot detect it without actually sending a packet.
Okay. How about some way to always ignore keep alive and always use a new connection? Will setting the timeout to 0 accomplish that?
@jinschoi I will make @mechanize.keep_alive = false work again tomorrow, but setting the idle timeout to 0 (or -1) will also accomplish that.
@jinschoi can you try @839c008:
require 'mechanize'
a = Mechanize.new
a.keep_alive = false
result = a.get('https://junecloud.com/sync/deliveries/') do |page|
  sleep 10
  page.form_with(:action => "./").submit
end
@bjoseph I can't get your etrade example to work at all, I fear that they have decided I am a hacker ☹
I think it is safe to close this issue now. If you have issues with mechanize trunk and neither the new idle_timeout setting nor disabling keep_alive fix the issue please comment!
How can I use trunk? https://github.com/tenderlove/mechanize/issues/101
gem install hoe
git clone git://github.com/tenderlove/mechanize.git
cd mechanize
rake package
gem install pkg/mechanize-2.1.gem
That returns an error: https://gist.github.com/1340608
After executing, there is a mechanize-2.1.gem in /pkg. Is that usable?
Ugh, check_extra_deps depends on a rubygems feature that doesn't exist yet.
Yes, you can run gem install pkg/mechanize-2.1.gem and it will work.
I've updated my comment above to have working instructions.
Also, I will release a prerelease version of mechanize 2.1 on Monday.
Alright, I think I'll wait till Monday then. :)
I am trying to do some simple screen scraping on etrade's website but am getting a similar issue that was reported by someone earlier. I read through that message thread but it doesn't look like it was ever resolved.
Here is the error I am getting back:
Here is my code:
Any ideas?
Thanks