Search timeout doesn't appear to work.

dblessing commented 8 years ago

We're using version 0.12.1, which should support the time: arg in the search method. However, in testing with a network simulator proxy I can't get this argument to actually timeout and abort the search.

Example:

Net::LDAP.new(<options>).search(base: 'cn=foo,dc=example,dc=com', filter: 'my_filter', time: 10)

I would expect the above query to time out after 10 seconds but it actually goes indefinitely. Is this a bug or am I misunderstand usage?

satoryu commented 8 years ago

This version of net-ldap seems to ignore time arg in the search method. so this is a bug. PR #243 which provides support for timeout has been merged. Now you should use the current master or the next version, 0.13.0 but no plan to release now.

jch commented 8 years ago

@satoryu thanks for triaging. I've opened https://github.com/ruby-ldap/ruby-net-ldap/pull/248 to prepare the next release.

satoryu commented 8 years ago

sorry, I misunderstood as I saw only Net::LDAP#search.

the value of time given on Net::LDAP#search is used by Net::LDAP::Connection#search and it seems that the arg should follow the specification defined in RFC 4511. But #243 would resolve @dblessing 's situation, I think.

@dblessing do you think you can confirm the latest version of net-ldap can resolve your situation? if not resolved, we need to keep this issue open, I think.

dblessing commented 8 years ago

I just upgraded to 0.13.0 and tested this again. Neither connect_timeout nor time work for me. I need both a connect and a read timeout.

Here's a code example of how I'm setting connect timeout:

options = { :host=>"127.0.0.1", :port=>33890, :encryption=>nil, :connect_timeout=>5 }

Net::LDAP.open(options) do |ldap|
  block.call(self.new(provider, ldap))
end

I'm using ToxiProxy to simulate a timeout while connecting to an LDAP server. I see that the attempt to open the connect hangs indefinitely and does not time out after the 5 seconds specified. This is the same behavior I found with time on Net::LDAP#search

Please let me know how I can help troubleshoot this further.

jch commented 8 years ago

@astratto could you take a look at @dblessing's transcript and see why timeout isn't being respected?

astratto commented 8 years ago

@jch I'll try to find some time to take a look at that in the next few days, sure.

gpherguson commented 8 years ago

Has there been any further information on this? We had an "event" last Friday due to a LDAP host being offline, and the v0.14.0 LDAP client hanging. It might have been waiting for a default TCP timeout to occur as the code doesn't set a timeout value.

astratto commented 8 years ago

Hey guys, sorry but I had to focus on other more urgent things and then it slipped my mind.

I've been using this code on different machines and also testing the timeout very often, so my guess is that you're hitting some corner cases we're not aware of.

@dblessing can you share with us the actual tests you're running?

astratto commented 8 years ago

I had a little bit of spare time today and here's what I've found so far. Note that these tests are using directly Ruby's Socket.tcp and don't involve net-ldap on purpose.

The test server is just an ugly and silly script:

require 'socket'

socketServer = TCPServer.open(8080)

while true
  Thread.new(socketServer.accept) do |conn|
    puts "Accepting connection from: #{conn.peeraddr[2]}"
    conn.puts "Close"
    conn.close
  end
end

Testing with Toxiproxy

Disclaimer: First time I use it, tell me if I'm doing something stupid.

Using the timeout feature of Toxiproxy, I can still see the socket creating the connection and then it hangs. This is because the timeout is on socket creation, but after the socket is there, there's no read timeout applied (Socket.tcp doesn't support it).

Toxiproxy test - client side

require 'toxiproxy'

Toxiproxy.reset
Toxiproxy.populate([
  {
    name: "myservice",
    listen: "127.0.0.1:22220",
    upstream: "127.0.0.1:8080"
  }
])

puts "#{Time.now} Testing no response and close connection after 5s with timeout at connect of 1s"

Toxiproxy[:myservice].downstream(:timeout, timeout: 5000).apply do
  puts Socket.tcp('127.0.0.1', 22220, connect_timeout: 1).read
end

puts "#{Time.now} Done."

Result client-side:

$ ruby toxiproxy.rb
2016-05-13 17:39:49 +0100 Testing no response and close connection after 5s with timeout at connect of 1s

2016-05-13 17:39:54 +0100 Done.

Result server-side:

Accepting connection from: 127.0.0.1

You can see that the socket is actually created and it reaches both Toxiproxy and the server. As expected the timestamp with Done is after 5 seconds instead of 1.

Toxiproxy logs:

INFO[0563] Accepted client     client=127.0.0.1:60321 name=myservice proxy=127.0.0.1:22220 upstream=127.0.0.1:8080
WARN[0568] Source terminated   bytes=0 err=read tcp 127.0.0.1:22220->127.0.0.1:60321: use of closed network connection name=myservice

Dropping packets

If, instead, I set a firewall rule to drop the packets and then actually simulate the inability of creating a socket, the code times out just fine.

I use OSX, so that's basically a:

$ # enable firewall
$ sudo pfctl -ef /etc/pf.conf
$ # drop packets on port 8080
$ (sudo pfctl -sr 2>/dev/null; echo "block drop quick proto tcp from any to any port = 8080") | sudo pfctl -f - 2>/dev/null

(Disable and flush your rules with)

sudo pfctl -df /etc/pf.conf

Result client-side

$ cat simple_socket.rb
require 'socket'

begin
  puts "#{Time.now} Starting..."
  Socket.tcp('127.0.0.1', 8080, connect_timeout: 1).read
rescue => e
  puts "#{Time.now} Error: #{e}"
end

$ ruby simple_socket.rb
2016-05-13 17:59:56 +0100 Starting...
2016-05-13 17:59:57 +0100 Error: Operation timed out - user specified timeout

dblessing commented 7 years ago

@astratto Thank you for testing. Sorry for not getting back sooner.

I wonder if it would be beneficial to look at something like https://github.com/lann/tcp-timeout-ruby. It would be a runtime dependency, but it's also small so the logic could potentially be pulled into this gem, too. It would give us a connect, read and write timeout.

If I get a chance, I'll do some testing to see if this is a drop-in replacement and if it solves the problem. If it is a drop-in replacement, maybe an alternative would be to allow users to pass the socket class all the way down? It's already an arg on the connection class. Just a thought...

geoffk02 commented 6 years ago

I think I'm seeing this problem even during the LDAP open; connections to ldap.va.pki.gov (to fetch CRLs) are sometimes hanging here:

net-ldap-0.16.0/lib/net/ber/ber_parser.rb:169:in ‘getbyte’ net-ldap-0.16.0/lib/net/ber/ber_parser.rb:169:in ‘read_ber’ net-ldap-0.16.0/lib/net/ldap/connection.rb:234:in ‘block in read’ net-ldap-0.16.0/lib/net/ldap/instrumentation.rb:19:in ‘instrument’ net-ldap-0.16.0/lib/net/ldap/connection.rb:233:in ‘read’ net-ldap-0.16.0/lib/net/ldap/connection.rb:201:in ‘queued_read’ net-ldap-0.16.0/lib/net/ldap/auth_adapter/simple.rb:23:in ‘bind’ net-ldap-0.16.0/lib/net/ldap/connection.rb:278:in ‘block in bind’ net-ldap-0.16.0/lib/net/ldap/instrumentation.rb:19:in ‘instrument’ net-ldap-0.16.0/lib/net/ldap/connection.rb:275:in ‘bind’ net-ldap-0.16.0/lib/net/ldap.rb:715:in ‘block in open’ net-ldap-0.16.0/lib/net/ldap/instrumentation.rb:19:in ‘instrument’ net-ldap-0.16.0/lib/net/ldap.rb:711:in ‘open’ net-ldap-0.16.0/lib/net/ldap.rb:644:in ‘open’

This particular server seems to have no timeout on its side either and I've seen connections hung at this point for days.

HynekPetrak commented 4 years ago

Hi, on certain servers I observe search and even the bind operation to hang for ever.

HarlemSquirrel commented 4 years ago

I've been trying to reproduce this but I don't think I'm using toxiproxy correctly because I'm not seeing the timeout.

# socket_server.rb

require 'socket'

socketServer = TCPServer.open(8080)

while true
  Thread.new(socketServer.accept) do |conn|
    puts "Accepting connection from: #{conn.peeraddr[2]}"
    conn.puts "Close"
    conn.close
  end
end

➤  ruby socker_server.rb
Accepting connection from: 127.0.0.1
Accepting connection from: 127.0.0.1

➤  docker run -it -p 8474:8474 -p 22220:22220 --network host shopify/toxiproxy
WARNING: Published ports are discarded when using host network mode
INFO[0000] API HTTP server starting                      host=0.0.0.0 port=8474 version=2.1.4
INFO[0004] Started proxy                                 name=myservice proxy=127.0.0.1:22220 upstream=127.0.0.1:8080
INFO[0004] Accepted client                               client=127.0.0.1:43286 name=myservice proxy=127.0.0.1:22220 upstream=127.0.0.1:8080
WARN[0004] Source terminated                             bytes=0 err=read tcp 127.0.0.1:22220->127.0.0.1:43286: use of closed network connection name=myservice
INFO[0004] Accepted client                               client=127.0.0.1:43290 name=myservice proxy=127.0.0.1:22220 upstream=127.0.0.1:8080
WARN[0004] Source terminated                             bytes=0 err=read tcp 127.0.0.1:22220->127.0.0.1:43290: use of closed network connection name=myservice

# toxiproxy.rb
require 'logger'

require 'net/ldap'

require 'toxiproxy'

logger = Logger.new(STDOUT)

Toxiproxy.reset
Toxiproxy.populate([
  {
    name: "myservice",
    listen: "127.0.0.1:22220",
    upstream: "127.0.0.1:8080"
  }
])

options = { :host=>"127.0.0.1", :port=>22220, :encryption=>nil, :connect_timeout=>5 }

Toxiproxy[:myservice].downstream(:timeout, timeout: 5000).apply do
 logger.info "Starting TCP connection..."
 logger.debug Socket.tcp('127.0.0.1', 22220, connect_timeout: 22).read
 logger.info "Done."

 logger.info "Starting LDAP connection..."
 ldap = Net::LDAP.new(options)
 conn = ldap.send :new_connection
 logger.debug conn.socket.read
 conn.close
 logger.info "Done."
end

➤  ruby toxiproxy.rb
I, [2020-10-29T01:03:39.816195 #10079]  INFO -- : Starting TCP connection...
D, [2020-10-29T01:03:39.816804 #10079] DEBUG -- :
I, [2020-10-29T01:03:39.816823 #10079]  INFO -- : Done.
I, [2020-10-29T01:03:39.816834 #10079]  INFO -- : Starting LDAP connection...
D, [2020-10-29T01:03:39.817193 #10079] DEBUG -- :
I, [2020-10-29T01:03:39.817243 #10079]  INFO -- : Done.

djdefi commented 2 years ago

Late to the party, but I have encountered this occurring with LDAPS (connecting via openssl) more often than unencrypted LDAP lately. Wondering if reproducing would be more reliable with TLS/SSL on the mock server.

ruby-ldap / ruby-net-ldap

Search timeout doesn't appear to work. #247

Testing with Toxiproxy

Dropping packets