ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml
Other
559 stars 72 forks source link

Eio.Net.getaddrinfo_stream fails to resolve host after 1019 successful calls on linux #776

Open serpent7776 opened 3 weeks ago

serpent7776 commented 3 weeks ago

I'm writing an application using cohttp-eio that repeatedly connects to a remote host every 3s. The initial 1019 requests succeed, but all the following requests fail to resolve the host. I can simplify the code to the following:

let connect ~sw ~env addr =
  ignore @@ Eio.Net.connect ~sw env#net addr

let print addr =
  Format.printf "@[%a@]@." Eio.Net.Sockaddr.pp addr

let connect_and_print ~sw ~env addr =
  print addr;
  (match addr with | `Tcp (a,_) -> Eio.Net.Ipaddr.fold ~v4:(fun _ -> connect ~sw ~env addr) ~v6:(fun _ -> ()) a | _ -> ())

let getaddrinfo ~sw ~env uri =
  List.iter (connect_and_print ~sw ~env) @@
  Eio.Net.getaddrinfo_stream ~service:"http" env#net
    (Uri.host_with_default uri)

let google_uri = Uri.of_string "http://google.com"

let rec loop ~sw ~env clock count =
  Format.printf "%d:@." count;
  getaddrinfo ~sw ~env google_uri ;
  loop ~sw ~env clock @@ (count + 1)

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->
  let clock = env#clock in
  loop ~sw ~env clock 0

This initially given the correct output

0:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80
1:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80

But after 1019 requests Eio.Net.getaddrinfo_stream returns no addresses:

1018:
tcp:142.250.186.206:80
tcp:[2a00:1450:401b:80d::200e]:80
1019:
1020:
1021:
1022:
1023:
1024:
1025:

I get the same behaviour every single time. I tested this on three different machines: Manjaro, Ubuntu and FreeBSD. It only reproduces on linux systems, so it's likely an issue in eio_linux. Also, it reproduces only when Eio.Net.connect is called.

Possibly related https://github.com/ocaml-multicore/eio/issues/351

patricoferris commented 3 weeks ago

I think the problem is due to number of open connections. For example:

let connect ~sw ~env addr =
  let c = Eio.Net.connect ~sw env#net addr in
  Eio.Flow.close c

fixes this problem. Note with the use of the single switch these connections are never being closed (that may or may not be desired). Although I'm not quite sure why that is the case on Linux. But just thought I would send this comment to hopefully unblock you.

serpent7776 commented 3 weeks ago

Thanks, that's very helpful. I can confirm that closing the connections manually fixes the issue. I also did a roughly similar code in C and it also has the same behaviour when connections are not closed. After 1021 connections, getaddrinfo returns error: Servname not supported for ai_socktype, so this is linux behaviour.

But now I'm not sure, if this is eio issue, cohttp issue or am I supposed to close the connections manually?

talex5 commented 3 weeks ago

If you're using cohttp-eio, it will close the connection for you as soon as the switch goes out of scope. Just make sure you're not sharing a single switch for everything (like your example does) - see https://github.com/ocaml-multicore/eio?tab=readme-ov-file#switches-1.

The behaviour of getaddrinfo ignoring the error and returning an empty list is questionable, but comes from OCaml's Unix.getaddrinfo. It might be better to have our own version that passes errors on.

serpent7776 commented 3 weeks ago

Oh, I am using a single switch everywhere :)