rdicosmo / parmap

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications.
http://rdicosmo.github.io/parmap/

chunksize #81

Closed JuliaLawall closed 3 years ago

JuliaLawall commented 5 years ago

I have the impression that providing a chunksize and a list of work that has length longer than the number of cores requested allocates a file descriptor that is never freed.

I believe that I am using the following version:

.opam/4.06.1/lib/parmap

rdicosmo commented 5 years ago

Hi Julia, can you provide a minimal example showing this issue?

-- Roberto

JuliaLawall commented 5 years ago

Is there some available example code that uses parfold?

julia

rdicosmo commented 5 years ago

Sure, look at the "tests" directory, there are quite a few simple test examples there.

JuliaLawall commented 5 years ago

I took simplescalefold.ml and added a loop around the call to parfold. At the shell, I limit the number of file descriptors to 20. The output I get is:

Testing scalability with 2 iterations on 1*2 to 10*2 cores
The fold operation in this example is too simple to scale: this is just a test for the code.
Sequential execution takes 0.005926 seconds
cores: 2 len: 20000
q: 1
q: 2
q: 3
q: 4
q: 5
q: 6
q: 7
q: 8
q: 9
q: 10
q: 11
Fatal error: exception Unix.Unix_error(Unix.EMFILE, "pipe", "")

In my own test programs, I wasn't able to reproduce this with parmap or pariter, so perhaps it is specific to parfold.

julia

(**************************************************************************)
(* Sample use of Parmap, a simple library to perform Map computations on *)
(* a multi-core *)
(* *)
(* Author(s): Roberto Di Cosmo *)
(* *)
(* This program is free software: you can redistribute it and/or modify *)
(* it under the terms of the GNU General Public License as *)
(* published by the Free Software Foundation, either version 2 of the *)
(* License, or (at your option) any later version. *)
(**************************************************************************)

open Parmap

let initsegm n =
  let rec aux acc = function
    | 0 -> acc
    | n -> aux (n :: acc) (n - 1)
  in
  aux [] n
;;

let scale_test iter nprocmin nprocmax =
  Printf.eprintf "Testing scalability with %d iterations on %d*2 to %d*2 cores\n" iter nprocmin nprocmax;
  Printf.eprintf "The fold operation in this example is too simple to scale: this is just a test for the code.\n";
  let l = initsegm 20000 in
  let cl, tseq =
    let d = Unix.gettimeofday () in
    let l' = List.fold_right (+) l 0 in
    l', (Unix.gettimeofday () -. d)
  in
  Printf.eprintf "Sequential execution takes %f seconds\n" tseq;
  for i = nprocmin to nprocmax do
    let tot = ref 0.0 in
    for j = 1 to iter do
      let d = Unix.gettimeofday () in
      Printf.eprintf "cores: %d len: %d\n" (i*2) (List.length l);
      for q = 1 to 1000 do
        Printf.eprintf "q: %d\n" q;
        ignore (parfold ~chunksize:2 ~ncores:(i*2) (+) (L l) 0 (+))
      done;
      let cl' = parfold ~ncores:(i*2) (+) (L l) 0 (+) in
      tot := !tot +. (Unix.gettimeofday () -. d);
      if cl <> cl' then Printf.eprintf "Parfold failure: result mismatch\n"
    done;
    let speedup = tseq /. (!tot /. (float iter)) in
    Printf.eprintf "Speedup with %d cores (average on %d iterations): %f (tseq=%f, tpar=%f)\n" (i*2) iter speedup tseq (!tot /. (float iter))
  done
;;

scale_test 2 1 10;;

rdicosmo commented 5 years ago

Thanks Julia for finding this!

Any volunteer to git bisect this and see what introduced this bug?

We would all be very grateful :-)

-- Roberto

rdicosmo commented 3 years ago

Well, it turns out it was much easier to look into the code itself. The read end of the pipe on the master task side was never closed, so Parmap had been leaking file descriptors for quite a while. Fixed by adding the proper instruction in commit 7edef273ef85c68b0b26dbefe03a59f66a1fdcd8.

Thanks again, @JuliaLawall, for spotting this.
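
For readers unfamiliar with this class of bug, here is a minimal sketch of a leaked pipe read end. It is not Parmap's code and does not reproduce the commit above; it only uses the standard `Unix` module. The helpers `round` and `distinct_read_fds` are hypothetical names made up for this illustration. It assumes a Unix-like system, where descriptors are small integers and `pipe` hands out the lowest free ones. The write and the read happen in a single process for brevity, whereas Parmap uses separate worker processes.

```ocaml
(* Build with the unix library, for example:
   ocamlfind ocamlopt -package unix -linkpkg leak.ml -o leak *)

(* One round of pipe communication: write a byte, read it back.
   [close_read_end] selects whether the read end is closed afterwards.
   Returns the read descriptor used in this round. *)
let round ~close_read_end =
  let rd, wr = Unix.pipe () in
  ignore (Unix.write wr (Bytes.of_string "x") 0 1);
  Unix.close wr;
  let buf = Bytes.create 1 in
  ignore (Unix.read rd buf 0 1);
  if close_read_end then Unix.close rd;
  rd

(* Count how many distinct read descriptors are handed out over [n] rounds.
   If every round closes both ends, the same descriptor is reused each time;
   if the read end leaks, every round consumes a fresh one. *)
let distinct_read_fds ~close_read_end n =
  let seen = ref [] in
  for _i = 1 to n do
    let rd = round ~close_read_end in
    if not (List.mem rd !seen) then seen := rd :: !seen
  done;
  List.length !seen

let () =
  let fixed = distinct_read_fds ~close_read_end:true 8 in
  let leaky = distinct_read_fds ~close_read_end:false 8 in
  Printf.printf "closing read end: %d distinct descriptor(s)\n" fixed;
  Printf.printf "leaking read end: %d distinct descriptor(s)\n" leaky
```

In the leaking variant each round permanently consumes one descriptor, so a process limited to 20 descriptors would fail with `Unix.EMFILE` on `pipe` after a handful of rounds, which matches the shape of the failure reported in this thread.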