rdicosmo / parmap

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications.
http://rdicosmo.github.io/parmap/

Fatal error: exception End_of_file #29

Closed sfmatt closed 10 years ago

sfmatt commented 10 years ago

Hi Roberto,

Thanks a lot for parmap, which I was using successfully until recently. With v1.0-rc5, array_float_parmap raises the above exception for a source array of ~90K elements, even with ncores = 1. It does not look like a memory issue, since around half of the computer's memory is free when the exception is raised. This is on 64-bit Ubuntu 14.04, by the way.

Matt

sfmatt commented 10 years ago

My apologies for the false alarm. For some reason the opam-installed version of parmap was shadowed by an older version in which the problem was not yet corrected.

Also, regarding issue #18 (Fatal error: exception Failure("input_value_from_block: bad object")): it is caused by one of the child processes aborting abruptly. In my case, an unexpected nan value in a complex computation caused the process to abort without any warning or stack trace.
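When a worker's failure traces back to an unexpected nan, one option is to check the result inside the worker and raise an exception with a message, so the failure is at least named. The sketch below is only an illustration: fail_on_nan is a hypothetical helper, not part of parmap, and it assumes the worker maps a float to a float.

```ocaml
(* Hypothetical helper (not part of parmap): wrap a float-valued worker so
   that a nan result becomes an exception that names the offending input. *)
let fail_on_nan (f : float -> float) (x : float) : float =
  let y = f x in
  (* classify_float is in the standard library; FP_nan marks "not a number". *)
  if classify_float y = FP_nan then
    failwith (Printf.sprintf "worker produced nan for input %g" x)
  else y

let () =
  (* A valid input passes through unchanged. *)
  Printf.printf "%g\n" (fail_on_nan sqrt 4.0);
  (* sqrt of a negative number is nan, so the wrapper raises Failure. *)
  match fail_on_nan sqrt (-1.0) with
  | y -> Printf.printf "%g\n" y
  | exception Failure msg -> prerr_endline msg
```

The wrapped function could then be passed wherever the original worker function was used, for example as the function argument of array_float_parmap.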

rdicosmo commented 10 years ago

Dear Matt, thanks for auto-fixing this :-)

Let me remark here that, up to now, parmap does not gracefully handle the abnormal termination of one of the workers, as in #18.

It would require some work to make exceptions in the workers surface as exceptions in the main program. We have not done this yet, but contributions are welcome.

sfmatt commented 10 years ago

Alas, Roberto, I'm a decent debugger but a poor programmer. The best I can do to contribute is to give you two simple programs that reproduce the exceptions in the latest parmap version:

Fatal error: exception End_of_file:

    let l = [1;2;3]
    let f x = exit 0
    let l' = Parmap.(parmap ~ncores:2 ~chunksize:1 f (L l))
    let () = List.hd l' |> print_int

Fatal error: exception Failure("input_value_from_block: bad object"):

    let l = [1;2]
    let f x = exit 0
    let l' = Parmap.(parmap ~ncores:2 ~chunksize:1 f (L l))
    let () = List.hd l' |> print_int

As you can see, no exceptions are involved in the workers, only an exit call. In both examples, if we replace let f x = exit 0 with let f x = failwith "FAILED", we get an explicit error message:

    [Parmap]: error at index j=0 in (0,0), chunksize=1 of a total of 1 got exception Failure("FAILED") on core 0
    [Parmap]: error at index j=0 in (1,1), chunksize=1 of a total of 1 got exception Failure("FAILED") on core 1
    [Parmap]: aborting due to exception on core 0: Failure("FAILED")

IMHO parmap deals perfectly fine with exceptions in the workers as it is. Now perhaps the same (or very similar) error messages could be used in the exit 0 scenario(s)?
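The difference between a worker that raises and one that simply exits can be reproduced without parmap. The sketch below is not parmap's implementation: it assumes a plain Unix.fork, a pipe and Marshal, and the names outcome, run_in_child and describe are made up for illustration. It shows the parent hitting End_of_file when the child ends before sending a result, and how the exit status returned by Unix.waitpid could be turned into an explicit message instead.

```ocaml
(* Build with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg demo.ml -o demo *)

(* What the parent learns about one worker run. *)
type 'a outcome =
  | Value of 'a                   (* the child sent back a result *)
  | Died of Unix.process_status   (* the child ended without sending one *)

(* Hypothetical helper: run [f x] in a forked child and read the
   marshalled result back through a pipe. *)
let run_in_child (f : 'a -> 'b) (x : 'a) : 'b outcome =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: compute and send the result; exit with code 1 on any exception. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      (try
         Marshal.to_channel oc (f x) [];
         close_out oc
       with _ -> exit 1);
      exit 0
  | pid ->
      (* Parent: an empty pipe makes Marshal.from_channel raise End_of_file. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let received =
        try Some (Marshal.from_channel ic) with End_of_file -> None
      in
      close_in ic;
      let _, status = Unix.waitpid [] pid in
      (match received with
       | Some v -> Value v
       | None -> Died status)

(* Turn an outcome into a readable message. *)
let describe = function
  | Value v -> Printf.sprintf "got %d" v
  | Died (Unix.WEXITED c) ->
      Printf.sprintf "worker exited with code %d before sending a result" c
  | Died (Unix.WSIGNALED s) -> Printf.sprintf "worker killed by signal %d" s
  | Died (Unix.WSTOPPED s) -> Printf.sprintf "worker stopped by signal %d" s

let () =
  (* print_endline flushes stdout, so children do not inherit buffered text. *)
  print_endline (describe (run_in_child (fun x -> x + 1) 41));
  print_endline (describe (run_in_child (fun _ -> exit 0) 41));
  print_endline (describe (run_in_child (fun _ -> failwith "FAILED") 41))
```

In this sketch the parent no longer lets End_of_file escape: it records that nothing arrived and reports the child's exit status, which is roughly the kind of message suggested above for the exit 0 scenario.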

Thank you again for parmap!

Matt
