rdicosmo / parmap

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications.
http://rdicosmo.github.io/parmap/

Fatal error: exception End_of_file #10

Closed horasio closed 11 years ago

horasio commented 11 years ago

Hi,

I'm using Mac OS X 10.7.5, ocaml 4.00.1 from Macports, and I just git-cloned & compiled parmap. It compiles and installs fine. Then, I compile and try the mandels example:

$ ./mandels.native
Computing...Got task...
Fatal error: exception End_of_file
Fatal error: exception End_of_file

On my own program, the same error happens. During its compilation, I get:

findlib: [WARNING] Interface myocamlbuild.cmi occurs in several directories: ., /opt/local/lib/ocaml/site-lib/parmap

(which I don't think is related to the crash, but just in case...)

Many thanks in advance for any advice on how to solve this crash.

sam

rdicosmo commented 11 years ago

Dear Sam, just tried exactly the same test, with OCaml 4.00.1, and I cannot reproduce the bug on my machine (ia64, Linux).

Could you try to install parmap using opam (see https://github.com/OCamlPro/opam) on your machine, and perform the tests with various versions of the compiler (easy to do with opam)?

Roberto

Roberto Di Cosmo


Professeur En delegation a l'INRIA PPS E-mail: roberto@dicosmo.org Universite Paris Diderot WWW : http://www.dicosmo.org Case 7014 Tel : ++33-(0)1-57 27 92 20 5, Rue Thomas Mann
F-75205 Paris Cedex 13 Identica: http://identi.ca/rdicosmo

FRANCE. Twitter: http://twitter.com/rdicosmo

Attachments: MIME accepted, Word deprecated

http://www.gnu.org/philosophy/no-word-attachments.html

Office location:

Bureau 320 (3rd floor) Batiment Sophie Germain Avenue de France

Metro Bibliotheque Francois Mitterrand, ligne 14/RER C

GPG fingerprint 2931 20CE 3A5A 5390 98EC 8BFC FCCA C3BE 39CB 12D3

horasio commented 11 years ago

Dear Roberto, thank you for your suggestion. I've just tried to compile the mandelbrot example with parmap.1.0-rc1 under both ocaml-4.00.1 and ocaml-3.12.1 (all compiled with OPAM). The binary fails in the same way :-( Might it be related to some quirk of Mac OS X?

rdicosmo commented 11 years ago

Dear Sam, at this point, I would say that this is definitely a Mac OS X problem... but we did not see problems on Mac OS X when we shipped v 1.0.

Maybe Marco (in Cc:) could try and see what happens on Mac OS X? I have no Mac OS X machine at hand right now, so any other Mac OS X user on the list is welcome to test and report (please let us know exactly which OS version you used).

Roberto

UnixJunkie commented 11 years ago

The examples require several libraries: ocamlgraph and a graphics library whose name I don't remember. But the examples would not compile if those were missing.

horasio commented 11 years ago

"mandels.ml" compiles fine. When I start it, the X server opens a window, and then ... parmap crashes.

Anyway, here is a minimal program which crashes if you replace 29 by 30 (Mac OS X 10.7.5, MacPorts's ocaml 4.00.1, parmap 0.9.9 from github). I compile it with ocamlfind ocamlopt -package parmap -linkpkg parmapcrash.ml.

let range n =
  assert (n>=0);
  let rec rangeaux acc = function
    | 0 -> acc
    | n -> rangeaux (n::acc) (n-1)
  in rangeaux [] n
let l = range 29 (* CRASHES with [range 30] *)
let f x = x + 1
let l' = Parmap.(parmap ~ncores:2 ~chunksize:10 f (L l))
let () = List.iter (fun x -> Printf.printf "%d " x) l'

horasio commented 11 years ago

Maybe a hint: when I change the value of chunksize (CK=10 in the above program), it crashes if the list has 3*CK elements but works fine with 3*CK-1 elements or fewer.

rdicosmo commented 11 years ago

Ahhhh... ok :-)

Parmap 0.9.9 had a bug with exactly these consequences.

You must get parmap 1.0-rc1 (which is also available in opam), and not parmap 0.9.9

Can you check if this solves the problem?

horasio commented 11 years ago

Well, the META file indicates 0.9.9 but my git HEAD points to 1.0-rc1. Which should I believe? I have trouble locating, in the git history, the bug fix that was added after 0.9.9.

rdicosmo commented 11 years ago

Ok, my fault: I will push a version with the correct info in the META file. That puts us back at the drawing board, though. There was an error in parmap's logic that made it crash when, for example, using more cores than there are elements in a list. That was fixed (and is covered by one of the tests), so if you have 1.0-rc1, you already have that fix.

horasio commented 11 years ago

OK for the bad META. I got a backtrace from the crash that could help:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff821ee4ab memcpy + 273
1 libsystem_kernel.dylib 0x00007fff821fe2f3 thread_policy_set + 118
2 a.out 0x00000001050d850d setcore + 109
3 a.out 0x00000001050b16c6 .L413 + 34

rdicosmo commented 11 years ago

This seems to indicate trouble when calling the Mac OS X core pinning interface. To verify, can you replace the original code in setcore_stubs.c with

CAMLprim value setcore(value which) { return Val_unit; }

(which avoids calling thread_policy_set altogether) and see if the problem is still there?

Roberto

horasio commented 11 years ago

OK, here is the fix. Thank you for your help :-)

diff --git a/setcore_stubs.c b/setcore_stubs.c
index 86570fa..23a597c 100644
--- a/setcore_stubs.c
+++ b/setcore_stubs.c
@@ -13,6 +13,7 @@
 #include <errno.h>
 #include <caml/mlvalues.h>

+
 CAMLprim value setcore(value which) {
   int numcores = sysconf( _SC_NPROCESSORS_ONLN );
   int w = Int_val(which) % numcores; // stay in the space of existing cores
@@ -40,7 +41,7 @@ CAMLprim value setcore(value which) {
       affinityData.affinity_tag = w;
       retcode = thread_policy_set(mach_thread_self(),
                         THREAD_AFFINITY_POLICY,
-                        affinityData,
+                        &affinityData,
                         THREAD_AFFINITY_POLICY_COUNT);
       if(retcode) {
         fprintf(stderr,"MAC OS X: Failed pinning to cpu %d, trying %d/2\n",w, w);
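For readers unfamiliar with the Mach call involved, here is a minimal sketch of why the ampersand matters. thread_policy_set takes its policy argument as a thread_policy_t, which is a pointer to the policy data, so the address of the thread_affinity_policy_data_t struct has to be passed, not the struct itself. The helper names wrap_core and pin_current_thread are hypothetical and are not parmap's own code; outside Mac OS X the sketch deliberately does nothing, so that it still compiles elsewhere.

```c
#include <stdio.h>
#include <unistd.h>

#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/thread_policy.h>
#endif

/* Hypothetical helper: map any requested index onto [0, numcores). */
int wrap_core(int which, int numcores) {
    if (numcores < 1) numcores = 1;   /* guard against a failed sysconf() */
    int w = which % numcores;
    return w < 0 ? w + numcores : w;  /* C's % can yield a negative result */
}

/* Hypothetical helper: request an affinity tag for the calling thread.
 * Returns 0 on success, and also on platforms where this sketch is a no-op. */
int pin_current_thread(int which) {
    int numcores = (int)sysconf(_SC_NPROCESSORS_ONLN);
    int w = wrap_core(which, numcores);
#ifdef __APPLE__
    thread_affinity_policy_data_t affinity_data;
    affinity_data.affinity_tag = w;
    /* The third parameter is a pointer to the policy data,
     * hence &affinity_data rather than affinity_data. */
    kern_return_t rc = thread_policy_set(mach_thread_self(),
                                         THREAD_AFFINITY_POLICY,
                                         (thread_policy_t)&affinity_data,
                                         THREAD_AFFINITY_POLICY_COUNT);
    if (rc != KERN_SUCCESS) {
        fprintf(stderr, "failed to set affinity tag %d (kern_return_t %d)\n",
                w, (int)rc);
        return -1;
    }
#else
    (void)w;  /* no pinning is attempted off Mac OS X in this sketch */
#endif
    return 0;
}
```

With these helpers, pin_current_thread(3) on a 2-core machine would request tag 1, mirroring the modulo step in the patched stub.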
rdicosmo commented 11 years ago

Thanks a lot!

Could you give me your real e-mail address, so I can add this fix with you as author?

horasio commented 11 years ago

What an honor for a one byte fix ;-) samuel.hornus@gmail.com (I can now use Parmap.pariter for finger-in-the-nose load-balanced image batch rendering, thanks for this library!)

rdicosmo commented 11 years ago

But how much work for this one byte ;-)

Thanks, this closes #10

rdicosmo commented 11 years ago

I switched back to Linux; I still have a working Mac, but not with the most recent operating system version. I'll have a look at what's going on.

marcod

-- Marco Danelutto, Dept. Computer Science, Univ. of Pisa, Italy -- Web: www.di.unipi.it/~marcod Ph: +390502212742 Fax: +390502212726

UnixJunkie commented 11 years ago

Hi,

I have a Mac at home. I confirm ./mandels.native works on a fresh git clone of parmap.

rdicosmo commented 11 years ago

Thanks, Francois! We now have double confirmation that this bug has gone away.
