lmj / lparallel

Parallelism for Common Lisp
http://lparallel.org
BSD 3-Clause "New" or "Revised" License

Allegro Lisp 9.0 Compatibility #3

Closed (simonlast closed 12 years ago)

simonlast commented 12 years ago

I'm trying to run the tests and benchmarks in Allegro 9.0, but I am getting several errors.

For the tests: Debug: Attempt to do an array operation on 0 which is not an array.

For the benchmarks: Error: No methods applicable for generic function # with args (NIL) of classes (NULL)

lmj commented 12 years ago

Thanks for the info. I've sent a request to Franz for 9.0 Express Edition (if it exists yet).

What platform is this? And could you tell me the test it failed on? It should print FOO-TEST for some FOO.

lmj commented 12 years ago

It turns out that Allegro 8.2 had been broken. It's now fixed -- perhaps 9.0 is also fixed?

simonlast commented 12 years ago

It seems to work a bit better, but the process hangs during the final benchmark, MATRIX_MUL. Here's a trace:

Thread 9 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100228015 in md_lock_wait ()

2 0x00000001002280ad in smp_thread_get ()

3 0x00000001002285d8 in wait_acl_gate ()

4 0x000000010025f1ac in op_wait_gate ()

5 0x6af6310000000f00 in ?? ()

Thread 8 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100228015 in md_lock_wait ()

2 0x00000001002280ad in smp_thread_get ()

3 0x00000001002285d8 in wait_acl_gate ()

4 0x000000010025f1ac in op_wait_gate ()

5 0x6af6310000000f00 in ?? ()

Thread 7 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100228015 in md_lock_wait ()

2 0x00000001002280ad in smp_thread_get ()

3 0x0000000100260d99 in thread_get ()

4 0x6fd4210000000f00 in ?? ()

Thread 6 (process 62284):

0 0x00007fff858eadf2 in select$DARWIN_EXTSN ()

1 0x0000000100201f93 in c_mpwaitsigio ()

2 0x000000010025b319 in mpwaitsigio ()

3 0x6fd0c10000000f00 in ?? ()

Thread 5 (process 62284):

0 0x00007fff858eadf2 in select$DARWIN_EXTSN ()

1 0x0000000100201f93 in c_mpwaitsigio ()

2 0x000000010025b319 in mpwaitsigio ()

3 0x6fd0c10000000f00 in ?? ()

Thread 4 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100228015 in md_lock_wait ()

2 0x00000001002280ad in smp_thread_get ()

3 0x0000000100260d99 in thread_get ()

4 0x000af90000000f00 in ?? ()

Thread 3 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100227ff7 in sem_wait_posix ()

2 0x000000010022a8b8 in oversee_signal_dispatch ()

3 0x000000010022a96f in smp_scavenge_controller ()

4 0x00007fff904288bf in _pthread_start ()

5 0x00007fff9042bb75 in thread_start ()

Thread 2 (process 62284):

0 0x00007fff858e967a in mach_msg_trap ()

1 0x00007fff858e8d71 in mach_msg ()

2 0x000000010021b467 in lisp_exception_watcher ()

3 0x00007fff904288bf in _pthread_start ()

4 0x00007fff9042bb75 in thread_start ()

Thread 1 (process 62284):

0 0x00007fff858e96b6 in semaphore_wait_trap ()

1 0x0000000100228015 in md_lock_wait ()

2 0x00000001002280ad in smp_thread_get ()

3 0x00000001002285d8 in wait_acl_gate ()

4 0x000000010025f1ac in op_wait_gate ()

simonlast commented 12 years ago

It also hangs during the COGNATE-HANDLER-TEST, with a similar trace.

lmj commented 12 years ago

Based on the trace you posted, it appears that OS threads are not enabled (threads are on the same process). If so then the PMATRIX-MUL benchmark is probably not hanging, but merely taking a long time.

On my copy of Allegro 8.2 Express Edition, which does not have OS threads, the benchmark does eventually finish after about 20 minutes. This is OK because defpun serves no purpose without true multiprocessing available.

I suspect the slowdown is caused by the work-stealing loop (sync in Cilk), which can briefly spin in certain situations. However, for Allegro's green (non-OS) threads the spinning is not brief; it takes over the whole process for long periods of time, effectively pausing the computation.

Perhaps for Lisps without SMP, defpun should expand to defun and (inside defpun) plet to let. Though I wonder if non-SMP Lisps would be used with lparallel in the first place.
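
A minimal sketch of the fallback suggested above, assuming a Lisp without SMP. The names `sketch-defpun` and `sketch-plet` are hypothetical and are not lparallel's API; on an SMP Lisp the real `defpun` and `plet` would be used instead.

```lisp
;; Hypothetical sequential fallback; SKETCH- names are not part of lparallel.
(defmacro sketch-defpun (name lambda-list &body body)
  "Define NAME with DEFUN. Inside BODY, SKETCH-PLET expands to plain LET."
  `(defun ,name ,lambda-list
     (macrolet ((sketch-plet (bindings &body plet-body)
                  ;; Bindings are evaluated sequentially, as with LET.
                  `(let ,bindings ,@plet-body)))
       ,@body)))

;; Example: a Fibonacci function written in the DEFPUN/PLET style.
(sketch-defpun sketch-fib (n)
  (if (< n 2)
      n
      (sketch-plet ((a (sketch-fib (- n 1)))
                    (b (sketch-fib (- n 2))))
        (+ a b))))
```

Because `sketch-plet` is bound with `macrolet`, it only exists inside a `sketch-defpun` body, which mirrors the "(inside defpun)" condition in the suggestion.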

COGNATE-HANDLER-TEST may be unrelated. Allegro 8.2 has no problem with (loop (lparallel-test::cognate-handler-test)). I'm waiting for Franz to reply to my 9.0 request.

simonlast commented 12 years ago

I think OS threads must be enabled, because initially the program uses more than one processor, but during the PMATRIX-MUL benchmark, processor usage drops to 0, so I don't think it is simply taking a long time.

lmj commented 12 years ago

Maybe there is some explanation for why your trace shows waiting threads attached to the same process. What does (find :os-threads *features*) say?

In any case, I can't do much without being able to run 9.0. (Franz hasn't responded to my request yet.) Who knows, maybe bordeaux-threads needs updating for 9.0.

simonlast commented 12 years ago

(find :os-threads *features*) says :OS-THREADS
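
For reference, the same check can be wrapped in a small helper that reports several feature keywords at once. This is only a sketch: `sketch-feature-report` is a hypothetical name, and whether a given implementation advertises `:smp` or `:os-threads` in `*features*` is an assumption to verify per implementation.

```lisp
;; Hypothetical helper; only *FEATURES* and MEMBER are standard Common Lisp.
(defun sketch-feature-report (&rest keywords)
  "Return an alist of (KEYWORD . PRESENT-P) for each of KEYWORDS."
  (mapcar (lambda (keyword)
            (cons keyword (and (member keyword *features*) t)))
          keywords))

;; Print the report for the two features discussed in this thread.
(print (sketch-feature-report :os-threads :smp))
```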

lmj commented 12 years ago

With the following patch for bordeaux-threads, Allegro 9.0beta SMP successfully passes all lparallel tests. I'll submit it once I get confirmation from Franz.

diff --git a/src/impl-allegro.lisp b/src/impl-allegro.lisp
index 144ee98..102200c 100644
--- a/src/impl-allegro.lisp
+++ b/src/impl-allegro.lisp
@@ -41,12 +41,12 @@ Distributed under the MIT license (see LICENSE file)

 (defun condition-wait (condition-variable lock)
   (release-lock lock)
-  (mp:process-wait "wait for message" #'mp:gate-open-p condition-variable)
-  (acquire-lock lock)
-  (mp:close-gate condition-variable))
+  (unwind-protect
+       (mp:get-semaphore condition-variable)
+    (acquire-lock lock)))

 (defun condition-notify (condition-variable)
-  (mp:open-gate condition-variable))
+  (mp:put-semaphore condition-variable))

 (defun thread-yield ()
   (mp:process-allow-schedule))
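
The thread does not say why the gate-based version failed, so the following is only an illustration of one general difference between the two primitives the patch swaps: a gate holds a single open/closed bit, while a semaphore counts notifications. It is a single-threaded model, and all `sketch-` names are hypothetical stand-ins rather than Allegro's `mp` API.

```lisp
;; Single-threaded model; SKETCH- names are hypothetical, not Allegro's mp API.
(defstruct sketch-gate (open-p nil))      ; binary state: open or closed
(defstruct sketch-semaphore (count 0))    ; counts pending notifications

(defun sketch-gate-notify (gate)
  (setf (sketch-gate-open-p gate) t))

(defun sketch-gate-try-wait (gate)
  "Return T and close GATE if it was open; otherwise NIL (a real waiter would block)."
  (when (sketch-gate-open-p gate)
    (setf (sketch-gate-open-p gate) nil)
    t))

(defun sketch-semaphore-notify (sem)
  (incf (sketch-semaphore-count sem)))

(defun sketch-semaphore-try-wait (sem)
  "Return T and consume one unit if available; otherwise NIL (a real waiter would block)."
  (when (plusp (sketch-semaphore-count sem))
    (decf (sketch-semaphore-count sem))
    t))

(defun sketch-wakeups (notify try-wait object)
  "Notify OBJECT twice, then count how many of two waiters get through."
  (funcall notify object)
  (funcall notify object)
  (count t (list (funcall try-wait object)
                 (funcall try-wait object))))
```

In this model, two notifications followed by two waiters let only one waiter through with the gate, but both with the semaphore.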

simonlast commented 12 years ago

I added that patch to bordeaux-threads and cloned the latest lparallel, but I am again getting the same error as before on both the benchmarks and the tests: Debug: Attempt to do an array operation on 0 which is not an array.

lmj commented 12 years ago

What is your lisp-implementation-version? Tests and benchmarks run fine under

9.0.pre-final.18 [Linux (x86) *SMP*] (Jun 6, 2012 8:36)
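
A version string like the one above can be collected with standard Common Lisp functions. The helper name `sketch-environment-summary` is hypothetical; the four functions it calls are part of the standard, and each may return NIL on some implementations.

```lisp
;; Hypothetical helper built only from standard environment-inquiry functions.
(defun sketch-environment-summary ()
  "Return a one-line description of the running Lisp and its host."
  (format nil "~A ~A on ~A (~A)"
          (lisp-implementation-type)
          (lisp-implementation-version)
          (software-type)
          (machine-type)))

;; Print the summary so it can be pasted into a bug report.
(format t "~A~%" (sketch-environment-summary))
```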

simonlast commented 12 years ago

9.0.beta.21 [64-bit Mac OS X (Intel) *SMP*] (Apr 12, 2012 16:53)

Perhaps this version is too old?

lmj commented 12 years ago

Getting the latest can't hurt. Unfortunately I don't have a Mac that can run Allegro 9.0, at least currently.

lmj commented 12 years ago

I don't know if this is related to your problem, but there is an Allegro 9.0 bug affecting bordeaux-threads which is currently unresolved. It is mentioned at the end of http://lists.common-lisp.net/pipermail/bordeaux-threads-devel/2012-June/000204.html

After applying the patches in the link, see if (loop (5am:debug! 'bordeaux-threads-test::stress-test)) hangs or produces an error. If you would rather not wait for Franz to fix it, you could try removing the *thread-results* hash in impl-allegro.lisp in bordeaux-threads.

lmj commented 12 years ago

I could upgrade my Mac to 10.6, but I would prefer knowing beforehand that a problem still exists. If the very latest Allegro 9.0 for Mac still fails with the above patch and the following patch, then I'll know.

diff --git a/src/impl-allegro.lisp b/src/impl-allegro.lisp
index d9ea53b..0432690 100644
--- a/src/impl-allegro.lisp
+++ b/src/impl-allegro.lisp
@@ -55,18 +55,8 @@ Distributed under the MIT license (see LICENSE file)
 (defun start-multiprocessing ()
   (mp:start-scheduler))

-(defvar *thread-results* (make-hash-table :weak-keys t))
-
-(defvar *thread-join-lock* (make-lock "Bordeaux threads join lock"))
-
 (defun %make-thread (function name)
-  (mp:process-run-function
-   name
-   (lambda ()
-     (let ((result (funcall function)))
-       (with-lock-held (*thread-join-lock*)
-         (setf (gethash (current-thread) *thread-results*)
-               result))))))
+  (mp:process-run-function name function))

 (defun current-thread ()
   mp:*current-process*)
@@ -102,10 +92,6 @@ Distributed under the MIT license (see LICENSE file)
 (defun join-thread (thread)
   (mp:process-wait (format nil "Waiting for thread ~A to complete" thread)
                    (complement #'mp:process-alive-p)
-                   thread)
-  (with-lock-held (*thread-join-lock*)
-    (prog1
-        (gethash thread *thread-results*)
-      (remhash thread *thread-results*))))
+                   thread))

 (mark-supported)

simonlast commented 12 years ago

With those 3 patches, I still get the same errors. I'm going to try to get the latest ACL soon.

lmj commented 12 years ago

Any word on this? I am currently unable to test 9.0 SMP because my beta license has expired. The latest bordeaux-threads in the repository uses built-in condition variables, which should be more robust.

simonlast commented 12 years ago

I'll forward this to someone who could test it. I no longer have Allegro Lisp.

asmyers commented 12 years ago

Hi lmj, Simon was an intern we had investigating SMP support in ACL 9.0 over the summer. Using the versions of lparallel and bordeaux-threads on github, all lparallel tests pass on x64 ACL 9.0 on Linux and OS X Mountain Lion; this wasn't the case when Simon was investigating lparallel. The versions installed via quicklisp still segfault when running the lparallel test suite. I've also used lparallel in a few places experimentally, and everything has worked well using the code on github.

Thanks for your great work; I really like the library!

Andrew

lmj commented 12 years ago

Thanks for the update. I had planned on contacting Franz once quicklisp got the new bordeaux-threads (less hassle for them), but it looks resolved now.