zkat / chanl

Portable channel-based concurrency for Common Lisp
Other
169 stars 18 forks source link

SEND/RECV deadlock... #11

Closed nklein closed 3 days ago

nklein commented 9 years ago

Someone posted a program to Reddit that deadlocks when using CHANL. I spent a bit of time looking into it. I think I see why it deadlocks.

The deadlock looks like a CHANL bug to me. Specifically, the use of RECV-GRABBED-VALUE-P can get into a situation where multiple SEND calls are blocking waiting for it to become true. If, while multiple SEND calls are blocking waiting for it to become true, multiple readers sneak in and all of them set it to true, then only one of the SEND calls will complete.

It seems to me this could happen even without the extra SEMAPHORE in the sample code. I think the semaphore just keeps the send threads from happening almost entirely serially.

(ql:quickload :chanl)  
(ql:quickload :bordeaux-threads)  
#+sbcl (ql:quickload :sb-concurrency)  

(defun test-chanl (n)  
  (loop for i from 0 below n do  
       (format t "~&Test: ~a~%" i)  
       (let* ((size 1024)  
              (test-repeat 4)  
              (chanl (make-instance 'chanl:unbounded-channel))  
              (semaphore #+sbcl (sb-thread:make-semaphore)  
                         #+ccl (make-semaphore)))  
         (loop repeat test-repeat do  
              (bt:make-thread  
               (lambda ()
                 #+sbcl (sb-thread:wait-on-semaphore semaphore)  
                 #+ccl (wait-on-semaphore semaphore)
                 (loop repeat size  
                    do (chanl:send chanl 1)))))  
         (loop repeat test-repeat  
            do #+sbcl (sb-thread:signal-semaphore semaphore)  
              #+ccl (signal-semaphore semaphore))  
         (loop repeat (* size test-repeat)  
            for i from 0  
            do (chanl:recv chanl))))  
  (format t "~&done.~%"))

I'll see if I can create some simpler code to trigger it. It looks like the fix would be switching to a counter (or semaphore) rather than a boolean for the number of senders waiting.

nklein commented 9 years ago

This code isn't any shorter, but it does pretty reliably reproduce the issue for thread-count of 5 or 10.

(ql:quickload :chanl)  
(ql:quickload :bordeaux-threads)  

(defun test-chanl (thread-count)
  (let ((lock (bt:make-lock "counter-lock"))
        (receivers 0)
        (senders 0)
        (channel (make-instance 'chanl:unbounded-channel))
        (start nil))
    (macrolet ((with-counter ((place) &body body)
                 `(unwind-protect
                       (progn
                         (bt:with-lock-held (lock)
                           (incf ,place))
                         ,@body)
                    (bt:with-lock-held (lock)
                      (decf ,place)))))
      (flet ((recv-func ()
               (with-counter (receivers)
                 (chanl:recv channel)))

             (send-func ()
               (with-counter (senders)
                 (loop :until start :do (bt:thread-yield))
                 (chanl:send channel :foo))))

        (let ((threads (loop :for s :from 0 :below thread-count
                          :collect (bt:make-thread #'recv-func
                                                   :name (format nil
                                                                 "READER-~D"
                                                                 s))
                          :collect (bt:make-thread #'send-func
                                                   :name (format nil
                                                                 "SENDER-~D"
                                                                 s)))))

          ;; wait until all readers have started trying to read
          (loop :until (= receivers thread-count))

          ;; wait until all writers have started
          (loop :until (= senders thread-count))

          (format t "Starting SENDs.")
          (setf start t)

          ;; wait for all threads to finish
          (loop :for thread :in threads
             :do (bt:join-thread thread)))))))
fade commented 9 years ago

On 15-11-06 01:21 PM, Patrick Stein wrote:

This code isn't any shorter, but it does pretty reliably reproduce the issue for |thread-count| of 5 or 10.

|(ql:quickload :chanl) (ql:quickload :bordeaux-threads) (defun test-chanl (thread-count) (let ((lock (bt:make-lock "counter-lock")) (receivers 0) (senders 0) (channel (make-instance 'chanl:unbounded-channel)) (start nil)) (macrolet ((with-counter ((place) &body body) `(unwind-protect (progn (bt:with-lock-held (lock) (incf ,place)) ,@body) (bt:with-lock-held (lock) (decf ,place))))) (flet ((recv-func () (with-counter (receivers) (chanl:recv channel))) (send-func () (with-counter (senders) (loop :until start :do (bt:thread-yield)) (loop :repeat repeat-count :do (chanl:send channel :foo))))) (let ((threads (loop :for s :from 0 :below thread-count :collect (bt:make-thread #'recv-func :name (format nil "READER-~D" s)) :collect (bt:make-thread #'send-func :name (format nil "SENDER-~D" s))))) ;; wait until all readers have started trying to read (loop :until (= receivers thread-count)) ;; wait until all writers have started (loop :until (= senders thread-count)) (format t "Starting SENDs.") (setf start t) ;; wait for all threads to finish (loop :for thread :in threads :do (bt:join-thread thread))))))) |

— Reply to this email directly or view it on GitHub https://github.com/zkat/chanl/issues/11#issuecomment-154491785.

There's a bug in your test function. You are looping for repeat-count :repeats, which is an undefined variable.

B

nklein commented 9 years ago

See the edit to my comment. I had repeat-count in and then took it out. Then, I accidentally undid my last modification while trying to indent everything four-spaces to paste in as Markdown code. There is no longer any repeat-count in the comment.

adlai commented 5 years ago

The deadlock looks like a CHANL bug to me. Specifically, the use of RECV-GRABBED-VALUE-P can get into a situation where multiple SEND calls are blocking waiting for it to become true. If, while multiple SEND calls are blocking waiting for it to become true, multiple readers sneak in and all of them set it to true, then only one of the SEND calls will complete.

This scenario should be invalidated for unbuffered channels by the fact that the first action in their methods is requesting the channel lock; since the bug occurs even for unbuffered channels, there must be another scenario (or the issue is due to a problem with the locking implementation).

imnisen commented 5 years ago

The send/recv for unbuffered channel causes deadlock also. You can refer to comment:https://github.com/zkat/chanl/issues/13#issuecomment-541347280

adlai commented 5 years ago

You can refer to comment:#13 (comment)

Thank you for bringing that comment to my attention; I'd likely have not noticed your message otherwise, due to the issue duplication.

The send/recv for unbuffered channel causes deadlock also.

Indeed, the Omega-th Law states that unfound bugs outnumber unimplemented tests.

adlai commented 4 years ago

This should be fixed by commit adlai/chanl@a030296 although I am leaving the issue open until the fix is confirmed.

adlai commented 4 years ago

Consideration of factors outside the immediately imported package have led to this issue's abandonment, without a confirmed fix.

adlai commented 3 days ago

I wrote a better test case and did not push to github because I work alone.