Closed nklein closed 3 days ago
This code isn't any shorter, but it does pretty reliably reproduce the issue for thread-count
of 5 or 10.
(ql:quickload :chanl)
(ql:quickload :bordeaux-threads)
(defun test-chanl (thread-count)
(let ((lock (bt:make-lock "counter-lock"))
(receivers 0)
(senders 0)
(channel (make-instance 'chanl:unbounded-channel))
(start nil))
(macrolet ((with-counter ((place) &body body)
`(unwind-protect
(progn
(bt:with-lock-held (lock)
(incf ,place))
,@body)
(bt:with-lock-held (lock)
(decf ,place)))))
(flet ((recv-func ()
(with-counter (receivers)
(chanl:recv channel)))
(send-func ()
(with-counter (senders)
(loop :until start :do (bt:thread-yield))
(chanl:send channel :foo))))
(let ((threads (loop :for s :from 0 :below thread-count
:collect (bt:make-thread #'recv-func
:name (format nil
"READER-~D"
s))
:collect (bt:make-thread #'send-func
:name (format nil
"SENDER-~D"
s)))))
;; wait until all readers have started trying to read
(loop :until (= receivers thread-count))
;; wait until all writers have started
(loop :until (= senders thread-count))
(format t "Starting SENDs.")
(setf start t)
;; wait for all threads to finish
(loop :for thread :in threads
:do (bt:join-thread thread)))))))
On 15-11-06 01:21 PM, Patrick Stein wrote:
This code isn't any shorter, but it does pretty reliably reproduce the issue for |thread-count| of 5 or 10.
|(ql:quickload :chanl) (ql:quickload :bordeaux-threads) (defun test-chanl (thread-count) (let ((lock (bt:make-lock "counter-lock")) (receivers 0) (senders 0) (channel (make-instance 'chanl:unbounded-channel)) (start nil)) (macrolet ((with-counter ((place) &body body) `(unwind-protect (progn (bt:with-lock-held (lock) (incf ,place)) ,@body) (bt:with-lock-held (lock) (decf ,place))))) (flet ((recv-func () (with-counter (receivers) (chanl:recv channel))) (send-func () (with-counter (senders) (loop :until start :do (bt:thread-yield)) (loop :repeat repeat-count :do (chanl:send channel :foo))))) (let ((threads (loop :for s :from 0 :below thread-count :collect (bt:make-thread #'recv-func :name (format nil "READER-~D" s)) :collect (bt:make-thread #'send-func :name (format nil "SENDER-~D" s))))) ;; wait until all readers have started trying to read (loop :until (= receivers thread-count)) ;; wait until all writers have started (loop :until (= senders thread-count)) (format t "Starting SENDs.") (setf start t) ;; wait for all threads to finish (loop :for thread :in threads :do (bt:join-thread thread))))))) |
— Reply to this email directly or view it on GitHub https://github.com/zkat/chanl/issues/11#issuecomment-154491785.
There's a bug in your test function. You are looping for repeat-count :repeats, which is an undefined variable.
B
See the edit to my comment. I had repeat-count
in and then took it out. Then, I accidentally undid my last modification while trying to indent everything four-spaces to paste in as Markdown code. There is no longer any repeat-count
in the comment.
The deadlock looks like a CHANL bug to me. Specifically, the use of RECV-GRABBED-VALUE-P can get into a situation where multiple SEND calls are blocking waiting for it to become true. If, while multiple SEND calls are blocking waiting for it to become true, multiple readers sneak in and all of them set it to true, then only one of the SEND calls will complete.
This scenario should be invalidated for unbuffered channels by the fact that the first action in their methods is requesting the channel lock; since the bug occurs even for unbuffered channels, there must be another scenario (or the issue is due to a problem with the locking implementation).
The send/recv
for unbuffered channel causes deadlock also.
You can refer to comment:https://github.com/zkat/chanl/issues/13#issuecomment-541347280
You can refer to comment:#13 (comment)
Thank you for bringing that comment to my attention; I'd likely have not noticed your message otherwise, due to the issue duplication.
The send/recv for unbuffered channel causes deadlock also.
Indeed, the Omega-th Law states that unfound bugs outnumber unimplemented tests.
This should be fixed by commit adlai/chanl@a030296 although I am leaving the issue open until the fix is confirmed.
Consideration of factors outside the immediately imported package have led to this issue's abandonment, without a confirmed fix.
I wrote a better test case and did not push to github because I work alone.
Someone posted a program to Reddit that deadlocks when using
CHANL
. I spent a bit of time looking into it. I think I see why it deadlocks.The deadlock looks like a
CHANL
bug to me. Specifically, the use ofRECV-GRABBED-VALUE-P
can get into a situation where multipleSEND
calls are blocking waiting for it to become true. If, while multipleSEND
calls are blocking waiting for it to become true, multiple readers sneak in and all of them set it to true, then only one of theSEND
calls will complete.It seems to me this could happen even without the extra
SEMAPHORE
in the sample code. I think the semaphore just keeps the send threads from happening almost entirely serially.I'll see if I can create some simpler code to trigger it. It looks like the fix would be switching to a counter (or semaphore) rather than a boolean for the number of senders waiting.