zeromq / zbroker

Elastic pipes
Mozilla Public License 2.0
50 stars 11 forks source link

Problem: writers block after reader attaches #70

Closed rpedde closed 10 years ago

rpedde commented 10 years ago

Two separate processes. One writes, the other reads. The reader attaches first, then blocks on a read. The writer connects next, the local writer attaches, but the pipe open apparently hangs until the reader disconnects.

I'm running these processes in python, and I don't think this is python related, but I'm writing a repro case in C to be sure. I'll follow with a C repro case if I duplicate it.

Error in test "scripts/blocks_are_streamish.yml"

Host "writer" (localhost)

Test script

sleep 1
timeout 0
open pipe1 write
write pipe1 abcdef
close pipe1 write

Script Log

Set prefix to 1
Set broker to 1-0
Sleeping for 1 seconds
Set timeout to 0
Opening descriptor "1-0|>1-pipe1"
Opened pipe "pipe1" for write
Writing "abcdef" to pipe "pipe1"
Wrote to pipe "pipe1"
Closing write pipe "pipe1"
Closed pipe "pipe1"
Test passed

Broker Log

zbroker service/0.0.1
Copyright (c) 2014 the Contributors
This Software is provided under the MPLv2 License on an "as is" basis,
without warranty of any kind, either expressed, implied, or statutory.

14-05-06 14:56:47 I: starting zpipes broker using config in '/tmp/tmpj6I1nw/zbroker.cfg'
14-05-06 14:56:47 I: joining cluster as B92C1C
14-05-06 14:56:47 N: starting zpipes_server service
14-05-06 14:56:47 I: ZPIPES server appeared at 20CAC8
14-05-06 14:56:49 D: remote=20CAC8 command=HAVE READER pipe=1-pipe1 unicast=0
14-05-06 14:56:49 1-pipe1: attach remote reader
14-05-06 14:57:00 D:     70:                    : start:
14-05-06 14:57:00 D:     70:                    :     OUTPUT
14-05-06 14:57:00 D:     70:                    :         $ lookup or create pipe
14-05-06 14:57:00 D:     70:1-pipe1             :         $ open pipe writer
14-05-06 14:57:00 1-pipe1: attach local writer
14-05-06 14:57:00 1-pipe1: tell peer we are now writer
14-05-06 14:57:00 D:     70:1-pipe1             : open local writer
14-05-06 14:57:00 D:     70:1-pipe1             :         > before writing
14-05-06 14:57:00 D:     70:1-pipe1             : before writing:
14-05-06 14:57:00 D:     70:1-pipe1             :     ok
14-05-06 14:57:00 D:     70:1-pipe1             :         $ send OUTPUT_OK
14-05-06 14:57:00 D:     70:1-pipe1             :         > writing
14-05-06 14:57:00 D:     70:1-pipe1             : writing:
14-05-06 14:57:00 D:     70:1-pipe1             :     WRITE
14-05-06 14:57:00 D:     70:1-pipe1             :         $ process write request
14-05-06 14:57:00 D:     70:1-pipe1             :         > processing write
14-05-06 14:57:00 D:     70:1-pipe1             : processing write:
14-05-06 14:57:00 D:     70:1-pipe1             :     have reader
14-05-06 14:57:00 D:     70:1-pipe1             :         $ pass data to reader
14-05-06 14:57:00 D:     70:1-pipe1             : write 6 bytes
14-05-06 14:57:00 D:     70:1-pipe1             :         $ send WRITE_OK
14-05-06 14:57:00 D:     70:1-pipe1             :         > writing
14-05-06 14:57:00 D:     70:1-pipe1             : writing:
14-05-06 14:57:00 D:     70:1-pipe1             :     CLOSE
14-05-06 14:57:00 D:     70:1-pipe1             :         $ close pipe writer
14-05-06 14:57:00 1-pipe1: tell peer we stopped being writer
14-05-06 14:57:00 D:     70:1-pipe1             : close local writer
14-05-06 14:57:00 D:     70:1-pipe1             :         $ send CLOSE_OK
14-05-06 14:57:00 D:     70:1-pipe1             :         > start
interrupted
14-05-06 14:57:01 N: terminating zpipes_server service

Host "reader" (localhost)

Test script

sleep 1
timeout 0
open pipe1 read
read pipe1 2 ab
read pipe1 2 cd
read pipe1 2 ef
close pipe1 read

Script Log

Set prefix to 1
Set broker to 1-1
Sleeping for 1 seconds
Set timeout to 0
Opening descriptor "1-1|1-pipe1"
Opened pipe "pipe1" for read
Reading 2 bytes from pipe "pipe1"
Unexpected exception: timeouterror
Test failed

Broker Log

zbroker service/0.0.1
Copyright (c) 2014 the Contributors
This Software is provided under the MPLv2 License on an "as is" basis,
without warranty of any kind, either expressed, implied, or statutory.

14-05-06 14:56:47 I: starting zpipes broker using config in '/tmp/tmp9wrgD2/zbroker.cfg'
14-05-06 14:56:47 I: joining cluster as 20CAC8
14-05-06 14:56:47 N: starting zpipes_server service
14-05-06 14:56:47 I: ZPIPES server appeared at B92C1C
14-05-06 14:56:49 D:     70:                    : start:
14-05-06 14:56:49 D:     70:                    :     INPUT
14-05-06 14:56:49 D:     70:                    :         $ lookup or create pipe
14-05-06 14:56:49 D:     70:1-pipe1             :         $ open pipe reader
14-05-06 14:56:49 1-pipe1: attach local reader
14-05-06 14:56:49 1-pipe1: broadcast we are now reader
14-05-06 14:56:49 D:     70:1-pipe1             : open local reader
14-05-06 14:56:49 D:     70:1-pipe1             :         > before reading
14-05-06 14:56:49 D:     70:1-pipe1             : before reading:
14-05-06 14:56:49 D:     70:1-pipe1             :     ok
14-05-06 14:56:49 D:     70:1-pipe1             :         $ send INPUT_OK
14-05-06 14:56:49 D:     70:1-pipe1             :         > reading
14-05-06 14:56:49 D:     70:1-pipe1             : reading:
14-05-06 14:56:49 D:     70:1-pipe1             :     READ
14-05-06 14:56:49 D:     70:1-pipe1             :         $ process read request
14-05-06 14:56:49 D:     70:1-pipe1             :         > processing read
interrupted
14-05-06 14:56:58 N: terminating zpipes_server service
hintjens commented 10 years ago

Very strange. I can't think of any reason that the one process would affect the other like this. Certainly the 20CAC8 brokre is active (and handling messages). I suspect it's some unexpected synchronization in the test case.

What I do see is that the writer still thinks the reader exists, when it's in fact gone. I need to still add some cleanup of dead pipes. We do know in Zyre when a peer disappears, so I can use that to wipe out dangling readers and writers.

rpedde commented 10 years ago

It would be worthwhile to delete pipes on close as well -- I suspect some things I see are a result of old "stale" pipes that have been closed both sides, but are still in the "pending timeout" state.

hintjens commented 10 years ago

Pipes are deleted on close, when known. They're removed from the hash tables when the writer closes, even if the reader is still busy. There's just some leakage with remote peers being interrupted, I think.

On Thu, May 8, 2014 at 1:00 AM, Ron Pedde notifications@github.com wrote:

It would be worthwhile to delete pipes on close as well -- I suspect some things I see are a result of old "stale" pipes that have been closed both sides, but are still in the "pending timeout" state.

— Reply to this email directly or view it on GitHubhttps://github.com/zeromq/zbroker/issues/70#issuecomment-42494008 .

rpedde commented 10 years ago

Okay, so these:

14-05-07 18:35:16 D:    177:write_test          : close local reader
14-05-07 18:35:16 D:    177:write_test          :         $ send CLOSE_OK
14-05-07 18:35:16 D:    177:write_test          :         > start
14-05-07 18:35:16 D:    178:write_test          : writing:
14-05-07 18:35:16 D:    178:write_test          :     CLOSE
14-05-07 18:35:16 D:    178:write_test          :         $ close pipe writer
14-05-07 18:35:16 D:    178:write_test          : close local writer
14-05-07 18:35:16 D:    178:write_test          :         $ send CLOSE_OK
14-05-07 18:35:16 D:    178:write_test          :         > start
14-05-07 18:35:26 D:    177:write_test          : start:
14-05-07 18:35:26 D:    177:write_test          :     expired
14-05-07 18:35:26 D:    177:write_test          :         $ terminate
14-05-07 18:35:26 D:    178:write_test          : start:
14-05-07 18:35:26 D:    178:write_test          :     expired
14-05-07 18:35:26 D:    178:write_test          :         $ terminate

The terminates? The hash and data queues are already dropped, the terminate is just a waiting timer that doesn't actually do anything?

hintjens commented 10 years ago

Right. The 'expired' event hits after 10 seconds (default timeout on client connections). The 'terminate' action wipes that client connection. At this stage the pipes are already closed and dropped. If 'expired' hits in a state with an open pipe, that are first closed.

rpedde commented 10 years ago

I haven't seen this again. I'll reopen if I see this again.

hintjens commented 10 years ago

OK, great.

On Thu, May 8, 2014 at 5:13 PM, Ron Pedde notifications@github.com wrote:

Closed #70 https://github.com/zeromq/zbroker/issues/70.

— Reply to this email directly or view it on GitHubhttps://github.com/zeromq/zbroker/issues/70#event-119171945 .