zeromq / zbroker

Elastic pipes
Mozilla Public License 2.0
50 stars 11 forks source link

Problem: rapid writer opens asserts #60

Closed rpedde closed 10 years ago

rpedde commented 10 years ago

This might be related to the other assert, as it's in the same pattern:

open writer on hosta open corresponding reader on hostb open new writer on hostb

write to new writer handle, reader asserts:

zpipes_server.c:417: pipe_send_data: Assertion `self->reader' failed.
Aborted

Writer animation:

14-05-02 17:14:23 I: starting zpipes server using config in '/etc/zbroker.cfg'
14-05-02 17:14:23 I: joining cluster as DEC3D1B4C98A5003B10285003C1CD50F
14-05-02 17:14:23 N: starting zpipes_server service
14-05-02 17:14:23 N: binding zpipes service to 'ipc://@/zpipes/local'
14-05-02 17:14:33 I: ZPIPES server appeared at 64ECAC5F2DDC5D91320A6CD4AA3079A4
14-05-02 17:14:42      0: start:
14-05-02 17:14:42      0:     OUTPUT
14-05-02 17:14:42      0:         $ lookup or create pipe
14-05-02 17:14:42      0:         $ open pipe writer
14-05-02 17:14:42      0:         > before writing
14-05-02 17:14:42      0: before writing:
14-05-02 17:14:42      0:     ok
14-05-02 17:14:42      0:         $ send OUTPUT_OK
14-05-02 17:14:42      0:         > writing
14-05-02 17:14:42      1: start:
14-05-02 17:14:42      1:     OUTPUT
14-05-02 17:14:42      1:         $ lookup or create pipe
14-05-02 17:14:42      1:         $ open pipe writer
14-05-02 17:14:42      1:         > before writing
14-05-02 17:14:42      1: before writing:
14-05-02 17:14:42      1:     ok
14-05-02 17:14:42      1:         $ send OUTPUT_OK
14-05-02 17:14:42      1:         > writing
14-05-02 17:14:42      1: writing:
14-05-02 17:14:42      1:     WRITE
14-05-02 17:14:42      1:         $ process write request
14-05-02 17:14:42      1:         > processing write
14-05-02 17:14:42      1: processing write:
14-05-02 17:14:42      1:     have reader
14-05-02 17:14:42      1:         $ pass data to reader
14-05-02 17:14:42      1:         $ send WRITE_OK
14-05-02 17:14:42      1:         > writing
14-05-02 17:14:42      1: writing:
14-05-02 17:14:42      1:     CLOSE
14-05-02 17:14:42      1:         $ close pipe writer
14-05-02 17:14:42      1:         $ send CLOSE_OK
14-05-02 17:14:42      1:         > start
14-05-02 17:14:52      0: writing:
14-05-02 17:14:52      0:     expired
14-05-02 17:14:52      0:         $ close pipe writer
14-05-02 17:14:52      0:         $ terminate
14-05-02 17:14:52      1: start:
14-05-02 17:14:52      1:     expired
14-05-02 17:14:52      1:         $ terminate
14-05-02 17:14:52 I: ZPIPES server vanished from 64ECAC5F2DDC5D91320A6CD4AA3079A4

Reader animation:

14-05-02 17:14:33 I: starting zpipes server using config in 'zbroker.cfg'
14-05-02 17:14:33 I: joining cluster as 64ECAC5F2DDC5D91320A6CD4AA3079A4
14-05-02 17:14:33 N: starting zpipes_server service
14-05-02 17:14:33 N: binding zpipes service to 'ipc://@/zpipes/local'
14-05-02 17:14:33 I: ZPIPES server appeared at DEC3D1B4C98A5003B10285003C1CD50F
14-05-02 17:14:42    643: start:
14-05-02 17:14:42    643:     INPUT
14-05-02 17:14:42    643:         $ lookup or create pipe
14-05-02 17:14:42    643:         $ open pipe reader
14-05-02 17:14:42    643:         > before reading
14-05-02 17:14:42    643: before reading:
14-05-02 17:14:42    643:     ok
14-05-02 17:14:42    643:         $ send INPUT_OK
14-05-02 17:14:42    643:         > reading
lt-zbroker: zpipes_server.c:417: pipe_send_data: Assertion `self->reader' failed.
Aborted

Strangely, the pipe that was written to didn't have a reader. I'm thinking some wires got crossed here. The writer should have blocked on no reader, but it sent data.

Let me know if there is more repro data I can provide.

hintjens commented 10 years ago

Is this using the same pipe name for both writers, or different pipe names?

On Sat, May 3, 2014 at 12:19 AM, Ron Pedde notifications@github.com wrote:

This might be related to the other assert, as it's in the same pattern:

open writer on hosta open corresponding reader on hostb open new writer on hostb

write to new writer handle, reader asserts:

zpipes_server.c:417: pipe_send_data: Assertion `self->reader' failed. Aborted

Writer animation:

14-05-02 17:14:23 I: starting zpipes server using config in '/etc/zbroker.cfg' 14-05-02 17:14:23 I: joining cluster as DEC3D1B4C98A5003B10285003C1CD50F 14-05-02 17:14:23 N: starting zpipes_server service 14-05-02 17:14:23 N: binding zpipes service to 'ipc://@/zpipes/local' 14-05-02 17:14:33 I: ZPIPES server appeared at 64ECAC5F2DDC5D91320A6CD4AA3079A4 14-05-02 17:14:42 0: start: 14-05-02 17:14:42 0: OUTPUT 14-05-02 17:14:42 0: $ lookup or create pipe 14-05-02 17:14:42 0: $ open pipe writer 14-05-02 17:14:42 0: > before writing 14-05-02 17:14:42 0: before writing: 14-05-02 17:14:42 0: ok 14-05-02 17:14:42 0: $ send OUTPUT_OK 14-05-02 17:14:42 0: > writing 14-05-02 17:14:42 1: start: 14-05-02 17:14:42 1: OUTPUT 14-05-02 17:14:42 1: $ lookup or create pipe 14-05-02 17:14:42 1: $ open pipe writer 14-05-02 17:14:42 1: > before writing 14-05-02 17:14:42 1: before writing: 14-05-02 17:14:42 1: ok 14-05-02 17:14:42 1: $ send OUTPUT_OK 14-05-02 17:14:42 1: > writing 14-05-02 17:14:42 1: writing: 14-05-02 17:14:42 1: WRITE 14-05-02 17:14:42 1: $ process write request 14-05-02 17:14:42 1: > processing write 14-05-02 17:14:42 1: processing write: 14-05-02 17:14:42 1: have reader 14-05-02 17:14:42 1: $ pass data to reader 14-05-02 17:14:42 1: $ send WRITE_OK 14-05-02 17:14:42 1: > writing 14-05-02 17:14:42 1: writing: 14-05-02 17:14:42 1: CLOSE 14-05-02 17:14:42 1: $ close pipe writer 14-05-02 17:14:42 1: $ send CLOSE_OK 14-05-02 17:14:42 1: > start 14-05-02 17:14:52 0: writing: 14-05-02 17:14:52 0: expired 14-05-02 17:14:52 0: $ close pipe writer 14-05-02 17:14:52 0: $ terminate 14-05-02 17:14:52 1: start: 14-05-02 17:14:52 1: expired 14-05-02 17:14:52 1: $ terminate 14-05-02 17:14:52 I: ZPIPES server vanished from 64ECAC5F2DDC5D91320A6CD4AA3079A4

Reader animation:

14-05-02 17:14:33 I: starting zpipes server using config in 'zbroker.cfg' 14-05-02 17:14:33 I: joining cluster as 64ECAC5F2DDC5D91320A6CD4AA3079A4 14-05-02 17:14:33 N: starting zpipes_server service 14-05-02 17:14:33 N: binding zpipes service to 'ipc://@/zpipes/local' 14-05-02 17:14:33 I: ZPIPES server appeared at DEC3D1B4C98A5003B10285003C1CD50F 14-05-02 17:14:42 643: start: 14-05-02 17:14:42 643: INPUT 14-05-02 17:14:42 643: $ lookup or create pipe 14-05-02 17:14:42 643: $ open pipe reader 14-05-02 17:14:42 643: > before reading 14-05-02 17:14:42 643: before reading: 14-05-02 17:14:42 643: ok 14-05-02 17:14:42 643: $ send INPUT_OK 14-05-02 17:14:42 643: > reading lt-zbroker: zpipes_server.c:417: pipe_send_data: Assertion `self->reader' failed. Aborted

Strangely, the pipe that was written to didn't have a reader. I'm thinking some wires got crossed here. The writer should have blocked on no reader, but it sent data.

Let me know if there is more repro data I can provide.

— Reply to this email directly or view it on GitHubhttps://github.com/zeromq/zbroker/issues/60 .

hintjens commented 10 years ago

Whenever there's an assertion, can you try to get the backtrace? The animation is valuable as well, thanks.

rpedde commented 10 years ago

set break on that assert, but it doesn't assert in gdb. :/ I'll keep trying though.

rpedde commented 10 years ago

I think it's related to this.

  1. open write pipe named 2a982903-ed15-422b-87d5-6afc418e193a on writer host
  2. open reader pipe for 2a982903-ed15-422b-87d5-6afc418e193a on reader host
  3. open write pipe named 169e4cbd-62f3-4c30-afd7-8b00d82b0f6e on writer host
  4. write to write pipe 169e4cbd-62f3-4c30-afd7-8b00d82b0f6e

This should block. But sometimes it doesn't. It's racy, though, and I had to load my machine down to the point that smoke was coming out the vent fans, but this is what happens:

Writer logs:

zbroker service/0.0.1
Copyright (c) 2014 the Contributors
This Software is provided under the MPLv2 License on an "as is" basis,
without warranty of any kind, either expressed, implied, or statutory.

14-05-03 15:14:19 I: starting zpipes server using config in 'zbroker-1.cfg'
14-05-03 15:14:19 I: joining cluster as 89301E
14-05-03 15:14:19 N: starting zpipes_server service
14-05-03 15:14:19 N: binding zpipes service to 'ipc://@/zpipes/local'
14-05-03 15:14:19 I: ZPIPES server appeared at BE79E0
14-05-03 15:14:19 D:    309:2a982903-ed15-422b-87d: open local writer
14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast
14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast
14-05-03 15:14:19 D:    310:169e4cbd-62f3-4c30-afd: open local writer
14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast
14-05-03 15:14:19 D:    310:169e4cbd-62f3-4c30-afd: write 2 bytes
14-05-03 15:14:19 D:    310:169e4cbd-62f3-4c30-afd: close local writer
14-05-03 15:14:19 D:    309:2a982903-ed15-422b-87d: close local writer
interrupted
14-05-03 15:14:19 N: terminating zpipes_server service

From the writer perspective, it wrote the data, because there was a reader. But there was no reader open. (!?!?!)

Reader logs:

zbroker service/0.0.1
Copyright (c) 2014 the Contributors
This Software is provided under the MPLv2 License on an "as is" basis,
without warranty of any kind, either expressed, implied, or statutory.

14-05-03 15:14:19 I: starting zpipes server using config in 'zbroker-2.cfg'
14-05-03 15:14:19 I: joining cluster as BE79E0
14-05-03 15:14:19 N: starting zpipes_server service
14-05-03 15:14:19 N: binding zpipes service to 'ipc://@/zpipes/local2
14-05-03 15:14:19 I: ZPIPES server appeared at 89301E
14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=broadcast
14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast
14-05-03 15:14:19 D:    309:2a982903-ed15-422b-87d: open local reader
14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=broadcast
14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast
14-05-03 15:14:19 D: remote=89301E command=DATA pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast
14-05-03 15:14:19 D: discard 2 bytes, unroutable
14-05-03 15:14:19 D: remote=89301E command=DROP WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast
14-05-03 15:14:19 D: remote=89301E command=DROP WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast
14-05-03 15:14:19 D:    309:2a982903-ed15-422b-87d: close local reader
14-05-03 15:14:19 D:    309:2a982903-ed15-422b-87d: close local reader
interrupted
14-05-03 15:14:19 N: terminating zpipes_server service

I think this condition used to assert, but now it's discarding and continuing.

hintjens commented 10 years ago

:-) OK, I'll get onto it tomorrow morning. The logs should be enough, thanks!

On Sat, May 3, 2014 at 10:23 PM, Ron Pedde notifications@github.com wrote:

I think it's related to this.

  1. open write pipe named 2a982903-ed15-422b-87d5-6afc418e193a on writer host
  2. open reader pipe for 2a982903-ed15-422b-87d5-6afc418e193a on reader host
  3. open write pipe named 169e4cbd-62f3-4c30-afd7-8b00d82b0f6e on writer host
  4. write to write pipe 169e4cbd-62f3-4c30-afd7-8b00d82b0f6e

This should block. But sometimes it doesn't. It's racy, though, and I had to load my machine down to the point that smoke was coming out the vent fans, but this is what happens:

Writer logs:

zbroker service/0.0.1 Copyright (c) 2014 the Contributors This Software is provided under the MPLv2 License on an "as is" basis, without warranty of any kind, either expressed, implied, or statutory.

14-05-03 15:14:19 I: starting zpipes server using config in 'zbroker-1.cfg' 14-05-03 15:14:19 I: joining cluster as 89301E 14-05-03 15:14:19 N: starting zpipes_server service 14-05-03 15:14:19 N: binding zpipes service to 'ipc://@/zpipes/local' 14-05-03 15:14:19 I: ZPIPES server appeared at BE79E0 14-05-03 15:14:19 D: 309:2a982903-ed15-422b-87d: open local writer 14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast 14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast 14-05-03 15:14:19 D: 310:169e4cbd-62f3-4c30-afd: open local writer 14-05-03 15:14:19 D: remote=BE79E0 command=HAVE READER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast 14-05-03 15:14:19 D: 310:169e4cbd-62f3-4c30-afd: write 2 bytes 14-05-03 15:14:19 D: 310:169e4cbd-62f3-4c30-afd: close local writer 14-05-03 15:14:19 D: 309:2a982903-ed15-422b-87d: close local writer interrupted 14-05-03 15:14:19 N: terminating zpipes_server service

From the writer perspective, it wrote the data, because there was a reader. But there was no reader open. (!?!?!)

Reader logs:

zbroker service/0.0.1 Copyright (c) 2014 the Contributors This Software is provided under the MPLv2 License on an "as is" basis, without warranty of any kind, either expressed, implied, or statutory.

14-05-03 15:14:19 I: starting zpipes server using config in 'zbroker-2.cfg' 14-05-03 15:14:19 I: joining cluster as BE79E0 14-05-03 15:14:19 N: starting zpipes_server service 14-05-03 15:14:19 N: binding zpipes service to 'ipc://@/zpipes/local2 14-05-03 15:14:19 I: ZPIPES server appeared at 89301E 14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=broadcast 14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast 14-05-03 15:14:19 D: 309:2a982903-ed15-422b-87d: open local reader 14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=broadcast 14-05-03 15:14:19 D: remote=89301E command=HAVE WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast 14-05-03 15:14:19 D: remote=89301E command=DATA pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast 14-05-03 15:14:19 D: discard 2 bytes, unroutable 14-05-03 15:14:19 D: remote=89301E command=DROP WRITER pipe=169e4cbd-62f3-4c30-afd7-8b00d82b0f6e mode=unicast 14-05-03 15:14:19 D: remote=89301E command=DROP WRITER pipe=2a982903-ed15-422b-87d5-6afc418e193a mode=unicast 14-05-03 15:14:19 D: 309:2a982903-ed15-422b-87d: close local reader 14-05-03 15:14:19 D: 309:2a982903-ed15-422b-87d: close local reader interrupted 14-05-03 15:14:19 N: terminating zpipes_server service

— Reply to this email directly or view it on GitHubhttps://github.com/zeromq/zbroker/issues/60#issuecomment-42115431 .

hintjens commented 10 years ago

Found the cause, I think. Bad test on pipe reader, causing node to accept DATA when it didn't have a reader. Old code asserted (which was arguably the better response); new code discards the data (which is more robust since DATA can come late, after a reader has disappeared).