zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0

UDP engine crashes and exhibits weird behaviour #2009

Open hitstergtd opened 8 years ago

hitstergtd commented 8 years ago

I came across these while adding throughput and latency tests. I may be doing something wrong that leads to these results, although I am fairly certain it's nothing unreasonable. :)

About the code: The linked gist is basically a variant of the inproc_thr benchmark:

  • It is modified to use RADIO/DISH sockets to send and receive messages over a single topic, aptly called thr_test.
  • Use of a separate sender thread was removed from this example. It basically sends N messages on the topic/group and then picks them up later.
  • The parameters are exactly like inproc_thr, i.e. message-size and message-count.

What works:

  • message-count <= 1000 AND message-size >= 0 AND message-size <= 8183

What does NOT work:

  • hangs indefinitely:
    • message-count > 1000 AND message-size > 0 (1 in 10 chance of a core dump)
    • message-count = 4500 AND message-size = 999 (sometimes, or dumps core as per below)
  • always dumps core:
    • message-count >= 1 AND message-size = 50000
  • dumps core (randomly) / double-free corruption message:
    • message-count = 1 AND message-size = 16413 (try running this repeatedly)
    • message-count = 4499 AND message-size = 999
  • message of incorrect size received:
    • message-count = 1000 AND message-size = 8184
    • message-count = 1000 AND message-size = 16413 (but one message dumps core!)

Code to reproduce this is available at: https://gist.github.com/hitstergtd/68503600e353adb3155504982df54682.

Not sure if these have existed since day 1 or crept in progressively over the last few months. I haven't tried the above scenarios on Windows, only on Linux 4.4.0-22 kernel (Ubuntu latest), so I'm not sure whether they are poller dependent or not.

somdoron commented 8 years ago

Regarding the message size, UDP is currently limited to 8191 bytes (including the topic). I will try to figure out the rest.

somdoron commented 8 years ago

Because you send all the messages first and only then receive, a message count larger than the subscriber's high watermark will cause messages to be dropped. UDP is unreliable: once the watermark is reached, the internal buffers are full and new messages get dropped.
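To illustrate the send-everything-then-receive pattern described above, here is a minimal sketch in C. It is a toy model, not libzmq code: `rx_model_t`, `rx_deliver`, `rx_read` and `send_all_then_receive` are hypothetical names, and the queue simply stands in for a receiver that drops new messages once its high watermark is reached.

```c
#include <stddef.h>

/* Hypothetical model, not libzmq code: a receiver whose inbound queue holds
   at most `hwm` messages and silently drops anything arriving while full. */
typedef struct
{
    size_t hwm;     /* capacity, standing in for the receive high watermark */
    size_t queued;  /* messages currently waiting to be read */
    size_t dropped; /* messages discarded because the queue was full */
} rx_model_t;

/* Deliver one message: queue it if there is room, otherwise drop it. */
static void rx_deliver (rx_model_t *rx)
{
    if (rx->queued < rx->hwm)
        rx->queued++;
    else
        rx->dropped++;
}

/* Read one message: returns 1 on success, 0 if the queue is empty. */
static int rx_read (rx_model_t *rx)
{
    if (rx->queued == 0)
        return 0;
    rx->queued--;
    return 1;
}

/* Send `count` messages up front, then count how many can be read back. */
size_t send_all_then_receive (size_t hwm, size_t count)
{
    rx_model_t rx = {hwm, 0, 0};
    for (size_t i = 0; i < count; i++)
        rx_deliver (&rx);

    size_t received = 0;
    while (rx_read (&rx))
        received++;
    return received;
}
```

Assuming the high watermark is left at 1000 (the libzmq default), this model returns 1000 for `send_all_then_receive (1000, 4500)`. A benchmark that then blocks until all 4500 messages arrive would wait forever, which is consistent with the hangs reported for message-count > 1000.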

somdoron commented 8 years ago

Can you attach the stack trace of the assert?

hitstergtd commented 8 years ago

@somdoron

I will send the stack traces as soon as possible.

hitstergtd commented 8 years ago

Also, for UDP: sending or receiving a message where (topic-length + message-size) is greater than 8191 bytes should return an error at the API level if it's not supported, unless I am missing something!
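A rough sketch of the kind of check being asked for, in C. This is not part of libzmq: `udp_check_size` and `UDP_MAX_GROUP_PLUS_BODY` are hypothetical names, the 8191-byte figure is the limit quoted earlier in this thread, and reporting the failure through `EMSGSIZE` is an assumption about how such an error might be surfaced.

```c
#include <errno.h>
#include <stddef.h>

/* Limit quoted in this thread: topic (group) plus body must fit in 8191 bytes. */
#define UDP_MAX_GROUP_PLUS_BODY 8191

/* Hypothetical helper, not a libzmq function.
   Returns 0 if the message fits, or -1 with errno set to EMSGSIZE. */
int udp_check_size (size_t group_len, size_t body_len)
{
    /* Compare by subtraction so the sum cannot overflow size_t. */
    if (group_len > UDP_MAX_GROUP_PLUS_BODY
        || body_len > UDP_MAX_GROUP_PLUS_BODY - group_len) {
        errno = EMSGSIZE;
        return -1;
    }
    return 0;
}
```

With the 8-character topic thr_test from the gist, `udp_check_size (8, 8183)` passes (8 + 8183 = 8191), while 8184, 16413 and 50000 are all rejected. That lines up with the boundary between the working and failing message sizes reported above.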

somdoron commented 8 years ago

@hitstergtd I will take a look next week. Regarding the message size, it is a simple fix in the code; at the time it was less important.

Regarding the crashing, do you have a stack trace showing where it crashes?

hitstergtd commented 8 years ago

@somdoron No problem - I just thought I would report it so that it's hopefully fixed for the 4.2 release, as I see that being one of the important features. I also wanted to see throughput and latency numbers for the UDP transport to see if it fares any better than the TCP stream engine.

I will generate a stack trace in the next couple of days and put it up here. Do you need it for all crash scenarios or just one of them?

somdoron commented 8 years ago

we can start with one of them

StephanOpfer commented 5 years ago

I still get a core-dump in case of messages larger than the aforementioned ~16413 Bytes. Is help appreciated? I can provide minimal working examples that produce this issue.

somdoron commented 4 years ago

hey @StephanOpfer, do you have the backtrace of the core-dump?