zeromq / libzmq

ZeroMQ core engine in C++, implements ZMTP/3.1
https://www.zeromq.org
Mozilla Public License 2.0
9.64k stars 2.35k forks source link

Tests and Protocol Support for TIPC partially broken #1968

Closed hitstergtd closed 8 years ago

hitstergtd commented 8 years ago

Unless there is something missing in my TIPC setup, it seems that all the TIPC tests are broken and they seem to block/hang indefinitely in the Polling code, based on simple runs of strace. The hang happens on the recv path, whilst the send seems to work fine.

How to reproduce the issue?:

All the TIPC tests hang on the recv code; send seems to work fine

hitstergtd commented 8 years ago

Not sure if this is due to any recent change in options, etc. I have yet to test old ZeroMQ releases.

hitstergtd commented 8 years ago

OK, zeromq4-1 is broken too for TIPC tests. Same issue, send works, recv just hangs.

bluca commented 8 years ago

Did it work at any point?

hitstergtd commented 8 years ago

Good question. I would like to think so. I attempted to trace it back to the original commit, and I still cannot get the tests to work. The only rapid conclusion I can come to is that the TIPC protocol semantics might have changed in TIPC 2.0, at the Linux kernel level, and our stream engine isn't able to deal with it. But that's just speculation at this point.

Perhaps @Hugne can help?

Hugne commented 8 years ago

TIPC maintainers notified. http://thread.gmane.org/gmane.network.tipc.general/10120

Ill try to take a look at it next weekend if it's not fixed by then.

//E

Hugne commented 8 years ago

I took a quick look at this today, i can confirm that the issue is in ZeroMQ and not kernel/tipc module. And that it seems to be something on the receiving end that's broken. For example test_pair_tipc is hanging on poll() when bounce() is called (testutil.hpp) And the TIPC tests are working in the initial commits.

Hugne commented 8 years ago

I found the problem, the explanation will probably sound like gibberish if you've never used TIPC before, so i'll start with a tl;dr;givemethefix. Before running the tests, assign a TIPC address to the test system. tipc node set address 1.1.1 (file a bug on your dist if the iproute2 package doesn't have this command)

The TIPC protocol layer in ZeroMQ defaults to a lookup domain of 1.0.0 (tipc_address.cpp resolve function) 72 /* Since we can't specify lookup domain when connecting 73 * (and we're not sure that we want it to be configurable) 74 * Change from 'closest first' approach, to search entire zone */ 75 address.addr.name.domain = tipc_addr (1, 0, 0);

We could change it to 0,0,0, which should make the tests pass directly on an unconfigured system aswell, but that would mean that all name lookups are done using 'closest first'..

bluca commented 8 years ago

Thanks for chasing this up.

(and we're not sure that we want it to be configurable)

I'll ask the naive question, knowing nothing about TIPC: why not make it configurable?

Hugne commented 8 years ago

hm, i think the original idea i had was to try and hide details that users rarely need to worry about, but this one seem to have backfired.. 😑 Do you have any suggestion on how to configure it?

Lookup domain isa three-tuple value, closely tied to an address, but only relevant when connecting. special zmq sockopt? or encode in the address?

bluca commented 8 years ago

So if I understand the issue correctly, it's just a configuration problem with the tests, right? If correctly configured it all works fine?

Then, the safest option would be a new zmq sockopt. So that it can be backward compatible, but we can also fix the tests. If the parameter is uniquely and specifically tied to an address (so that it wouldn't make sense for 2 different sockets to have the same value, kind of like you won't bind 2 sockets to the same TCP address:port tuple) then I think the correct solution would be to encode it in the endpoint, but we would have to be absolutely sure it's backward compatible, ie: not adding it needs to have the same behaviour as before.

hitstergtd commented 8 years ago

Eric, thanks for looking into this; I thought it might be a configuration issue. The setsockopt option would be the simplest way to go about this and explain to users. Specifying it in the endpoint address would be good

Hugne commented 8 years ago

what do you think about adding an optional domain suffix as"@1.0.0"?

zmq_connect(sc, "tipc://{5560,0}"); becomes: zmq_connect(sc, "tipc://{5560,0}@0.0.0");

it kind of makes sense, connect to tipc type 5560 instance 0 in domain 0.0.0.

bluca commented 8 years ago

Looks good to me, thanks! Sorry to be insistent, but please do make sure to keep the current behaviour (domain used when no @ is passed) the same :-) @somdoron any preference?

hitstergtd commented 8 years ago

Eric, That does look good. As @bluca said, so long as we default to the current behaviour when nothing is specified it will be a win-win, and then we can just amend the tests to work properly on an unconfigured host.

Hugne commented 8 years ago

1982 should solve this.

bluca commented 8 years ago

Thanks!

hitstergtd commented 8 years ago

@Hugne, Just ran numerous tests and getting to know more about TIPC in ZMQ. Works well. Thanks again.

Although, I have to say that the official TIPC documentation on the SF site leaves much to be desired, and it would be good if they moved all their development to GitHub, IMO.

Hugne commented 8 years ago

@hitstergtd Actually all development have moved away from SF. Tipcutils have become "tipc" in th iproute2 package, and they only use SF for the mailing list and hosting the programmers guide and protocol spec afaik.

But i totally agree that the documentation leaves much to be desired..

hitstergtd commented 8 years ago

@Hugne, thanks. OK, that makes things clear.

For sake of posterity, iproute2 is at: