zeromq / jeromq

JeroMQ is a pure Java implementation of the ZeroMQ messaging library, offering high-performance asynchronous messaging for distributed or concurrent applications.
https://zeromq.org
Mozilla Public License 2.0
2.36k stars 482 forks source link

URI bind/connect parsing error in zmq.io.net.tcp.TcpAddress::resolve(..) #874

Open RalphSteinhagen opened 3 years ago

RalphSteinhagen commented 3 years ago

We are implementing a Majordomo broker derivative that uses in addition to the ROUTER also XPUB and SUB socket types for communicating with external/internal clients and workers.

In order to distinguish the endpoints, we bind these sockets internally to:

In order to keep this symmetric for external clients/workers and to stick to the RFC 3986 URI convention (ie. scheme:[//authority]path[?query][#fragment]), the same sockets are also individually bound to external IP:port addresses. I'd like to simply replace the 'broker' authority part with a specific 'IP:port' address (not necessarily all the same for each socket), for example 'tcp://localhost:5050/router', which presently fails in TcpAddress::resolve(..) with

java.lang.NumberFormatException: For input string: "5050/router"

at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at jeromq@0.5.2/zmq.io.net.tcp.TcpAddress.resolve(TcpAddress.java:117)
[..]

since the parser assumes that there is only one number in the portStr after the addStr and colon: port = Integer.parseInt(portStr);

Binding several different sockets (but with different paths) on the same port aside, it would be nice if the addressing scheme could be fixed, honour the RFC URI convention, and either a) ignore the additional path component, or b) use the additional path information as an additional specifier similar to the 'inproc:////' syntax.

Maybe, at the same time, it would be worthwhile to reduce a lot of this custom/redundant code with the existing java.net.URI class implementation that handles a lot of the parsing and syntax checking.

A feature request that is related, but goes beyond this issue: would it be conceivable to use the 'path' argument as a differentiating feature to allow binding multiple sockets to the same TCP port ie. same port but different paths as for the 'inproc://' implementation? This would keep the implementation between both low-level protocols a bit more symmetric and cleaner.

fredoboulo commented 3 years ago

Hi, Jeromq is aligned with Zeromq in the transport address definition, for example in tcp or inproc.

The project is based on C4, you are welcome to submit a PR. Please be mindful of the backward-compatibility (and testing, but that's more of an individual inclination).

RalphSteinhagen commented 3 years ago

@fredoboulo thanks for the links. I started to extend this (which is/should be technically simple) but encountered some stumbling blocks, maybe you could advise how to proceed:

The usage of the asterisk '*' is different from the official URI RFX definition but manageable/ok and a simple replace. However:

Would be willing to dive a bit more into the code, modify it where necessary but am a bit worried that'll it be for nothing if these proposed changes (ie. endpoints following URI semantics, Java 11) which are close to the core are perceived to be too much...

Your input would thus be much appreciated.

fredoboulo commented 3 years ago

The library aims to support Android usage, which means almost-Java-8. It's painful to maintain (and only my individual perception).

Can you point which unit tests in Jeromq are omitting the scheme/protocol?

RalphSteinhagen commented 3 years ago

Pretty much all in TcpAddressTests except for testBad(), testGoodIP46Google(), and other test that implicitly use 'tcp'.

I thought of generalising TcpAddress::resolve(..) using URI (which would take most of the parsing/syntax checking) to:

@Override
public InetSocketAddress resolve(String name, boolean ipv6, boolean local)
{
    if (name == null) {
        throw new IllegalArgumentException("declared TCP endpoint is null");
    }
    final URI uri = URI.create(name.replace("*", ipv6 ? "::" : "0.0.0.0"));
    final String newHost = uri.getHost();
    final int newPort = uri.getPort();
    final String path = uri.getPath();
    // [..]
}

There are also some parts in zmq.io.net.Address that drop the protocol implicitly but these are minor and can be fixed transparently.

It's of course possible to change all the unit-tests accordingly, but this may (or not) surprise some users that relied on this implicit behaviour. The first PR would have simply ignored the path argument -- as it should with the present definition.

Then -- as a second step (another PR) -- I would have propagated the path as (an optional) specifier argument similar to how it is done for inproc:// and also to allow a more economic use of TCP ports.

The library aims to support Android usage, which means almost-Java-8.

Well, then I should be probably looking for an Android-compatible JDK8 if I want to get anything merged. :-)

N.B. my IDE refuses to compile/run with the deprecated jdk.net.Sockets.setOption(socket, option, value); due to (probably) modularisation issues and/or it being an intellij issue.