jnr / jnr-netdb

Network services database access for java

Fails on alpine linux #4

Open Paxa opened 7 years ago

Paxa commented 7 years ago

Alpine Linux doesn't have getprotobyname_r

WARNING: Failed to load native protocols db
java.lang.UnsatisfiedLinkError: unknown
    at jnr.ffi.provider.jffi.AsmRuntime.newUnsatisifiedLinkError(AsmRuntime.java:40)
    at jnr.netdb.NativeProtocolsDB$LinuxLibProto$jnr$ffi$2.getprotobyname_r(Unknown Source)
    at jnr.netdb.NativeProtocolsDB$LinuxNativeProtocolsDB.getProtocolByName(NativeProtocolsDB.java:176)
    at jnr.netdb.NativeProtocolsDB.load(NativeProtocolsDB.java:80)
    at jnr.netdb.NativeProtocolsDB.access$000(NativeProtocolsDB.java:40)
    at jnr.netdb.NativeProtocolsDB$SingletonHolder.<clinit>(NativeProtocolsDB.java:47)
    at jnr.netdb.NativeProtocolsDB.getInstance(NativeProtocolsDB.java:43)
    at jnr.netdb.Protocol$ProtocolDBSingletonHolder.load(Protocol.java:107)
    at jnr.netdb.Protocol$ProtocolDBSingletonHolder.<clinit>(Protocol.java:103)
    at jnr.netdb.Protocol.getProtocolDB(Protocol.java:96)
    at jnr.netdb.Protocol.getProtocolByNumber(Protocol.java:59)
    at org.jruby.ext.socket.Addrinfo.<init>(Addrinfo.java:549)
    at org.jruby.ext.socket.SocketUtils$2.addrinfo(SocketUtils.java:273)

Related https://github.com/jruby/jruby/issues/4408

headius commented 7 years ago

There must be something equivalent? I'm hoping we just need to try a different symbol name and hopefully detect that we're on musl vs glibc or similar...

headius commented 7 years ago

Ahh, I see what's happening here. The _r versions of these netdb functions are provided by GNU libc as reentrant versions of the regular names (according to the header on my glibc-based Fedora machine).

I was able to find man pages for getprotobyname_r on Arch, so this isn't specifically an Arch problem...unless they've recently started defaulting to musl libc.

A search for musl getprotobyname_r brings up many bug reports of C libraries and applications failing to build against musl, so this is a fairly widespread problem.

headius commented 7 years ago

So there are two solutions I can see here.

  1. Detect musl somehow and use the non-reentrant versions. Here be dragons, because the structures returned by those calls are on many platforms statically allocated, making them unsuitable for multi-threaded calls. This path would need to at least do some hard locking around these calls, which still wouldn't be 100% safe.
  2. Attempt to look up a protocol name during the library-booting portion of jnr-netdb setup, so that if the library or call fails we revert to one of the built-in network service database lists. They won't exactly reflect the host system, but do most systems have a lot of variability in their services lists?

There may also be a recommended alternative to these non-reentrant versions on musl. I have not done that research yet.
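
As a rough illustration of option 1, the sketch below serializes access to a non-reentrant lookup and copies the result out before releasing the lock. Everything in it is hypothetical: `SerializedLookupSketch`, `unsafeLookup`, and the shared array are stand-ins for a native binding and its statically allocated result, not jnr-netdb code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: guard a non-reentrant lookup with a process-wide lock.
final class SerializedLookupSketch {
    // Stand-in for the static struct a non-reentrant C call would overwrite.
    private static final String[] SHARED = new String[1];
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Stand-in for the non-reentrant native call; it reuses SHARED every time.
    private static String[] unsafeLookup(int proto) {
        SHARED[0] = proto == 6 ? "tcp" : proto == 17 ? "udp" : null;
        return SHARED;
    }

    // Holds the lock for the whole call and copies the value out before unlocking.
    static String nameOf(int proto) {
        LOCK.lock();
        try {
            String[] result = unsafeLookup(proto);
            return result[0];
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf(6));
    }
}
```

The lock only protects callers that go through `nameOf`. Any other native code in the process that calls the same libc function bypasses it, which is why this path would still not be 100% safe.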

headius commented 7 years ago

Hmm, here's the weird thing...jnr-netdb currently does do a call to getProtocolByName and getProtocolByNumber during the library initialization, presumably for this precise reason. So perhaps the version you tested did not have an updated jnr-netdb? I'm getting Arch set up so I can try it here.
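
A probe of this kind might look roughly like the following. This is a minimal sketch and not jnr-netdb's actual code: `ProtocolProbeSketch`, `choose`, `missingSymbol`, and the three-entry built-in table are hypothetical names chosen for illustration. The simulated native lookup throws the same `UnsatisfiedLinkError` seen in the trace above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: probe a native lookup once, fall back to a built-in table.
final class ProtocolProbeSketch {
    // Tiny stand-in for a bundled protocols list (IANA protocol numbers).
    static final Map<String, Integer> BUILT_IN = new HashMap<>();
    static {
        BUILT_IN.put("icmp", 1);
        BUILT_IN.put("tcp", 6);
        BUILT_IN.put("udp", 17);
    }

    // Simulates a binding whose native symbol is missing, as on a libc without the _r functions.
    static Integer missingSymbol(String name) {
        throw new UnsatisfiedLinkError("unknown");
    }

    // Calls the native lookup once; if that call fails, returns the built-in lookup instead.
    static Function<String, Integer> choose(Function<String, Integer> nativeLookup) {
        try {
            nativeLookup.apply("tcp"); // probe call forces the symbol to be resolved
            return nativeLookup;
        } catch (LinkageError | RuntimeException e) {
            // UnsatisfiedLinkError is a LinkageError, so it lands here.
            return BUILT_IN::get;
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> db = choose(ProtocolProbeSketch::missingSymbol);
        System.out.println("tcp = " + db.apply("tcp")); // prints "tcp = 6"
    }
}
```

Probing once at setup keeps the failure out of later lookups, at the cost of answers that may not reflect the host's own /etc/protocols.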

headius commented 7 years ago

Ok, finally managed to get an Arch VM set up and I can confirm jnr-netdb is a little messed up there. A bunch of tests fail. They may all be the same root cause, but the errors did not all seem related. Any help you can provide here would be appreciated. Check out jnr-netdb, have maven and a Java 8 JDK installed, and run "mvn install". You'll see the errors.

headius commented 7 years ago

Ok, so here's what I've learned.

  1. Several tests fail because they expect an "ip" protocol to be defined on the host system. Arch does not have an entry for "ip". I modified those tests to use a more standard "ipv4" protocol and they now pass.
  2. The remaining failures appear to be differences in how aliased services are reported by the related APIs. The tests expect "comsat" and "biff" to be aliases for service 512. On Arch, getservbyname_r does not appear to return the aliases, even though they're both in /etc/services.
  3. Current (and recent) Arch appear to use glibc by default, which as we've learned does have the _r versions of these APIs.

So again the issue is not Arch (for purposes of this bug). I'll dig around and see if there's a workaround for the missing _r functions on musl.

Paxa commented 7 years ago

Actually, it prints the stack trace but still works; I had a bug in my application. I currently use a Docker image with glibc and it works great ( https://hub.docker.com/r/anapsix/alpine-java/ )

headius commented 7 years ago

Oh! So it doesn't fail completely but logs that exception? That may simply be a missed printStackTrace in the failover logic. I'll check on that.

Incidentally, if you can try to mvn test jnr-netdb on your system and report the results, I'd appreciate it. You should have four failures in some of the "services" tests due to the alias difference I mentioned above.

headius commented 7 years ago

Ok, so all I could see logging that exception is a log message at level "warning". Could you or your app be setting the JVM's log level to warning or lower?
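
To illustrate the level threshold in question, here is a minimal sketch using java.util.logging. It assumes the warning goes through java.util.logging, which the "WARNING:" prefix in the trace suggests but this thread does not confirm. The logger name and the `WarningLevelSketch` class are hypothetical.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: does a WARNING record with a throwable get past the logger's level?
final class WarningLevelSketch {
    // Logs one failover-style warning and returns how many records reached the handler.
    static int publishedAt(Level loggerLevel) {
        // Hypothetical logger name; the real name used by jnr-netdb is not given in this thread.
        Logger log = Logger.getLogger("sketch.netdb." + loggerLevel.getName());
        log.setUseParentHandlers(false); // keep the console quiet for this demo
        log.setLevel(loggerLevel);

        final int[] count = {0};
        Handler counter = new Handler() {
            @Override public void publish(LogRecord record) { count[0]++; }
            @Override public void flush() { }
            @Override public void close() { }
        };
        log.addHandler(counter);
        try {
            // Passing the throwable is what makes formatters print a stack trace.
            log.log(Level.WARNING, "Failed to load native protocols db",
                    new UnsatisfiedLinkError("unknown"));
        } finally {
            log.removeHandler(counter);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("WARNING threshold: " + publishedAt(Level.WARNING));
        System.out.println("SEVERE threshold:  " + publishedAt(Level.SEVERE));
    }
}
```

With the logger at WARNING (or anything lower, such as INFO) the record is published; at SEVERE it is dropped before reaching any handler, so no message or stack trace would appear.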