jnr / jnr-posix

Java Posix layer

Error building `jnr-posix` on ppc64le platform #144

Closed sarveshtamba closed 4 years ago

sarveshtamba commented 4 years ago

I am trying to build jnr-posix v3.0.44 and v3.0.54 (REPOSITORY="https://github.com/jnr/jnr-posix.git") on the ppc64le platform, but the test phase fails with the following errors:

...
...
...

Running jnr.posix.LinuxPOSIXTest
Tests run: 2, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 0.526 sec <<< FAILURE! - in jnr.posix.LinuxPOSIXTest
testMessageHdrMultipleControl(jnr.posix.LinuxPOSIXTest)  Time elapsed: 0.043 sec  <<< FAILURE!
java.lang.AssertionError: null
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at jnr.posix.LinuxPOSIXTest.testMessageHdrMultipleControl(LinuxPOSIXTest.java:140)

ioprioThreadedTest(jnr.posix.LinuxPOSIXTest)  Time elapsed: 0.004 sec  <<< ERROR!
java.lang.IllegalStateException: ioprio_set is not implemented in jnr-posix
        at jnr.posix.util.DefaultPOSIXHandler.unimplementedError(DefaultPOSIXHandler.java:28)
        at jnr.posix.LinuxPOSIX.ioprio_set(LinuxPOSIX.java:288)
        at jnr.posix.LinuxPOSIXTest.ioprioThreadedTest(LinuxPOSIXTest.java:50)

...
...
...
Running jnr.posix.GroupTest
Tests run: 32, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 4.611 sec <<< FAILURE! - in jnr.posix.FileTest
accessTest(jnr.posix.FileTest)  Time elapsed: 0.009 sec  <<< FAILURE!
java.lang.AssertionError: access:  expected:<-1> but was:<0>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at jnr.posix.FileTest.accessTest(FileTest.java:505)

fcntlDupfdTest(jnr.posix.FileTest)  Time elapsed: 0.01 sec  <<< ERROR!
java.io.IOException: Stream Closed
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)
        at jnr.posix.FileTest.fcntlDupfdTest(FileTest.java:298)

fcntlDupfdWithArgTest(jnr.posix.FileTest)  Time elapsed: 0 sec  <<< ERROR!
java.io.IOException: Stream Closed
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:255)
        at jnr.posix.FileTest.fcntlDupfdWithArgTest(FileTest.java:320)

Results :

Failed tests:
  FileTest.accessTest:505 access:  expected:<-1> but was:<0>
  LinuxPOSIXTest.testMessageHdrMultipleControl:140 null
Tests in error:
  FileTest.fcntlDupfdTest:298 » IO Stream Closed
  FileTest.fcntlDupfdWithArgTest:320 » IO Stream Closed
  LinuxPOSIXTest.ioprioThreadedTest:50 » IllegalState ioprio_set is not implemen...

Tests run: 95, Failures: 2, Errors: 3, Skipped: 1

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12.875 s
[INFO] Finished at: 2020-04-17T11:17:17Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project jnr-posix: There are test failures.
[ERROR]
[ERROR] Please refer to /root/jnr-posix-3.0.54/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Any input would be highly appreciated.

sarveshtamba commented 4 years ago

@headius any inputs on this one?

nirvdrum commented 4 years ago

It looks to me like you'll need to update LinuxPOSIX's section on syscalls to handle the PPC64LE ABI in order to handle the ioprio_set case. As for the file tests, it might be that jnr-constants has different values for Linux than PPC64LE uses. I ran into this once with SPARCv9. It unfortunately never was merged due to staleness, but you could use the PR as a template for introducing a new platform to jnr-constants (if needed).
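
To illustrate what handling a new ABI means for the ioprio_set case: Linux assigns syscall numbers per architecture, so a wrapper that goes through syscall(2) needs a per-ABI lookup. The sketch below is not jnr-posix's actual code. The class and method names are hypothetical, the architecture strings are assumed to follow Java's `os.arch` conventions, and the numbers are believed to match the kernel's syscall tables but should be verified against `<asm/unistd.h>` on the target machine.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of jnr-posix.
final class IoprioSyscallNumbers {
    /**
     * Returns the ioprio_set syscall number for the given architecture name.
     * Unknown architectures are rejected instead of guessed, which mirrors
     * the "not implemented" error in the test output above.
     */
    static int ioprioSet(String arch) {
        switch (arch.toLowerCase(Locale.ROOT)) {
            case "x86_64":
            case "amd64":
                return 251;
            case "i386":
            case "x86":
                return 289;
            case "aarch64":
                return 30;
            case "ppc64":
            case "ppc64le":
                return 273;
            default:
                throw new UnsupportedOperationException(
                        "ioprio_set: no syscall number known for " + arch);
        }
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        try {
            System.out.println(arch + " -> " + ioprioSet(arch));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly on an unknown architecture is a deliberate choice here: calling syscall(2) with another platform's number would invoke an unrelated kernel function.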

headius commented 4 years ago

As with https://github.com/jnr/jnr-ffi/issues/200 we do not have access to a ppc64le test environment, so I've applied to get free access through a university.

Do look into helping us generate updated constants as @nirvdrum mentioned.

headius commented 4 years ago

I have attempted to run tests on a Power8 environment. My results are a little different from yours:

Failed tests: 
  LinuxPOSIXTest.testMessageHdrMultipleControl:140 null
Tests in error: 
  FileTest.fcntlDupfdTest:298 » IO Stream Closed
  FileTest.fcntlDupfdWithArgTest:320 » IO Stream Closed
  LinuxPOSIXTest.ioprioThreadedTest:50 » IllegalState ioprio_set is not implemen...

I suspect the access test failure (edit: the one in your results that's not in mine) may be due to an OS-level difference between our environments so I'm not going to dig into that one at the moment (perhaps you can do so).

headius commented 4 years ago

Ok, so I attempted to regenerate constants, and while there are a few differences none of them appear to be related to these failures.

So I have proceeded to create #145 to track fixes.

The ioprio failure is fixed there already.

The testMessageHdrMultipleControl test is failing because the receiving side only receives one control message, not two. This could be an environmental thing, but I am really unfamiliar with the behavior of sendmsg and recvmsg at this level.

The remaining two fcntl issues indicate that either the file descriptor is not getting dup'ed properly (resulting fd appears to be closed) or the resulting dup'ed fd is not getting into a FileInputStream successfully.

headius commented 4 years ago

Yeah the fcntl tests are failing to dup; the return value is -1. Hmmm.

headius commented 4 years ago

It appears that the behavior of F_DUPFD is somewhat undefined when passing no third argument, and that's the cause of the fcntl failures.

These tests pass on Linux and Darwin, but a similar piece of C code revealed some surprising behavior differences:

All documentation I can find online indicates that F_DUPFD will use that third argument, but most docs don't say explicitly that it's required nor what happens if it is not passed.

Then I found this doc for "fcntl64" that makes it more explicit: http://www.cbs.dtu.dk/cgi-bin/nph-runsafe?man=fcntl64

  fcntl() can take an optional third argument.  Whether or not this argu-
  ment  is  required is determined by cmd.  The required argument type is
  indicated in parentheses after  each  cmd  name  (in  most  cases,  the
  required  type  is  long,  and  we identify the argument using the name
  arg), or void is specified if the argument is not required.

  ...

  Duplicating a file descriptor
  F_DUPFD (long)
         Find the lowest numbered available file descriptor greater than
         or equal to arg and make it be a copy of fd.  This is different
         from dup2(2), which uses exactly the descriptor specified.

         On success, the new descriptor is returned.

         See dup(2) for further details.

  F_DUPFD_CLOEXEC (long; since Linux 2.6.24)
         As for F_DUPFD, but additionally set the close-on-exec flag  for
         the  duplicate  descriptor.  Specifying this flag permits a pro-
         gram to avoid an additional fcntl() F_SETFD operation to set the
         FD_CLOEXEC flag.  For an explanation of why this flag is useful,
         see the description of O_CLOEXEC in open(2).

I think the smartest thing for these tests would be to modify them to properly use the three-arg forms of fcntl and fix the deprecated form to also do the right thing.
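
The F_DUPFD rule quoted above can be modeled in a few lines, which also suggests why a missing third argument is risky. This is a toy model in plain Java, not the kernel's implementation and not jnr-posix code: `DupfdModel`, `dupfd`, and the `limit` parameter are hypothetical names. The idea that an omitted argument behaves like an arbitrary leftover value is an assumption, not something confirmed in this thread, though it would be consistent with the -1 return value seen on PPC64LE.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model for illustration only; names are hypothetical.
final class DupfdModel {
    /**
     * Models fcntl(fd, F_DUPFD, arg): returns the lowest descriptor that is
     * greater than or equal to arg and not in openFds. Returns -1 when arg
     * is negative or not below limit (the real call fails with EINVAL), or
     * when no descriptor below limit is free (EMFILE in the real call).
     */
    static int dupfd(Set<Integer> openFds, long arg, int limit) {
        if (arg < 0 || arg >= limit) {
            return -1;
        }
        for (int fd = (int) arg; fd < limit; fd++) {
            if (!openFds.contains(fd)) {
                return fd;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<Integer> open = new TreeSet<>(Arrays.asList(0, 1, 2, 3, 5));
        System.out.println(dupfd(open, 0, 1024));               // 4
        System.out.println(dupfd(open, 5, 1024));               // 6
        // An arbitrary value standing in for an argument that was never passed.
        System.out.println(dupfd(open, 140737488355328L, 1024)); // -1
    }
}
```

With an explicit argument of 0 the result is well defined on every platform, which is the point of moving the tests to the three-arg form.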

headius commented 4 years ago

With additional patches in #145 there's only the recvmsg failure remaining. I've at least improved the error output so we can see we're only getting one control message back.

testMessageHdrMultipleControl(jnr.posix.LinuxPOSIXTest)  Time elapsed: 0.073 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at jnr.posix.LinuxPOSIXTest.testMessageHdrMultipleControl(LinuxPOSIXTest.java:140)

headius commented 4 years ago

Aha, a bit of spelunking and I found that this last failing test was added by @rm5248 in #140. It was subsequently quarantined to run on Linux only (it was not passing on Darwin, but I don't recall how it failed).

Perhaps @rm5248 has some thoughts on why this would fail on Linux PPC64LE? And perhaps you can explain to me how this test manages to receive two control message headers in the first place, because I'm a bit confused about that. 😀

headius commented 4 years ago

Ok a couple mysteries solved:

  1. The additional header is received because the sending socket has been set to SO_PASSCRED. This causes it to include the caller's pid, uid, and gid as a second control message.
  2. Darwin is the other primary platform we test on, and it does not support SO_PASSCRED. This is why I had to isolate this test to run only on Linux for #140.

But as far as I can tell, this test should work the same way on Linux PPC64LE, so the cause of our failure is still unknown.

headius commented 4 years ago

Correction: SO_PASSCRED gets set on the read side.

headius commented 4 years ago

And the final mystery is solved. Going back to those few jnr-constants changes, this is among them:

@@ -37,8 +37,8 @@ SO_ATTACH_FILTER(0x1aL),
 SO_BINDTODEVICE(0x19L),
 SO_DETACH_FILTER(0x1bL),
 SO_NO_CHECK(0xbL),
-SO_PASSCRED(0x10L),
-SO_PEERCRED(0x11L),
+SO_PASSCRED(0x14L),
+SO_PEERCRED(0x15L),
 SO_PEERNAME(0x1cL),
 SO_PRIORITY(0xcL),
 SO_SECURITY_AUTHENTICATION(0x16L),

The problem here is that these values (and possibly the others that changed) differ only on PPC, but jnr-constants does not currently have the ability to separate constants by architecture.

Regenerating the constants and using the updated jnr-constants in jnr-posix allows this final test to pass.
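
One way code in the style of jnr-constants could separate these values by architecture is sketched below. This is only an illustration, not the project's design: the class and method names are hypothetical, the `os.arch` prefixes used to detect PowerPC are an assumption, and only the two cases from the diff above are covered (0x10/0x11 in the previously generated constants, 0x14/0x15 on PPC).

```java
import java.util.Locale;

// Hypothetical sketch; not jnr-constants code.
final class SocketOptionValues {
    // Values from the previously generated Linux constants.
    static final int GENERIC_SO_PASSCRED = 0x10;
    static final int GENERIC_SO_PEERCRED = 0x11;
    // Values produced when regenerating on PPC Linux.
    static final int POWERPC_SO_PASSCRED = 0x14;
    static final int POWERPC_SO_PEERCRED = 0x15;

    /** Assumption: PowerPC JVMs report os.arch as "ppc", "ppc64", "ppc64le", or "powerpc...". */
    static boolean isPowerPc(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.startsWith("ppc") || arch.startsWith("powerpc");
    }

    static int soPasscred(String osArch) {
        return isPowerPc(osArch) ? POWERPC_SO_PASSCRED : GENERIC_SO_PASSCRED;
    }

    static int soPeercred(String osArch) {
        return isPowerPc(osArch) ? POWERPC_SO_PEERCRED : GENERIC_SO_PEERCRED;
    }

    public static void main(String[] args) {
        String arch = System.getProperty("os.arch");
        System.out.println(arch + ": SO_PASSCRED=" + soPasscred(arch)
                + ", SO_PEERCRED=" + soPeercred(arch));
    }
}
```

Passing the wrong value to setsockopt would silently enable a different option, so the credentials control message would never arrive. That matches the symptom of receiving one control message instead of two.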

headius commented 4 years ago

For reference, the relevant section of the asm-generic/socket.h headers on PPC Linux:

...
#define SO_REUSEPORT    15
#ifndef SO_PASSCRED /* powerpc only differs in these */
#define SO_PASSCRED     16
#define SO_PEERCRED     17
...

And the non-generic asm/socket.h:

#define SO_RCVLOWAT     16
#define SO_SNDLOWAT     17
#define SO_RCVTIMEO_OLD 18
#define SO_SNDTIMEO_OLD 19
#define SO_PASSCRED     20
#define SO_PEERCRED     21

headius commented 4 years ago

Well it's been an adventure, but we have a green build on PPC64LE. I have merged #145.

The remaining issue with SO_PASSCRED will require fixing jnr/jnr-constants#67 and jnr/jnr-constants#68.

@sarveshtamba Please verify in your environment! Once you can confirm both jnr-ffi and jnr-posix pass tests for you I'll look at spinning some releases.

headius commented 4 years ago

Since I was able to get a green build myself on Power8 Linux, I'm going ahead with the release of 3.0.55.

sarveshtamba commented 4 years ago

@headius thanks for looking into this quickly. I tried building both v3.0.55 and the master branch, but I still see the errors below:

readlinkPointerTest(jnr.posix.FileTest)  Time elapsed: 0.013 sec  <<< FAILURE!
org.junit.ComparisonFailure: expected:</tmp/jnr-p[?six-r??dl?nk-t?]st470986432865872718...> but was:</tmp/jnr-p[?six-r??dl?nk-t?]st470986432865872718...>
        at org.junit.Assert.assertEquals(Assert.java:115)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at jnr.posix.FileTest.readlinkPointerTest(FileTest.java:580)

accessTest(jnr.posix.FileTest)  Time elapsed: 0.003 sec  <<< FAILURE!
java.lang.AssertionError: access:  expected:<-1> but was:<0>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at jnr.posix.FileTest.accessTest(FileTest.java:508)

Results :

Failed tests:
  FileTest.accessTest:508 access:  expected:<-1> but was:<0>
  FileTest.readlinkPointerTest:580 expected:</tmp/jnr-p[?six-r??dl?nk-t?]st470986432865872718...> but was:</tmp/jnr-p[?six-r??dl?nk-t?]st470986432865872718...>

Tests run: 93, Failures: 2, Errors: 0, Skipped: 1

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  26.117 s
[INFO] Finished at: 2020-04-22T12:23:46Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project jnr-posix: There are test failures.
[ERROR]
[ERROR] Please refer to /root/jnr-posix-master/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

headius commented 4 years ago

@sarveshtamba I have opened #146 for these additional failures. I did not see them on Fedora 29 on Power8 so I will need your help to investigate.