hypfvieh / dbus-java

Improved version of java DBus library provided by freedesktop.org (https://dbus.freedesktop.org/doc/dbus-java/)
https://hypfvieh.github.io/dbus-java/
MIT License
187 stars 73 forks source link

FileDescriptors don't actually get sent #81

Closed rm5248 closed 4 years ago

rm5248 commented 5 years ago

This is marked as fixed in #42, however after some testing I have found that sending FileDescriptors does not actually work correctly.

I have some sample code here that demonstrates the issue. What I am trying to do is to create a pipe in a client application and send it to a different application in order to write to it. I create the pipe via JNA, and then attempt to convert it to a java.io.FileDescriptor in order to be sent. When I do that, dbus-monitor has the following output:

method call time=1573231344.276953 sender=:1.1897 -> destination=test.filedescriptor serial=3 path=/; interface=fd_rx.FDInterface; member=setFileDescriptor
   file descriptor
method return time=1573231344.298849 sender=:1.1895 -> destination=:1.1897 serial=5 reply_serial=3

On the receiving side, the file descriptor is parsed and is believed to be valid, however when we attempt to write to it we get an IOException:

[2019-11-08 11:42:24] DEBUG fd_rx.FDRX - Got the file descriptor.  Valid? true
[2019-11-08 11:42:24] ERROR fd_rx.FDRX - IOException: 
java.io.IOException: Bad file descriptor
    at java.io.FileOutputStream.write(Native Method) ~[?:1.8.0_171]
    at java.io.FileOutputStream.write(FileOutputStream.java:290) ~[?:1.8.0_171]
    at java.io.DataOutputStream.writeChars(DataOutputStream.java:297) ~[?:1.8.0_171]
    at fd_rx.FDRX.setFileDescriptor(FDRX.java:31) [classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
    at org.freedesktop.dbus.connections.AbstractConnection$2.run(AbstractConnection.java:766) [dbus-java-3.2.0.jar:3.2.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

I've also created a C++ application that uses dbus-cxx that acts the same as the Java application where it tries to send a FileDescriptor to the other process. This fails with an error, and the following shows up in dbus-monitor:

method call time=1573231622.504991 sender=:1.1900 -> destination=test.filedescriptor serial=2 path=/; interface=fd_rx.FDInterface; member=setFiledescriptor
   file descriptor
         inode: 42492471
         type: fifo
error time=1573231622.505026 sender=org.freedesktop.DBus -> destination=:1.1900 error_name=org.freedesktop.DBus.Error.NotSupported reply_serial=2
   string "Tried to send message with Unix file descriptorsto a client that doesn't support that."

It seems that the root of this problem is when a java.io.FileDescriptor is sent, an integer is simply marshaled onto the message, which is not the correct way to send a filedescriptor over a socket.

According to the DBus spec, the client must send a "NEGOTIATE_UNIX_FD Command" to allow FD passing. From my reading of the specification as well, it seems that after the NEGOTIATE_UNIX_FD has been sent, when you are actually sending the FD this needs to be set in the message header, and the FDs are sent as some kind of array(the spec indicates that they are sent out-of-band). The value of the FD set in the message is the index into this array.

hypfvieh commented 5 years ago

file descriptors are really hard ...

The current implementation just sends the number of the file descriptor so the calling client can create a new FileDescriptor object using the given number to access the same file.

Adding 'NEGOTIATE_UNIX_FD' and 'AGREE_UNIX_FD' is not a problem. My problem is to understand what "index into an out-of-band array of file descriptors" is supposed to mean (why an array of file descriptors?). The documentation is not very specific on how to deal with file descriptors.

Maybe my knowledge of the english language is not good enough. Maybe the documentation is just crap at that point.

If you can help provide patches, hints or anything useful, please let me know

rm5248 commented 5 years ago

The problem with sending just the number is that there's no way to take that number and turn it back into a file, since the filedescriptors are unique per-process.

I can take a look at this later this week, but some things that I have found: C#/Mono code that adds file descriptors: https://github.com/mono/dbus-sharp/pull/62 It looks like jnr-unixsocket doesn't support this at the moment, as what needs to happen is that sendmsg needs to be used instead of sendto(which is what is being used at the moment).

See also this StackOverflow post on doing this in C: https://stackoverflow.com/questions/28003921/sending-file-descriptor-by-linux-socket

rm5248 commented 5 years ago

After some speluking through the libdbus code, I think that I have found the method which actually does the writing out. So it seems that when the specification says

Unsigned 32-bit integer representing an index into an out-of-band array of file descriptors, transferred via some platform-specific mechanism

It is talking about setting certain fields in a struct cmsghdr, which is part of struct msghdr. This explains why the actual number of the filedescriptor needs to be an index into this array, and looking at dbus_message_iter_append_basic we see that the filedescriptors are basically copied over to their own separate array which is then sent out as part of the sendmsg call previously seen.

rm5248 commented 4 years ago

Question: is it required to continue to use jnr-unixsocket? I came across this other project just now that already has support for FileDescriptors over a Unix socket. This would swap out some dependencies, but I'm not sure if that would be acceptable or not.

hypfvieh commented 4 years ago

The reason for using jnr-unixsocket is that they do not require any native-compiled wrapper library.

As far as I can see junixsocket uses a wrapper as well. This is like the old libmatthew which also requird native library to be available.

Because linux (and therefore DBus) is running on many platforms with many different architectures it was always a pain providing native libraries for these platforms. Different platform versions (e.g. ARMv6/v7) also were annoying. In there cases the developer of the application has to figure out how to compile the native library and how to use it in combination with dbus-java.

I think this is inconvienent, so that's the main reason why using jnr-unixsockets and that's why I don't want to switch back to another library doing the same stuff I want to avoid.

But maybe we can provide a patch to the jnr-unixsocket maintainer to support the changes we need. I did this a while ago (litte patch to support the PASS_CRED feature) when implementing jnr-unixsocket in dbus-java.

I'll try to take a look into the implementation of this stuff in junixsocket next week, maybe I can write that patch.

Alternativly we could change dbus-java to support different 'transport' libraries, so if you need Filedescriptors (rare case in my opinion), you have to add the dbus-java-junixsocket transport, otherwise you can use the dbus-java-jnr-unixsocket transport ... But lets wait until we are really sure that this kind of feature is really required.

rm5248 commented 4 years ago

For using JNR, I've started on some work to see if it is possible. This is difficult because there's not much documentation on how to use JNR, JNA is much more prevalent. Branch is here: https://github.com/rm5248/jnr-unixsocket/tree/rm-testing

The problem that I have just run into when attempting to write a test for this is that there are certain macros that must be used in order for the cmsghdr to be set properly. According to the man page though:

Ancillary data is a sequence of struct cmsghdr structures with appended data. This sequence should only be accessed using the macros described in this manual page and never directly.

If that is true, then it seems likely that there's no way to do this from pure Java and we may need to make some JNI shim library to handle it.

hypfvieh commented 4 years ago

Maybe we get the job done, if we include/use jnr-posix. They already added the missing sendmsg/recvmsg and also some of the macros you mentioned.

hypfvieh commented 4 years ago

I pushed my changes to filedescriptors branch. I did not implement the stuff required for sendmsg/recvmsg, but already included jnr-posix as dependency.

I tried to add the required handshake-stuff so DBus knows that file descriptor passing is possible (NEGOTIATE_UNIX_FD).

rm5248 commented 4 years ago

I have at least a partial implementation at the moment. I think the basic code is correct, but it's not talking with the bus at the moment. My branch is here: https://github.com/rm5248/dbus-java/tree/filedescriptor-passing

The problem that I have at the moment is that the iovec is null(check FDMessageWriter):

19:22:32.236 [DBus Sender Thread-1] TRACE org.freedesktop.dbus.FDMessageWriter - msghdr to send: msghdr {
  msg_name=null,
  msg_namelen=0,
  msg_iov=[
    iovec {
      iov_base=null,
      iov_len=128,
    }
  ],
  msg_control=[
  ],
  msg_controllen=0
  msg_iovlen=1,
  msg_flags=0,
}

Since no data is ever written, we can never talk with the bus(the Hello message doesn't go out). I've tracked it down to what JNR-posix is doing with Pointer.wrap with a ByteBuffer. Just before that there's this line:

logger.debug( "pointer is {}", 
                Pointer.wrap( jnr.ffi.Runtime.getSystemRuntime(), dataSend ) );

Which is giving me a null pointer:

19:22:32.234 [DBus Sender Thread-1] DEBUG org.freedesktop.dbus.FDMessageWriter - pointer is jnr.ffi.provider.jffi.ByteBufferMemoryIO[address=0x0 size=128]

So this looks like a bug in JNR, but it seems weird that it wouldn't work correctly. Am I missing something?

hypfvieh commented 4 years ago

I'm still struggling with a proper test setup. How did you do the testing?

I noticed something in FDMessageWriter class.

On line 63 you use msghdr.allocateControl with the size of the list (e.g. 1 if only 1 FD is in the list). As far as I understand the datalength is needed here, and I don't think that one FD is only one byte in length. When comparing it to a test case of jnr-posix, I see that they are using a length 4 (4 bytes) which seems more appropriate to me (see here: jnr-posix IOTest).

Also there is already a constant for SOL_SOCKET in jnr-posix, so you may switch to that one instead of using your own (jnr.constants.platform.SocketLevel.SOL_SOCKET.intValue())

rm5248 commented 4 years ago

At the moment, I'm just trying to run the currently-existing tests, and those are failing because it can't connect to the bus correctly. The repo that I linked before also has an FDTX and FDRX classes; currently I'm focused only on sending, receiving requires more work still.

That IOTest from JNR looks useful; there's not much information on how to use JNR so I've been going around somewhat blindly.

rm5248 commented 4 years ago

I updated my branch with the fixes required to send messages at least, and all of the tests except for the file descriptor are working at the moment. I started work on the receiving end, but that requires more work still on me understanding what the spec says and what the code is doing with the InputStream in order to read the data.

hypfvieh commented 4 years ago

Nice! Keep up the great work! I'll try to help if I find some spare time to do so (and if I can finally understand the specs ...)

rm5248 commented 4 years ago

Okay, so the good news at this point: sendmsg and recvmsg are both implemented for all the current tests. I've run into a few issues at this point though.

Some of these seem like race conditions, but I'm not sure why they wouldn't have come up before.

At this point, it seems like I need to update Message.java, as simply adding a new header(UNIX_FDS) doesn't automatically add this to the marshalled message. This may be the cause of the toString failure on the Message object.

If you are able to take a look, that would be helpful.

hypfvieh commented 4 years ago

I'll take a look at that concurrent modification exception. There are some test cases which were not stable, it's about time to get this fixed.

About your toString issue: I'm not sure if the code is correct yet.

The exception is caused by trying to read an arraylist without content (e.g. filedescriptors.get(0) when list is empty). When 'fixing' this issue by checking the list of being empty first, the unit test will fail because no file descriptor is returned but null.

So my guess is, that whatever should be in the filedescriptor list is never written to it. Did not have the time to debug this, yet.

rm5248 commented 4 years ago

I messed around with this a bit more, and I just got the file descriptors to at least be returned correctly(I think). This shows up in dbus-monitor:

method call time=1575223072.038383 sender=:1.3140 -> destination=foo.bar.TestFileDescriptor serial=5 path=/TestFileDescriptor; interface=foo.bar.TestFileDescriptor; member=getFileDescriptor
method return time=1575223072.044919 sender=:1.3140 -> destination=:1.3140 serial=6 reply_serial=5
   file descriptor
         inode: 5739061
         type: fifo

The fix for this is pretty ugly though, since the header needs to be written before any of the data. To fix it properly however requires a likely significant rework of the Message class.

However, reading them doesn't work correctly at the moment; I'm unsure as to what exactly I need to do in order to read it correctly. Possibly because the reading is doing a MSG_PEEK before reading the data the control data is discarded?

hypfvieh commented 4 years ago

The fix for this is pretty ugly though, since the header needs to be written before any of the data.

Most of the code of this project is ugly anyway, so never mind.

My understanding of MSG_PEEK is, that it will look ahead in the stream for data and do not consume it, so read calls can pick it up later. That is also what the man page says: MSG_PEEK This flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue. Thus, a subsequent receive call will return the same data.

see here

Maybe you have to read only the 12 bytes from the header and not the additional 4 bytes for the array index as this is already 'payload' and not header. So maybe those 4 bytes are missing later on (just guessing ...)

hypfvieh commented 4 years ago

I also did some testing without any usable results. For me it looks like if the control data put on the MsgHdr is getting lost or not read correctly.

If I change the size in allocateControl method to the size I would expect, I get the 'control data has been truncated' flag on the message. If it is larger sometimes the test gets stucks, sometimes I get control data , but not the data I placed into it (e.g. level == 2 instead of 1)...

rm5248 commented 4 years ago

So I think that the sending of the file descriptors is still not entirely correct. I do get an apparently valid file descriptor on my C++ side, however I cam unable to write to the file descriptor. I need to update my java/c++ examples to make this much easier to test; there's a lot of hand-editing of several files that I'm doing for testing and it's hard to keep it all straight in my mind.

On a somewhat related note, I'm doing this with Java 11, and I'm getting a warning about disallowed access for java.io.FileDescriptor(since we are constructing it via reflection). This should probably be removed once we have this finalized, since it will fail in a future version.

hypfvieh commented 4 years ago

So I think that the sending of the file descriptors is still not entirely correct. That was what I'm also thinking about.

Which size should sendMsg return when writing a message? Just the "payload" or should it also include the size of the appended control data? I always see that 52 are written, but these 52 bytes are only the payload, the control data would require additional 24 byte. Also the reading part always receives 68 bytes while only 52 were send..?

On a somewhat related note, I'm doing this with Java 11, and I'm getting a warning about disallowed access for java.io.FileDescriptor(since we are constructing it via reflection). This should probably be removed once we have this finalized, since it will fail in a future version

I'm also using Java 11 for development and saw this issue already. I don't like the reflection stuff anyway, but are there any suitable alternatives?

If you have any usable test setup, please share it so I may help as well :)

rm5248 commented 4 years ago

Okay, I've updated my test setup here: https://github.com/rm5248/dbus-java-filedescriptor

To use:

  1. Run the 'ReturnFD' java program in the 'test' package.
  2. Run the 'FDGetter' C++ application in the src/main/cpp folder. You'll need the dbus-cxx library: https://github.com/dbus-cxx/dbus-cxx

The output of the C++ application will show the following:

~/dbus-java-filedescriptor/src/main/cpp/build$ ./FDGetter 
GOT FD 7
error: Bad file descriptor

dbus-monitor shows the following:

method call time=1576426826.425504 sender=:1.3215 -> destination=test.filedescriptor serial=2 path=/; interface=test.ReturnFDInterface; member=getFileDescriptor
method return time=1576426826.474387 sender=:1.3213 -> destination=:1.3215 serial=5 reply_serial=2
   file descriptor
         inode: 5941841
         type: fifo

Here's an example of sending filedescriptors using DBus-cxx: https://github.com/dbus-cxx/dbus-cxx/tree/321f4d4a76c6c47eb0e8bf18e5018d14fa88e30f/examples/basics/filedescriptors

Looking at the data that is coming back and forth, nothing seems obviously wrong, but clearly the FD is not getting sent properly.

rm5248 commented 4 years ago

After some more investigation, I think that the sizes that are set for the ancillary data are incorrect. Output from the Java side:

msg_control=[
    cmsg {
      cmsg_len=24
      cmsg_level=1
      cmsg_type=1
      cmsg_data=java.nio.HeapByteBuffer[pos=0 lim=8 cap=8]
    }
  ],

I did a quick test on the C++ side to figure out the correct sizes, and came up with the following:

sizeof(cmsghdr->cmsg_len) is 8
sizeof(cmsghdr->cmsg_level) is 4
sizeof(cmsghdr->cmsg_type) is 4
cmsg len is 20
controllen is 24

So the cmsg_len and the data length are off by four bytes; the controllen does need to be 24 bytes in length total, due to alignment. Probably this just needs to be fixed in BaseCmsgHdr, I will check tomorrow.

rm5248 commented 4 years ago

So I did check the other day if it was the number of bytes that were incorrect, but fixing that did not seem to change anything(e.g. making sure that the cmsg length and the controllen were 20 bytes, not 24). I spent some time messing around with it, and I haven't been able to figure out exactly what is broken. Whenever I try to send the FD, I get an EBADF back, which doesn't make sense since the FD that we are using to send data on is the same one that we've been using earlier in the program.

At this point, I feel that we need an even more basic example of a unix domain socket(without dbus) just to validate that we can send FDs in the first place.

hypfvieh commented 4 years ago

I agree, first we should ensure that jnr-posix and friends is working like they should.

rm5248 commented 4 years ago

It seems that jnr-posix is not working correctly, but I have no idea why. Here's my latest; the Java and C programs should be equivalent.

The C program is from this StackOverflow response, and seems to work fine(or at least doesn't have an error) when I run. I was having issues with the fork() call in Java, so at the moment the Java and C programs simply try to send a filedescriptor to themselves. The C program doesn't get an error; the Java program always gets EBADF as the return from sendmsg when there is control data. Any thoughts?

hypfvieh commented 4 years ago

I also did some testing and created a new repo here: https://github.com/hypfvieh/test-jnr-fd-passing The C code is very ugly as I've copied it from various sources found in the internet (sorry I'm not a C programmer).

In my setup, I use some C code e.g. write to a unix socket which is read by Java code.

When java is reading and C is writing, I can see that 1 byte is written and 1 byte is read, but the java code gets stuck when trying to read the control data on the message.

When doing it the other way around, java is writing/C is reading, I don't get any control data as well. The C code detects that there is a new "client connection", and starts recvmsg call, but this never returns. This really looks like jnr-posix is doing something wrong ...

hypfvieh commented 4 years ago

Update on my tests: Reading with C and sending with java is now working like expected.

Reading with java and writing with C still gets stuck when trying to access the control data of the message...

hypfvieh commented 4 years ago

I got a working sample setup!

Sending from C to Java and vice versa works. Sending from Java to Java is also working!

Now we have to investigate what's wrong with the implementation in FDMessageReader/Writer

rm5248 commented 4 years ago

Okay so it looks like there were two problems: not calling flip on the byte buffer, and for some reason I was subtracting 1 from the FD on the CPP side. Sending file descriptors now works! I also made sure the endianess was correct, but that didn't seem to affect anything.

See: https://github.com/rm5248/dbus-java-filedescriptor/commit/9794def1ab6d437be23d93beb8755952316e27a3 https://github.com/rm5248/dbus-java/commit/0c92fc9ec52f5545bc5442eea7e949abf090a300

rm5248 commented 4 years ago

Update on the FDs: sending via the MethodReturn works, MethodCall is broken, but I think that I know what is causing it, if not how to fix it(yet).

What I'm getting on the Java side when I try to get a file descriptor as a method parameter:

[2019-12-29 14:49:55] DEBUG org.freedesktop.dbus.FDMessageReader - have 0 filedescriptors with incoming message msghdr {
  msg_name=null,
  msg_namelen=0,
  msg_iov=[
    iovec {
      iov_base=jnr.ffi.provider.jffi.DirectMemoryIO[address=0x7fb81c10bf40],
      iov_len=164,
    }
  ],
  msg_control=[
    cmsg {
      cmsg_len=28
      cmsg_level=1
      cmsg_type=2
      cmsg_data=java.nio.HeapByteBuffer[pos=0 lim=12 cap=12]
    },
    cmsg {
      cmsg_len=32
      cmsg_level=0
      cmsg_type=0
      cmsg_data=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
    }
  ],
  msg_controllen=56
  msg_iovlen=1,
  msg_flags=0,
}

It seems that some part of jnr-posix is not doing some calculation correclty, since the two auxillary sections have lengths of 28 and 32 respectively, which does not add up to the total control length of 56. So I think that we're simply off by 4 bytes someplace, which would also conveniently explain the cmsg_len=32, as the file descriptor is number 32 after it gets passed to the second process.

rm5248 commented 4 years ago

It's working! You'll need this commit for jnr-posix, and the latest from my filedescriptor-passing branch.

Now that this works appropriately, we should be able to actually make some tests that work properly, although we will have to make sure that these updates get into jnr-posix, since that is critical for this to work properly.

hypfvieh commented 4 years ago

Congrats! Nice work!

I guess it would be best if you could create a pull request for this project and also send a pull request to the maintainer of jnr-posix. Normally he is pretty quick with merging. I have had issued a pull request because of missing SO_PASSCRED in jnr-unixsocket, this one was merged the very same day I issued the request.

rm5248 commented 4 years ago

The jnr-posix part has been merged; I'll make a clean PR in a few days to finally fix this.

rm5248 commented 4 years ago

I just went and was starting a cleanup of the code, however some of the tests are failing due to a JVM crash. Could you take a look and see if you also get a crash on your end? This looks suspiciously like an issue that I've seen before where too fast of memory allocation causes libc to crash the process.

hypfvieh commented 4 years ago

The 'StressTest' is always failing on my machine. The crash dump of the JVM points to libc.

I agree that this issue is related to memory allocation. As far as I understand ByteBuffer.allocateDirect will allocate memory outside of the JVM heap. Freeing this memory will be done by the Cleaner thread of DirectByteBuffer when the garbage collection is triggered. Garbage collection might not be triggered (or triggered too late) when running the unit tests.

Most articles I've found for DirectByteBuffer suggests that the buffer should be allocated once and be re-used (for performance and memory reason). The problem is that, except for the header read in the beginning, the size for the buffer is unknown until we have read the header. So early and one-time allocation will not work.

Manually freeing the memory is also not really supported. There are some sample codes for that, but they will not work with Java >= 9 (access to sun package is forbidden).

rm5248 commented 4 years ago

Good to know that you see that issue as well; I also see it in the stress test. I think that what has to happen at the moment is that we'll simply have to allocate enough space and then expand as needed, or else modify the API to allow for a ByteBuffer. I'll see about coding this up and see if that fixes the problem; otherwise there's something funny in JNR.

rm5248 commented 4 years ago

The good news at this point is that I can get very consistent crashes in the StressTest. The bad news is that due to the multithreaded nature of the test and the fact that it's happening with some JNR-managed code, I can't figure out where it's happening nor how to fix it at the moment.

The following error keeps being output to the surefire reports:

# Created at 2020-02-09T13:26:40.754
malloc(): unsorted double linked list corrupted

# Created at 2020-02-09T13:26:40.759
Aborted

When I've seen this before, sometimes it's been because of allocating memory too fast or calling a big function too fast(I've had it happen before where I had to declare variables that would normally be in a C function to be declared outside of the C function in order to not make glibc freak out and give this error or a similar one.)

The secondary problem as well is that this could be happening with libffi, which is part of what JNR uses to call the C functions.

Any thoughts on how we can debug this?

hypfvieh commented 4 years ago

That is really difficult. I would try to reproduce the issue without all that dbus-stuff. Starting with the unit test IOTest of jnr-posix.

I would then try to put this code in a large loop which will re-alloc the memory and stuff like the dbus-implementation does. If allocation is the issue, I would expect the test to crash like the StressTest does.

rm5248 commented 4 years ago

So the good news is that I can reproduce this and I've made a test for it: https://github.com/rm5248/jnr-posix/tree/stress-test

The bad news is that it is incredibly picky about crashing; sometimes it crashes immediately with the double-free error, sometimes the JVM will actually pick up the crash, sometimes it runs fine. I originally thought it had something to do with the toString method, but that doesn't seem to actually be the case, although that may exacerbate the issue.

hypfvieh commented 4 years ago

Uff... For me it sounds like it's time to open an issue on jnr-posix referencing your unit test and describing what you've seen. Maybe the jnr guys can help to find the issue in their code.

All of theses jnr/ffi stuff is like magic to me, I'm sorry that I can't help here...

rm5248 commented 4 years ago

Given my experience with errors with malloc/glibc before, I haven't been able to come up with a plausible way of fixing the problem that doesn't involve allocating less memory. So I've started on a new branch that just reverts back to using JNI to handle the send/recv of the data: https://github.com/rm5248/dbus-java/commit/0d8321a1afb8eea6b94b9f24b92b4f012b70ad68

My idea is that there will be multiple providers of MessageReader and MessageWriter that you can switch between, controlled by a Java environment variable, but using a sane default. Example:

java -cp *.jar -Dorg.freedesktop.dbus.MessageHandler=org.freedesktop.dbus.jniconnector....

By putting this into a separate module as well, we can keep the JNR code as-is, and you only need to bring in the JNI connector if you need the extra capability of passing the FileDescriptors. The default behavior would be to look on the classpath, and initialize the message handler with the JNI version if it exists and can load the library, else it will fallback to the current JNR version.

hypfvieh commented 4 years ago

I'm not a friend of JNI especially for libraries like dbus-java.

I've had the libmatthew JNI wrapper library before and it was really a pain to provide native compiled libraries for all those different platforms (PC/Linux/MacOS/Raspberry in different versions flavors and other IOT devices). That's why I wanted to get rid of this and why I'm not interested in adding another native library (even in a sub module or something).

I would agress to provide different implementations for MessageReader/MessageWriter, so one can choose one or write some if required. But I don't like the system property stuff. I would prefer to use the Java Service Provider interface and if there is a library providing alternative implementation of MessageReader/Writer dbus-java should use that one instead of the internal default.

That would allow me to continue the development of the current implementation and you can provide an extension which provides file descriptor stuff using JNI.

What do you think of that?

rm5248 commented 4 years ago

That sounds good to me and is generally what I was thinking. I was thinking of using the ServiceProvider as well, and that was the reasoning behind the system property: if you have multiple providers of MessageReader/MessageWriter, how do you select between them? To your point, we could simply ignore the internal version if we find at least one non-default implementation, so maybe that's not a big deal.

One potential problem that I see though is that there's two ways to implement the service: either with an InputStream/OutputStream, or by passing a file descriptor. Each implementation would need to have one or the other set, so there may be some useless methods that we need to define but that shouldn't cause a big issue.

hypfvieh commented 4 years ago

I've added a new branch which uses Java SPI (https://github.com/hypfvieh/dbus-java/tree/dbus-spi).

The service provider must provide a implementation of org.freedesktop.dbus.spi.ISocketProvider.

I thought about the methods we may need to have. I think it should be suitable to have methods to provide a reader and a writer and also to have methods which indicates if the implementation is capable of passing FDs.

For me it looks like it should be working by providing the socket object to the createReader/Writer methods. Both, the input/output streams as well as the channel/file descriptor is accessable by using the socket object.

Maybe you can have a look.

rm5248 commented 4 years ago

That looks pretty good to me - I'll work on my implementation here in the next day or two so we can test an actual implementation out.

rm5248 commented 4 years ago

Good news! I have what I believe to be a working implementation. Everything appears to be working at the moment, and I'm not getting any segfaults. There were a few minor fixes that I made to ensure that it loaded properly; see #100 for the fixes to dbus-java.

I haven't validated the sending and receiving of the file descriptors yet, but I don't expect that to be a big issue at the moment. I will need to make a test for that.

hypfvieh commented 4 years ago

Nice. I merged your changes to the spi branch. If you have finished testing, I'll merge it to master.

rm5248 commented 4 years ago

This all looks good to me at this point - I'll try to do some testing over the coming weeks, although the project that I actually needed this for may be cancelled at this point(hence why it has taken a long time to do this).

rm5248 commented 4 years ago

I finally managed to get around to finishing up the implementation on my end and upload to Maven Central: https://github.com/rm5248/dbus-java-nativefd

To use, simply add the following dependency to your pom.xml. The ServiceLoader will pick up the native implementation and use it instead of the default implementation for sending and receiving.

Note that this only works on Linux; if you try to use this on Windows, BSD, or OSX it will probably not work correctly.

<dependency>
    <groupId>com.rm5248</groupId>
    <artifactId>dbus-java-nativefd</artifactId>
    <version>1.0</version>
</dependency>

If you have issues using this dependency, open an issue on my repo and I can take a look at it.