hoijui / JavaOSC

OSC content format/"protocol" library for JVM languages
http://www.illposed.com/software/javaosc.html
BSD 3-Clause "New" or "Revised" License

TCP support #52

Closed daveyarwood closed 3 years ago

daveyarwood commented 4 years ago

Background

The project I'm working on requires OSC messages to be sent and received over TCP, with a client written in Go and a server written in Java. Neither go-osc nor JavaOSC currently support TCP, so as a labor of love, I've implemented TCP support in both libraries.

This implements TCP support as discussed in #12. This branch builds on top of #47, which I'm about to close because this PR starts with the commits on that branch.

The public API is the same as what we have on master today, except that now, users can set the network protocol to TCP like this:

// receiver
new OSCPortInBuilder()
  .setLocalPort(12345)
  .setNetworkProtocol(NetworkProtocol.TCP)
  .build();

// sender
new OSCPortOutBuilder()
  .setRemotePort(12345)
  .setNetworkProtocol(NetworkProtocol.TCP)
  .build();

I did a bunch of work on the tests to enable thorough testing, reusing the existing UDP tests to test using TCP. I added some additional TCP-only tests that involve sending very large payloads, which is something you can't do over UDP because datagrams have a size limit (at most 65,507 bytes over IPv4, and often less in practice, depending on your OS).

I also wrote simple DebugServer and DebugClient classes to facilitate testing how this branch of JavaOSC interacts with my TCP branch of go-osc, as well as testing how this branch of JavaOSC works in my project.

I've tested all the permutations of TCP vs. UDP, Go client vs. Java client, and Go server vs. Java server.

After almost two months of working on TCP support in these two libraries, I'm happy to say that everything is looking great, as far as I can tell. I haven't done any rigorous benchmarks, but subjectively, performance seems to be about the same as it is on master, and I don't believe I've made any changes that would adversely affect performance. (But if anything looks suspect, please point it out!)

Summary of changes

There are a ton of changes here, so bear with me...

- Tests
- Logging
- Module organization, transport
- OSCPortInBuilder, OSCPortOutBuilder
- Serializer

I realize this PR is huge. Let me know if there's anything I can do to make reviewing it easier!

daveyarwood commented 4 years ago

This has been sitting open for a couple months now. Is there anything I can do to make review easier?

I could perhaps try to split off small parts of it into smaller PRs that are easier to review and merge, if that would be helpful.

daveyarwood commented 4 years ago

One thing that would make this PR smaller is if we re-opened #47 (consists of the earliest commits on this branch) and had it reviewed and merged.

I can think of a bunch of other things in this big PR that I could make smaller PRs out of, if that would help:

hoijui commented 4 years ago

ufff sorry dave! just saw this now.. will have a look at it these days (not today anymore though).

hoijui commented 4 years ago

WOW! :O (... couldn't help reading ;-) ) this sounds phenomenal! thank you dave!! :-) it all sounds super good, except the serializer part of moving to byte arrays, but ... I imagine this is only because I did the opposite move back then. ;-) I am sure your choice was wise, as by now you are surely more into the code-base than I am (not having been into it for quite some time). I'll have a look at the code these days!

daveyarwood commented 4 years ago

The problem is that TCP is stream-oriented, and there is no limit to the size of the OSC packet we can serialize when the protocol is TCP. When you're working with ByteBuffers, there has to be a limit.

In the serializer, we are basically concatenating a bunch of byte arrays together. In the current implementation, we have a single ByteBuffer of a fixed size (i.e. the maximum size of an OSC packet intended to be sent over UDP where there is a packet size limit), and we work our way through the objects we're serializing and add them into the ByteBuffer as we go.

That strategy will not work for TCP, because we have to pick an upper limit on the ByteBuffer when we allocate it, but we can't possibly know how large the packet is going to be; it's effectively unbounded, because TCP doesn't impose a limit on the number of bytes you can send.

My initial thought was that we could use a ByteArrayOutputStream (as some of the answers to this SO question mention) as the common denominator between stream-oriented TCP and ByteBuffer-oriented UDP. But then I ran into the problem where we couldn't provide context for the bad data event.
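To illustrate the contrast being discussed, here is a minimal sketch (illustrative only; the class and method names are not JavaOSC's actual serializer API): a fixed-size ByteBuffer must know its capacity up front and overflows if the payload exceeds it, while a ByteArrayOutputStream grows as needed and so has no inherent packet-size limit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class SerializeSketch {
    // UDP-style: fixed capacity, fails if the packet is too large.
    static byte[] serializeFixed(byte[][] parts, int capacity) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        for (byte[] part : parts) {
            buf.put(part); // throws BufferOverflowException past `capacity`
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    // TCP-style: the stream grows as needed, no up-front size limit.
    static byte[] serializeGrowable(byte[][] parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[][] parts = { {1, 2}, {3, 4, 5} };
        System.out.println(serializeGrowable(parts).length); // 5
        try {
            serializeFixed(parts, 3); // capacity too small for 5 bytes
        } catch (BufferOverflowException e) {
            System.out.println("overflow");
        }
    }
}
```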

daveyarwood commented 4 years ago

Re: the interface idea, I thought about that a little bit, but I wasn't able to figure out what kind of interface would work as a common denominator. It seemed to me like the method signatures would have to be too different, and it wasn't clear to me how to unify what we would want to do for TCP vs. UDP.

hoijui commented 4 years ago

...what kind of interface would work as a common denominator ...

What about taking the ByteBuffer interface, and removing everything that is not used in the serializer? ... I did that yesterday evening (had a flow moment). It is a very rough draft with some hacky edges, but already works, and I see nothing unsolvable. The basic idea is just as described above, and the main files are:

What do you think about that? Even if this is a bad solution for some reason, my main thinking is: I'd rather have some high-level abstraction that allows for performant, low-level implementations than use a simple interface that makes performance improvements impossible. But... as said, you are into this more than me right now, so I might not see some things.

hoijui commented 4 years ago

.. now thinking about it... I don't know whether my solution makes sense for the TCP scenario. ;-)

daveyarwood commented 4 years ago

I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.

I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that count and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so a set of pure functions that return byte arrays ended up being a lot easier to work with.
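A sketch of that pure-function approach (illustrative only; `serializeBlob` is not an actual JavaOSC method): serialize the blob's data first, count its bytes, then prepend the count as a big-endian int32, padding the data to a 4-byte boundary as the OSC 1.0 spec requires.

```java
import java.nio.ByteBuffer;

public class BlobSketch {
    // Serialize an OSC blob: an int32 byte count, then the data,
    // zero-padded to a 4-byte boundary.
    static byte[] serializeBlob(byte[] data) {
        int padded = (data.length + 3) / 4 * 4; // round up to a multiple of 4
        ByteBuffer out = ByteBuffer.allocate(4 + padded);
        out.putInt(data.length); // big-endian size prefix
        out.put(data);           // remaining bytes stay zero (the padding)
        return out.array();
    }

    public static void main(String[] args) {
        byte[] blob = serializeBlob(new byte[] {1, 2, 3, 4, 5});
        System.out.println(blob.length); // 4 (size) + 8 (5 bytes padded) = 12
    }
}
```

Because the function just returns a byte array, the caller can measure the result and compose it with other serialized parts freely, which is exactly the "count, then prepend" pattern described above.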

On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.

hoijui commented 4 years ago

I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.

As you can see here, the BytesReceiver implementation meant to be used by TCP is not backed by a fixed-size structure. It is also mainly byte[]-based; it just saves on the number of allocation/concatenation operations on byte[]s.

On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.

To measure the performance, the easiest thing would probably be to take the code of a re-parse unit-test method (or a few of them), run it many times in a performance-measurement tool, and concentrate mainly on the memory usage. Another way would be to abstract away all the low-level memory access (or at least the allocation and copying of byte[]s and ByteBuffers) into our own utility functions, and log how often they are called and with what argument sizes.
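The first approach could be sketched roughly like this (a crude harness under assumed workloads, not a substitute for a proper tool such as JMH; the names here are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Supplier;

public class BenchSketch {
    // Crude timing harness: run the task many times, report average nanos.
    // A real measurement would use JMH and also watch allocation rates.
    static long averageNanos(Supplier<byte[]> task, int iterations) {
        task.get(); // one warm-up run
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.get();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        byte[] part = new byte[64];
        // Stand-in workload: concatenate many small byte[]s, the kind of
        // operation the serializer discussion above is concerned with.
        Supplier<byte[]> concat = () -> {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < 100; i++) {
                out.write(part, 0, part.length);
            }
            return out.toByteArray();
        };
        System.out.println("avg ns: " + averageNanos(concat, 10_000));
    }
}
```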

As I see it, your code does more allocations and copying/concatenating of the bare memory, and if it is already hard for us to figure out how much exactly, users of the library would have no chance. In the old way, and I think in this solution, the bare memory usage of these low-level structures is pretty much the actual OSC packet's size as it will go over the network. As it is right now, it will be double that size when the BytesReceiver.toByteArray() function is called, but at least that function requires only a single (or no) allocation of a byte[], and a GC run afterwards can free the "other half" of the memory.

I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that number and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so working with a bunch of pure functions that return byte arrays ended up being a lot easier to work with.

Yes! I also saw this problem. I solved it through the PlaceHolder: the interface is in BytesReceiver (an interface and a function below it), and each implementation of BytesReceiver has its own implementation of PlaceHolder.
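The placeholder idea could look something like this (a guess at the concept only; the actual BytesReceiver/PlaceHolder interfaces live on the draft branch and may differ): reserve room in the output for a value that is only known later, keep writing, and fill the reserved bytes in once the size is known.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Illustrative-only sketch, not the draft branch's actual API.
public class PlaceholderSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Remembers the offset where an int32 will be written later. */
    public final class IntPlaceholder {
        private final int offset = out.size();
        private IntPlaceholder() {
            // Reserve four bytes; they get overwritten by fill().
            out.write(0); out.write(0); out.write(0); out.write(0);
        }
        /** Overwrite the reserved bytes with the now-known value. */
        void fill(byte[] buffer, int value) {
            ByteBuffer.wrap(buffer, offset, 4).putInt(value);
        }
    }

    public IntPlaceholder reserveInt() { return new IntPlaceholder(); }

    public void write(byte[] data) { out.write(data, 0, data.length); }

    /** Produce the final bytes, patching in the deferred size. */
    public byte[] finish(IntPlaceholder sizeSlot, int size) {
        byte[] result = out.toByteArray();
        sizeSlot.fill(result, size);
        return result;
    }

    public static void main(String[] args) {
        PlaceholderSketch sketch = new PlaceholderSketch();
        PlaceholderSketch.IntPlaceholder size = sketch.reserveInt();
        byte[] payload = {7, 8, 9};
        sketch.write(payload);
        byte[] packet = sketch.finish(size, payload.length);
        System.out.println(packet.length); // 4 (prefix) + 3 (payload) = 7
        System.out.println(packet[3]);     // low byte of the int32 prefix: 3
    }
}
```

This avoids the serialize-then-prepend dance entirely: the size prefix occupies its slot from the start, so the output can stay append-only.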

Now thinking of it ... In addition to the current implementations of BytesReceiver, we could also have:

... and if this proves to be an overall good idea, we might even introduce a BytesSupplier to be used in the parser part later on. ;-)

daveyarwood commented 4 years ago

Ah, I think I may have misunderstood the overall point that you're trying to make. Is the idea that (because we suspect changing it will have an unwelcome impact on performance) we want to keep the current implementation for UDP (using a ByteBuffer of fixed size) and create a new, potentially less performant one for TCP?

I like that idea because, since JavaOSC currently does not support TCP at all, introducing it with worse performance than UDP would not be a regression! :smile:

I can't tell yet if the interface you're proposing will work or not for TCP (with an implementation of the interface that uses a List<byte[]>), but I think it probably will. Unfortunately, I think it's going to be a while before I can try it because the COVID-19 situation is leaving me with not much free time. If you'd like to take a stab at it, be my guest! Otherwise, I'll get to it when I can (potentially months from now).

hoijui commented 4 years ago

mmmmm ok :/ same here.. much less time than before. but yeah... both solutions are fine with me! you or me doing it.

hoijui commented 4 years ago

I merged byte-buffer-abstraction into your tcp-support branch after merging the latest master, and the result is now in this repo's tcp-support branch.

The only part that is probably interesting for you, @daveyarwood, is the merge commit.

daveyarwood commented 4 years ago

@hoijui Thanks so much for digging into this. It looks great, as far as I can tell! I did a build of origin/tcp-support with mvn install and tried it out with my project (which uses JavaOSC over TCP), and it works like a charm!

I'm going ahead and merging origin/tcp-support into this branch, and I think it'll be ready to merge, unless there are any outstanding issues?

EDIT: Merged origin/tcp-support into this branch.

hoijui commented 4 years ago

thank you @daveyarwood for the testing and your great work! :-)

daveyarwood commented 3 years ago

@hoijui Are there any blockers to getting this PR merged at this point?

hoijui commented 3 years ago

Thanks @daveyarwood for all this work, and for reminding me again! ... and sorry for the delay.

daveyarwood commented 3 years ago

No worries! Thanks for your help!