Closed: daveyarwood closed this PR 3 years ago.
This has been sitting open for a couple months now. Is there anything I can do to make review easier?
I could perhaps try to split off small parts of it into smaller PRs that are easier to review and merge, if that would be helpful.
One thing that would make this PR smaller is if we re-opened #47 (consists of the earliest commits on this branch) and had it reviewed and merged.
I can think of a bunch of other things in this big PR that I could make smaller PRs out of, if that would help:
ufff sorry dave! just saw this now.. will have a look at it these days (not today anymore though).
WOW! :O (... couldn't help reading ;-) ) this sounds phenomenal! thank you dave!! :-) It all sounds super good, except the serializer part of moving to byte arrays, but... I imagine this is only because I did the opposite move back then. ;-) I am sure your choice was wise, as by now you are surely more into the code-base than I am (not having been into it for quite some time). I'll have a look at the code these days!
The problem is that TCP is stream-oriented, and there is no limit to the size of the OSC packet we can serialize when the protocol is TCP. When you're working with ByteBuffers, there has to be a limit.
In the serializer, we are basically concatenating a bunch of byte arrays together. In the current implementation, we have a single ByteBuffer of a fixed size (i.e. the maximum size of an OSC packet intended to be sent over UDP where there is a packet size limit), and we work our way through the objects we're serializing and add them into the ByteBuffer as we go.
That strategy will not work for TCP, because we have to pick an upper limit on the ByteBuffer when we allocate it, but we can't possibly know how large the packet is going to be; it's essentially infinite, because TCP doesn't impose a limit to the number of bytes you can send.
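To illustrate the constraint: writing into a fixed-size ByteBuffer throws a BufferOverflowException as soon as the data exceeds the preallocated capacity, which is exactly what would happen with an arbitrarily large TCP packet. A tiny standalone demo (the 8-byte limit is artificial, just to trigger the overflow):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class FixedBufferDemo {
    // Artificial limit, standing in for a UDP-style maximum packet size.
    static final int MAX_PACKET_SIZE = 8;

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(MAX_PACKET_SIZE);
        buffer.put(new byte[]{1, 2, 3, 4}); // fits within the limit
        try {
            // 4 + 5 > 8 bytes: a TCP packet can always outgrow the buffer.
            buffer.put(new byte[]{5, 6, 7, 8, 9});
            System.out.println("no overflow");
        } catch (BufferOverflowException e) {
            System.out.println("overflow");
        }
    }
}
```

With UDP, `MAX_PACKET_SIZE` can simply be set to the protocol's packet size limit; with TCP there is no such number to pick.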
My initial thought was that we could use a ByteArrayOutputStream (as some of the answers to this SO question mention) as the common denominator between stream-oriented TCP and ByteBuffer-oriented UDP. But then I ran into the problem where we couldn't provide context for the bad data event.
Re: the interface idea, I thought about that a little bit, but I wasn't able to figure out what kind of interface would work as a common denominator. It seemed to me like the method signatures would have to be too different, and it wasn't clear to me how to unify what we would want to do for TCP vs. UDP.
> ...what kind of interface would work as a common denominator ...
What about taking the ByteBuffer interface, and removing everything that is not used in the serializer? ... I did that yesterday evening (had a flow moment). It is a very rough draft with some hacky edges, but already works, and I see nothing unsolvable. The basic idea is just as described above, and the main files are:
What do you think about that? Even if this is a bad solution for some reason, my main thinking is: I'd rather have some high-level abstraction that allows for performant, low-level implementations, than use a simple interface that makes performance improvements impossible. But... as said, you are into this more than me right now, so I might not see some things.
.. now thinking about it... I don't know whether my solution makes sense for the TCP scenario. ;-)
I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.
I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that number and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so working with a bunch of pure functions that return byte arrays ended up being a lot easier to work with.
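To illustrate why pure functions that return byte arrays make this easy: once a part (like a blob argument) is serialized to a byte[], its size is known before the length prefix is written, so the prefix can simply be emitted first. A minimal sketch, not JavaOSC's actual code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlobSketch {
    // Serialize a blob argument as: 4-byte big-endian length, then the data.
    // Because the data is already a byte[], counting its bytes is trivial.
    static byte[] serializeBlob(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(data.length).array());
        out.write(data);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = serializeBlob(new byte[]{10, 20, 30});
        System.out.println(Arrays.toString(result)); // [0, 0, 0, 3, 10, 20, 30]
    }
}
```

With a forward-only stream, by contrast, the length position has already been passed by the time the size is known.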
On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.
> I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.
As you can see here, the BytesReceiver implementation meant to be used by TCP is not backed by a fixed-size structure. It is also mainly byte[]-based; it just saves on the number of allocation/concatenation operations on byte[]s.
> On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.
To measure the performance, the easiest thing would probably be to take the code of a re-parser unit test method (or a few of them), run it many times in a performance measurement tool, and concentrate mainly on the memory usage.
Another way could be to abstract away all the low-level memory access (or at least the allocation and copying of byte[]s and ByteBuffers) into our own utility functions, and log how often they are called and with what sizes of arguments.
As I see it, your code does more allocations and copying/concatenating of the bare memory, and if it is already hard for us to figure out exactly how much, users of the library would have no chance.
In the old way, and I think in this solution, the bare memory usage of these low-level structures is pretty much the actual OSC packet's size, as it will be going over the network.
As it is right now, it will be double that size when the BytesReceiver.toByteArray() function is called, but at least that function requires only a single (or no) allocation of a byte[], and a GC run afterwards can free the "other half" of the memory.
> I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that number and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so working with a bunch of pure functions that return byte arrays ended up being a lot easier to work with.
Yes! I also saw this problem.
I solved it through the PlaceHolder, the interface being in BytesReceiver (an interface and a function below it), with each implementation of BytesReceiver having its own implementation of PlaceHolder.
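For illustration, here is a hypothetical sketch of how such a PlaceHolder could work for a List<byte[]>-backed receiver. Names and signatures are guesses based on the description above, not the actual BytesReceiver API:

```java
import java.util.ArrayList;
import java.util.List;

public class PlaceholderSketch {
    interface PlaceHolder {
        void replace(byte[] bytes); // fill in the reserved slot later
    }

    // A growable receiver backed by a list of byte[] chunks (TCP-friendly:
    // no fixed upper bound on the total size).
    static class ListBytesReceiver {
        private final List<byte[]> chunks = new ArrayList<>();

        void put(byte[] bytes) { chunks.add(bytes); }

        // Reserve a slot now; its contents are supplied later,
        // e.g. once the size of a blob argument is known.
        PlaceHolder placeHolder(final int size) {
            final byte[] slot = new byte[size];
            chunks.add(slot);
            return bytes -> System.arraycopy(bytes, 0, slot, 0, size);
        }

        // Single final concatenation of all chunks into one byte[].
        byte[] toByteArray() {
            int total = 0;
            for (byte[] c : chunks) total += c.length;
            byte[] out = new byte[total];
            int pos = 0;
            for (byte[] c : chunks) {
                System.arraycopy(c, 0, out, pos, c.length);
                pos += c.length;
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ListBytesReceiver recv = new ListBytesReceiver();
        PlaceHolder sizeSlot = recv.placeHolder(1); // reserve 1 byte for the size
        byte[] payload = {42, 43};
        recv.put(payload);
        sizeSlot.replace(new byte[]{(byte) payload.length}); // fill in afterwards
        byte[] out = recv.toByteArray();
        System.out.println(out.length + " " + out[0]); // prints "3 2"
    }
}
```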
Now thinking of it ...
In addition to the current implementations of BytesReceiver, we could also have:
- one backed by a single byte[], which would be automatically replaced by a bigger one, if needed
- one backed by a single byte[], which would throw an exception if it turns out to be too small
- one using a ByteBuffer as the back-storage instead of byte[]

... and if this proves to be an overall good idea, we might even introduce a BytesSupplier to be used in the parser part later on. ;-)
Ah, I think I may have misunderstood the overall point that you're trying to make. Is the idea that (because we suspect changing it will have an unwelcome impact on performance) we want to keep the current implementation for UDP (using a ByteBuffer of fixed size) and create a new, potentially less performant one for TCP?
I like that idea because, since JavaOSC currently does not support TCP at all, introducing it with worse performance than UDP would not be a regression! :smile:
I can't tell yet if the interface you're proposing will work or not for TCP (with an implementation of the interface that uses a List<byte[]>), but I think it probably will. Unfortunately, I think it's going to be a while before I can try it because the COVID-19 situation is leaving me with not much free time. If you'd like to take a stab at it, be my guest! Otherwise, I'll get to it when I can (potentially months from now).
mmmmm ok :/ same here.. much less time than before. but yeah... both solutions are fine with me! you or me doing it.
I merged byte-buffer-abstraction into your tcp-support branch after merging latest master, and the result is now in this repo's tcp-support branch.
Interesting for you, @daveyarwood, is probably only the merge commit.
@hoijui Thanks so much for digging into this. It looks great, as far as I can tell! I did a build of origin/tcp-support with mvn install and tried it out with my project (which uses JavaOSC over TCP), and it works like a charm!
I'm going ahead and merging origin/tcp-support into this branch, and I think it'll be ready to merge, unless there are any outstanding issues?
EDIT: Merged origin/tcp-support into this branch.
thank you @daveyarwood for the testing and your great work! :-)
@hoijui Are there any blockers to getting this PR merged at this point?
Thanks @daveyarwood for all this work, and for reminding me again! ... and sorry for the delay.
No worries! Thanks for your help!
Background
The project I'm working on requires OSC messages to be sent and received over TCP, with a client written in Go and a server written in Java. Neither go-osc nor JavaOSC currently support TCP, so as a labor of love, I've implemented TCP support in both libraries.
This implements TCP support as discussed in #12. This branch builds on top of #47, which I'm about to close because this PR starts with the commits on that branch.
The public API is the same as what we have on master today, except that now, users can set the network protocol to TCP like this:
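The snippet that followed here is not shown; based on the builder API described later in this PR (OSCPortInBuilder and its setNetworkProtocol method), the usage presumably looks roughly like the following pseudocode-style sketch. The NetworkProtocol enum name and the other builder method names are assumptions:

```java
// Illustrative only; exact names may differ from the final API.
OSCPortIn receiver = new OSCPortInBuilder()
    .setPort(12345)
    .setNetworkProtocol(NetworkProtocol.TCP)
    .build();
```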
I did a bunch of work on the tests to enable thorough testing, reusing the existing UDP tests to test using TCP. I added some additional TCP-only tests that involve sending very large payloads, which is something you can't do over UDP because there is a packet size limit (typically about 65k bytes, depending on your OS).
I also wrote simple DebugServer and DebugClient classes to facilitate testing how this branch of JavaOSC interacts with my TCP branch of go-osc, as well as testing how this branch of JavaOSC works in my project.
I've tested all the permutations of TCP vs. UDP, Go client vs. Java client, and Go server vs. Java server.
After almost two months of working on TCP support in these two libraries, I'm happy to say that everything is looking great, as far as I can tell. I haven't done any legit benchmarks, but subjectively, performance seems to be about the same as it is on master, and I don't think I've made any changes that would adversely affect performance, at least as far as I'm aware. (But if anything looks suspect, please point it out!)
Summary of changes
There are a ton of changes here, so bear with me...
Tests
I did a bunch of refactoring in OSCPortTest to make it possible to reuse the existing tests for TCP.
- Adjusted the setUp method a bit, making it more generic and allowing the network protocol to be specified.
- I fixed some timing issues involving the receivers from previous test cases inadvertently hanging around, occupying the port and causing the current test case to fail. Instead of sleeping for an arbitrary duration, I wrote some utility methods like retryUntilTrue and assertEventuallyTrue that repeatedly check a condition until the desired state is reached, failing if the desired state isn't reached within a timeout period.
- I then used retryUntilTrue to implement waitForSocketClose, and made that part of the tear-down code that runs after each test case, thus ensuring that the receiver gets closed before the next test case begins.
- I fixed similar timing issues with respect to the test cases that set up listeners and then test that a message is received.
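A minimal sketch of what such polling utilities might look like (names taken from the description above; the PR's actual implementation may differ):

```java
import java.util.function.BooleanSupplier;

public class RetrySketch {
    // Poll the condition until it holds or the timeout expires,
    // instead of sleeping for one arbitrary, fixed duration.
    static boolean retryUntilTrue(BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // brief pause between checks
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    static void assertEventuallyTrue(String message, BooleanSupplier condition,
            long timeoutMs) throws InterruptedException {
        if (!retryUntilTrue(condition, timeoutMs)) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms; the utility returns shortly after.
        assertEventuallyTrue("condition never became true",
                () -> System.currentTimeMillis() - start > 50, 1000);
        System.out.println("ok");
    }
}
```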
Logging
I noticed that log output was being silenced because there is no log4j config file. The log output was useful for me as I was running tests because if a receiver thread threw an exception, I could see the stacktrace instead of the thread just dying silently.
I think we probably don't want to include a log4j.properties file in the project, because I think it might interfere with the logging configuration for people using this project.
As a solution, I added a log4j-dev.properties file that does not get picked up by default, but you can opt into using it by including the -Dlog4j.configuration=log4j-dev.properties option when you use mvn.

Module organization, transport
- BREAKING: Moved the higher-level transport.udp.OSCPort* classes up a level to transport.OSCPort*.
- Added a Transport interface that captures the low-level implementation details.
- Moved the current UDP transport implementation into UDPTransport, which implements Transport.
- Implemented TCPTransport.
- Adjusted the toString implementations of OSCPort et al. to include information about the Transport.
- OSCParseExceptions now include the data we attempted to parse.
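For illustration, here is a hypothetical sketch of a Transport-style abstraction with an in-memory implementation standing in for UDPTransport/TCPTransport. Method names are illustrative, not necessarily JavaOSC's actual interface:

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class TransportSketch {
    // The low-level details (sockets, datagrams, framing) live behind this
    // interface, so OSCPortIn/OSCPortOut can stay protocol-agnostic.
    interface Transport {
        void send(byte[] serializedPacket) throws IOException; // low-level write
        byte[] receive() throws IOException;                   // low-level read
        void close() throws IOException;
    }

    // In-memory loopback transport, standing in for a real UDP or TCP one.
    static class LoopbackTransport implements Transport {
        private final Deque<byte[]> queue = new ArrayDeque<>();

        public void send(byte[] packet) { queue.addLast(packet); }
        public byte[] receive() { return queue.removeFirst(); }
        public void close() { queue.clear(); }
    }

    public static void main(String[] args) throws IOException {
        Transport transport = new LoopbackTransport();
        transport.send(new byte[]{1, 2, 3});
        System.out.println(transport.receive().length); // prints 3
        transport.close();
    }
}
```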
OSCPortInBuilder, OSCPortOutBuilder
- Added a setNetworkProtocol method to OSCPortInBuilder in order to opt into using TCP.
- We didn't have an OSCPortOutBuilder yet, so I created one that has the same options as OSCPortInBuilder, including setNetworkProtocol.

Serializer
Because there is no upper bound on the size of an OSC packet sent over TCP, it isn't feasible to have a fixed-size byte array or ByteBuffer and write to it, potentially overflowing if the packet is too big. That approach works for UDP because there is a packet size limit, but I had to make some adjustments to get it working with TCP without there being an arbitrary limit on the size of the packet to be serialized.
See #12 for context. I initially stated that the serializer should be stream-oriented, but after spending more time in the code, I realized that a cleaner approach would be to refactor the methods of OSCSerializer to be pure functions that return byte arrays. The public API is a method called byte[] serialize(OSCPacket). A byte array is the simplest thing that works with both the UDP and TCP transports.
Along the same lines, the contract of ArgumentHandler.serialize is now byte[] serialize(T value) instead of void serialize(ByteBuffer output, T value).
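To illustrate the contract change, here is a hedged sketch (not JavaOSC's actual code) of an argument handler written as a pure function returning a byte[]. The padding rule shown (strings are null-terminated and padded to a 4-byte boundary) is from the OSC 1.0 specification:

```java
import java.nio.charset.StandardCharsets;

public class SerializerSketch {
    // Old-style contract: void serialize(ByteBuffer output, T value)
    // New-style contract: byte[] serialize(T value)
    static byte[] serializeString(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        // Null terminator plus padding to the next 4-byte boundary.
        int padded = (bytes.length / 4 + 1) * 4;
        byte[] out = new byte[padded];
        System.arraycopy(bytes, 0, out, 0, bytes.length);
        return out; // trailing bytes are already zero (the padding)
    }

    public static void main(String[] args) {
        byte[] result = serializeString("osc");
        System.out.println(result.length); // "osc" (3 bytes) + terminator = 4
    }
}
```

Because each handler returns a plain byte[], the UDP transport can copy the result into its fixed-size ByteBuffer, while the TCP transport can write it straight to the stream.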
I realize this PR is huge. Let me know if there's anything I can do to make reviewing it easier!