Closed: daveyarwood closed this PR 3 years ago.
This has been sitting open for a couple months now. Is there anything I can do to make review easier?
I could perhaps try to split off small parts of it into smaller PRs that are easier to review and merge, if that would be helpful.
One thing that would make this PR smaller is if we re-opened #47 (consists of the earliest commits on this branch) and had it reviewed and merged.
I can think of a bunch of other things in this big PR that I could make smaller PRs out of, if that would help:
ufff sorry dave! just saw this now.. will have a look at it these days (not today anymore though).
WOW! :O (... couldn't help reading ;-) ) this sounds phenomenal! thank you dave!! :-) It all sounds super good, except the serializer part of moving to byte arrays, but... I imagine this is only because I did the opposite move back then. ;-) I am sure your choice was wise, as by now you are surely more into the code-base than I am (not having been into it for quite some time). I'll have a look at the code these days!
The problem is that TCP is stream-oriented, and there is no limit to the size of the OSC packet we can serialize when the protocol is TCP. When you're working with ByteBuffers, there has to be a limit.
In the serializer, we are basically concatenating a bunch of byte arrays together. In the current implementation, we have a single ByteBuffer of a fixed size (i.e. the maximum size of an OSC packet intended to be sent over UDP where there is a packet size limit), and we work our way through the objects we're serializing and add them into the ByteBuffer as we go.
That strategy will not work for TCP, because we have to pick an upper limit on the ByteBuffer when we allocate it, but we can't possibly know how large the packet is going to be; it's essentially infinite, because TCP doesn't impose a limit to the number of bytes you can send.
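To illustrate the constraint: writing into a fixed-size ByteBuffer throws a BufferOverflowException as soon as the data exceeds the preallocated capacity, which is exactly what would happen with an arbitrarily large TCP packet. A tiny standalone demo (the 8-byte limit is artificial, just to trigger the overflow):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class FixedBufferDemo {
    // Artificial limit, standing in for a UDP-style maximum packet size.
    static final int MAX_PACKET_SIZE = 8;

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(MAX_PACKET_SIZE);
        buffer.put(new byte[]{1, 2, 3, 4}); // fits within the limit
        try {
            // 4 + 5 > 8 bytes: a TCP packet can always outgrow the buffer.
            buffer.put(new byte[]{5, 6, 7, 8, 9});
            System.out.println("no overflow");
        } catch (BufferOverflowException e) {
            System.out.println("overflow");
        }
    }
}
```

With UDP, `MAX_PACKET_SIZE` can simply be set to the protocol's packet size limit; with TCP there is no such number to pick.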
My initial thought was that we could use a ByteArrayOutputStream (as some of the answers to this SO question mention) as the common denominator between stream-oriented TCP and ByteBuffer-oriented UDP. But then I ran into the problem where we couldn't provide context for the bad data event.
Re: the interface idea, I thought about that a little bit, but I wasn't able to figure out what kind of interface would work as a common denominator. It seemed to me like the method signatures would have to be too different, and it wasn't clear to me how to unify what we would want to do for TCP vs. UDP.
> ...what kind of interface would work as a common denominator ...
What about taking the ByteBuffer interface, and removing everything that is not used in the serializer? ... I did that yesterday evening (had a flow moment). It is a very rough draft with some hacky edges, but already works, and I see nothing unsolvable. The basic idea is just as described above, and the main files are:
What do you think about that? Even if this is a bad solution for some reason, my main thinking is: I'd rather have some high-level abstraction that allows for performant, low-level implementations, than use a simple interface that makes performance improvements impossible. But... as said, you are into this more than me right now, so I might not see some things.
.. now thinking about it... I don't know whether my solution makes sense for the TCP scenario. ;-)
I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.
I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that number and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so working with a bunch of pure functions that return byte arrays ended up being a lot easier to work with.
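To illustrate why pure functions that return byte arrays make this easy: once a part (like a blob argument) is serialized to a byte[], its size is known before the length prefix is written, so the prefix can simply be emitted first. A minimal sketch, not JavaOSC's actual code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BlobSketch {
    // Serialize a blob argument as: 4-byte big-endian length, then the data.
    // Because the data is already a byte[], counting its bytes is trivial.
    static byte[] serializeBlob(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(data.length).array());
        out.write(data);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = serializeBlob(new byte[]{10, 20, 30});
        System.out.println(Arrays.toString(result)); // [0, 0, 0, 3, 10, 20, 30]
    }
}
```

With a forward-only stream, by contrast, the length position has already been passed by the time the size is known.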
On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.
> I don't think that would work, because for TCP you can't "back" what you're doing with a data structure that has a limit to the number of bytes you can hold in it, like either a ByteBuffer or a byte array.
As you can see here, the BytesReceiver implementation meant to be used by TCP is not backed by a fixed-size structure. It is also mainly byte[]-based; it just saves on the number of allocation/concatenation operations on byte[]s.
> On another note, I'd be curious to have a better picture of how my implementation compares to the ByteBuffer-oriented one on the master branch. In what ways is it actually less efficient / performant? I wouldn't be surprised if it is less performant, but I don't really have any idea how to measure it objectively.
To measure the performance, the easiest thing would probably be to take the code of a re-parser unit test method (or a few of them), run it many times in a performance measurement tool, and concentrate mainly on the memory usage.
Another way could be to abstract away all the low-level memory access (or at least the allocation and copying of byte[]s and ByteBuffers) into our own utility functions, and log how often they are called and with what sizes of arguments.
As I see it, your code does more allocations and copying/concatenating of the bare memory, and if it is already hard for us to figure out exactly how much, users of the library would have no chance.
In the old way, and I think in this solution, the bare memory usage of these low-level structures is pretty much the actual OSC packet's size, as it will be going over the network.
As it is right now, it will be double that size when the BytesReceiver.toByteArray() function is called, but at least that function requires only a single (or no) allocation of a byte[], and a GC run afterwards can free the "other half" of the memory.
> I revisited my byte array-based serializer setup on this branch just now, and I think part of the problem might be that we sometimes need to serialize part of the output (e.g. a blob argument), count the bytes of that serialized part, and then serialize that number and put it before the thing we serialized. I couldn't see a way to do that with a stream-based implementation, so working with a bunch of pure functions that return byte arrays ended up being a lot easier to work with.
Yes! I also saw this problem.
I solved it through the PlaceHolder, the interface being in BytesReceiver (an interface and a function below it), with each implementation of BytesReceiver having its own implementation of PlaceHolder.
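For illustration, here is a hypothetical sketch of how such a PlaceHolder could work for a List<byte[]>-backed receiver. Names and signatures are guesses based on the description above, not the actual BytesReceiver API:

```java
import java.util.ArrayList;
import java.util.List;

public class PlaceholderSketch {
    interface PlaceHolder {
        void replace(byte[] bytes); // fill in the reserved slot later
    }

    // A growable receiver backed by a list of byte[] chunks (TCP-friendly:
    // no fixed upper bound on the total size).
    static class ListBytesReceiver {
        private final List<byte[]> chunks = new ArrayList<>();

        void put(byte[] bytes) { chunks.add(bytes); }

        // Reserve a slot now; its contents are supplied later,
        // e.g. once the size of a blob argument is known.
        PlaceHolder placeHolder(final int size) {
            final byte[] slot = new byte[size];
            chunks.add(slot);
            return bytes -> System.arraycopy(bytes, 0, slot, 0, size);
        }

        // Single final concatenation of all chunks into one byte[].
        byte[] toByteArray() {
            int total = 0;
            for (byte[] c : chunks) total += c.length;
            byte[] out = new byte[total];
            int pos = 0;
            for (byte[] c : chunks) {
                System.arraycopy(c, 0, out, pos, c.length);
                pos += c.length;
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ListBytesReceiver recv = new ListBytesReceiver();
        PlaceHolder sizeSlot = recv.placeHolder(1); // reserve 1 byte for the size
        byte[] payload = {42, 43};
        recv.put(payload);
        sizeSlot.replace(new byte[]{(byte) payload.length}); // fill in afterwards
        byte[] out = recv.toByteArray();
        System.out.println(out.length + " " + out[0]); // prints "3 2"
    }
}
```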
Now thinking of it ...
In addition to the current implementations of BytesReceiver, we could also have:
- one backed by a single byte[], which would be automatically replaced by a bigger one, if needed
- one backed by a single byte[], which would throw an exception if it turns out to be too small
- one using a ByteBuffer as the back-storage instead of byte[]

... and if this proves to be an overall good idea, we might even introduce a BytesSupplier to be used in the parser part later on. ;-)
Ah, I think I may have misunderstood the overall point that you're trying to make. Is the idea that (because we suspect changing it will have an unwelcome impact on performance) we want to keep the current implementation for UDP (using a ByteBuffer of fixed size) and create a new, potentially less performant one for TCP?
I like that idea because, since JavaOSC currently does not support TCP at all, introducing it with worse performance than UDP would not be a regression! :smile:
I can't tell yet if the interface you're proposing will work or not for TCP (with an implementation of the interface that uses a List<byte[]>), but I think it probably will. Unfortunately, I think it's going to be a while before I can try it because the COVID-19 situation is leaving me with not much free time. If you'd like to take a stab at it, be my guest! Otherwise, I'll get to it when I can (potentially months from now).
mmmmm ok :/ same here.. much less time than before. but yeah... both solutions are fine with me! you or me doing it.
I merged byte-buffer-abstraction into your tcp-support branch after merging latest master, and the result is now in this repo's tcp-support branch.
Interesting for you, @daveyarwood, is probably only the merge commit.
@hoijui Thanks so much for digging into this. It looks great, as far as I can tell! I did a build of origin/tcp-support with mvn install and tried it out with my project (which uses JavaOSC over TCP), and it works like a charm!
I'm going ahead and merging origin/tcp-support into this branch, and I think it'll be ready to merge, unless there are any outstanding issues?
EDIT: Merged origin/tcp-support into this branch.
thank you @daveyarwood for the testing and your great work! :-)
@hoijui Are there any blockers to getting this PR merged at this point?
Thanks @daveyarwood for all this work, and for reminding me again! ... and sorry for the delay.
No worries! Thanks for your help!
Background
The project I'm working on requires OSC messages to be sent and received over TCP, with a client written in Go and a server written in Java. Neither go-osc nor JavaOSC currently support TCP, so as a labor of love, I've implemented TCP support in both libraries.
This implements TCP support as discussed in #12. This branch builds on top of #47, which I'm about to close because this PR starts with the commits on that branch.
The public API is the same as what we have on master today, except that now, users can set the network protocol to TCP like this:
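The snippet that followed here is not shown; based on the builder API described later in this PR (OSCPortInBuilder and its setNetworkProtocol method), the usage presumably looks roughly like the following pseudocode-style sketch. The NetworkProtocol enum name and the other builder method names are assumptions:

```java
// Illustrative only; exact names may differ from the final API.
OSCPortIn receiver = new OSCPortInBuilder()
    .setPort(12345)
    .setNetworkProtocol(NetworkProtocol.TCP)
    .build();
```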
I did a bunch of work on the tests to enable thorough testing, reusing the existing UDP tests to test using TCP. I added some additional TCP-only tests that involve sending very large payloads, which is something you can't do over UDP because there is a packet size limit (typically about 65k bytes, depending on your OS).
I also wrote simple DebugServer and DebugClient classes to facilitate testing how this branch of JavaOSC interacts with my TCP branch of go-osc, as well as testing how this branch of JavaOSC works in my project.
I've tested all the permutations of TCP vs. UDP, Go client vs. Java client, and Go server vs. Java server.
After almost two months of working on TCP support in these two libraries, I'm happy to say that everything is looking great, as far as I can tell. I haven't done any legit benchmarks, but subjectively, performance seems to be about the same as it is on master, and I don't think I've made any changes that would adversely affect performance, at least as far as I'm aware. (But if anything looks suspect, please point it out!)
Summary of changes
There are a ton of changes here, so bear with me...
Tests
I did a bunch of refactoring in OSCPortTest to make it possible to reuse the existing tests for TCP.
- Adjusted the setUp method a bit, making it more generic and allowing the network protocol to be specified.
- I fixed some timing issues involving the receivers from previous test cases inadvertently hanging around, occupying the port and causing the current test case to fail. Instead of sleeping for an arbitrary duration, I wrote some utility methods like retryUntilTrue and assertEventuallyTrue that repeatedly check a condition until the desired state is reached, failing if the desired state isn't reached within a timeout period.
- I then used retryUntilTrue to implement waitForSocketClose, and made that part of the tear-down code that runs after each test case, thus ensuring that the receiver gets closed before the next test case begins.
- I fixed similar timing issues with respect to the test cases that set up listeners and then test that a message is received.
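A minimal sketch of what such polling utilities might look like (names taken from the description above; the PR's actual implementation may differ):

```java
import java.util.function.BooleanSupplier;

public class RetrySketch {
    // Poll the condition until it holds or the timeout expires,
    // instead of sleeping for one arbitrary, fixed duration.
    static boolean retryUntilTrue(BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // brief pause between checks
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    static void assertEventuallyTrue(String message, BooleanSupplier condition,
            long timeoutMs) throws InterruptedException {
        if (!retryUntilTrue(condition, timeoutMs)) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~50 ms; the utility returns shortly after.
        assertEventuallyTrue("condition never became true",
                () -> System.currentTimeMillis() - start > 50, 1000);
        System.out.println("ok");
    }
}
```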
Logging
I noticed that log output was being silenced because there is no log4j config file. The log output was useful for me as I was running tests because if a receiver thread threw an exception, I could see the stacktrace instead of the thread just dying silently.
I think we probably don't want to include a log4j.properties file in the project, because I think it might interfere with the logging configuration for people using this project.
As a solution, I added a log4j-dev.properties file that does not get picked up by default, but you can opt into using it by including the -Dlog4j.configuration=log4j-dev.properties option when you use mvn.

Module organization, transport
- BREAKING: Moved the higher-level transport.udp.OSCPort* classes up a level to transport.OSCPort*.
- Added a Transport interface that captures the low-level implementation details.
- Moved the current UDP transport implementation into UDPTransport, which implements Transport.
- Implemented TCPTransport.
- Adjusted the toString implementations of OSCPort et al. to include information about the Transport.
- OSCParseExceptions now include the data we attempted to parse.
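For illustration, here is a hypothetical sketch of a Transport-style abstraction with an in-memory implementation standing in for UDPTransport/TCPTransport. Method names are illustrative, not necessarily JavaOSC's actual interface:

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

public class TransportSketch {
    // The low-level details (sockets, datagrams, framing) live behind this
    // interface, so OSCPortIn/OSCPortOut can stay protocol-agnostic.
    interface Transport {
        void send(byte[] serializedPacket) throws IOException; // low-level write
        byte[] receive() throws IOException;                   // low-level read
        void close() throws IOException;
    }

    // In-memory loopback transport, standing in for a real UDP or TCP one.
    static class LoopbackTransport implements Transport {
        private final Deque<byte[]> queue = new ArrayDeque<>();

        public void send(byte[] packet) { queue.addLast(packet); }
        public byte[] receive() { return queue.removeFirst(); }
        public void close() { queue.clear(); }
    }

    public static void main(String[] args) throws IOException {
        Transport transport = new LoopbackTransport();
        transport.send(new byte[]{1, 2, 3});
        System.out.println(transport.receive().length); // prints 3
        transport.close();
    }
}
```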
OSCPortInBuilder, OSCPortOutBuilder
- Added a setNetworkProtocol method to OSCPortInBuilder in order to opt into using TCP.
- We didn't have an OSCPortOutBuilder yet, so I created one that has the same options as OSCPortInBuilder, including setNetworkProtocol.

Serializer
Because there is no upper bound on the size of an OSC packet sent over TCP, it isn't feasible to have a fixed-size byte array or ByteBuffer and write to it, potentially overflowing if the packet is too big. That approach works for UDP because there is a packet size limit, but I had to make some adjustments to get it working with TCP without there being an arbitrary limit on the size of the packet to be serialized.
See #12 for context. I initially stated that the serializer should be stream-oriented, but after spending more time in the code, I realized that a cleaner approach would be to refactor the methods of OSCSerializer to be pure functions that return byte arrays. The public API is a method called byte[] serialize(OSCPacket). A byte array is the simplest thing that works with both the UDP and TCP transports.
Along the same lines, the contract of ArgumentHandler.serialize is now byte[] serialize(T value) instead of void serialize(ByteBuffer output, T value).
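To illustrate the contract change, here is a hedged sketch (not JavaOSC's actual code) of an argument handler written as a pure function returning a byte[]. The padding rule shown (strings are null-terminated and padded to a 4-byte boundary) is from the OSC 1.0 specification:

```java
import java.nio.charset.StandardCharsets;

public class SerializerSketch {
    // Old-style contract: void serialize(ByteBuffer output, T value)
    // New-style contract: byte[] serialize(T value)
    static byte[] serializeString(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        // Null terminator plus padding to the next 4-byte boundary.
        int padded = (bytes.length / 4 + 1) * 4;
        byte[] out = new byte[padded];
        System.arraycopy(bytes, 0, out, 0, bytes.length);
        return out; // trailing bytes are already zero (the padding)
    }

    public static void main(String[] args) {
        byte[] result = serializeString("osc");
        System.out.println(result.length); // "osc" (3 bytes) + terminator = 4
    }
}
```

Because each handler returns a plain byte[], the UDP transport can copy the result into its fixed-size ByteBuffer, while the TCP transport can write it straight to the stream.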
I realize this PR is huge. Let me know if there's anything I can do to make reviewing it easier!