opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0

Binary format proposal. #252

Closed carlosalberto closed 6 years ago

carlosalberto commented 6 years ago
coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.7%) to 82.238% when pulling 8a4372cc644d7b2080b42be47f82866f73f15960 on carlosalberto:binary_format_proposal into 242ba956368c48b804ae4120053955a49f7f9c63 on opentracing:v0.32.0.

carlosalberto commented 6 years ago

See #253

tylerbenson commented 6 years ago

@carlosalberto I haven't had time to review the whole discussion yet, but does this fully incorporate @raphw's feedback from before?

carlosalberto commented 6 years ago

@tylerbenson Some of it, but we are still not taking into account two things that were requested:

1. Not requiring the output size to be known in advance (for Tracer.inject()).
2. Not over-the-wire support; instead, in-memory buffering is expected (just like HttpHeaders and TextMap, where you write everything to an in-memory buffer and then hand it to the transmission layer and let the protocol handle it; see the sketch at the end of this comment).

I'd expect, in turn:

1. Some (or a few) proof-of-concept implementations.
2. Early/experimental support from Tracers and their feedback on this.
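
For reference, a minimal sketch of that in-memory pattern with the existing TEXT_MAP format (the helper class and method names below are just placeholders for illustration):

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMap;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class TextMapBufferExample {
    /** Injects the span's context into an in-memory map, the way TEXT_MAP is used today. */
    static Map<String, String> toMap(Tracer tracer, Span span) {
        final Map<String, String> buffer = new HashMap<>();
        tracer.inject(span.context(), Format.Builtin.TEXT_MAP, new TextMap() {
            @Override
            public void put(String key, String value) {
                buffer.put(key, value);
            }

            @Override
            public Iterator<Map.Entry<String, String>> iterator() {
                throw new UnsupportedOperationException("inject-only carrier");
            }
        });
        // The caller hands 'buffer' to the transport; the protocol decides
        // how the entries actually go over the wire.
        return buffer;
    }
}
```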

raphw commented 6 years ago

I can only repeat my critique of this API:

  1. Byte buffers are meant to interact with data either on or off the heap. This makes byte buffers useful when a server can let the Tracer serialize or deserialize data directly from, for example, a TCP buffer, without the JVM also needing to allocate these bytes on the Java heap, which is expensive in terms of GC. This idea is not reflected by this API proposal, which only puts additional weight onto the tracer, as most implementations will end up looking like this example where the buffers are controlled by the tracer. A NIO-friendly API must allow ByteBuffers to be provided from outside the tracer (see the sketch after this list).

  2. Byte buffers are typically used in the context of asynchronous I/O. An important factor with NIO is that an external buffer, such as native TCP storage, might not be able to hold all the bytes that belong to the baggage information at once. If a server provides a buffer to the tracer from the outside, it cannot normally guarantee that all the data is already available. It would rather ask the tracer to do a partial deserialization/serialization to clear the buffer, then suspend the current channel to let the buffer fill. In the meantime, it would move on to the next channel from the very same thread to avoid blocking it. This is the core idea of NIO and selectors. With OpenTracing being a synchronous API that does not allow such partial deserialization, the server would need to copy the data into a contiguous heap array and provide this array, wrapped as a byte buffer, once it is complete. The current API is therefore equivalent to one operating on byte arrays.

  3. There is no binary contract implied. Binary protocols that allow embedding arbitrary bytes typically add a marker byte to indicate the start and end of a custom segment. This way, a server can accept baggage data from a client without even having a tracer installed, by just cropping the segment. By letting the tracer write to byte buffers directly, the user of a tracer has no chance to escape any payload bytes that equal the marker byte, thus breaking the protocol. The only way to work around this would be to ask the tracer to write to an intermediate buffer and escape the data from there, which breaks the core assumption of NIO.
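
To make point 1 concrete, here is a rough sketch of the distinction (the interface names are made up for illustration and are not part of this PR):

```java
import java.nio.ByteBuffer;

// Hypothetical carriers, for illustration only.

/** Tracer-controlled: the tracer asks for, and effectively owns, the buffer. */
interface TracerAllocatedBinary {
    /** The tracer requests a buffer of the size it needs and writes into it. */
    ByteBuffer injectionBuffer(int length);
}

/** Caller-controlled: the server hands the tracer a buffer it already owns,
 *  e.g. a direct buffer backing a TCP channel, avoiding an extra heap copy. */
interface CallerSuppliedBinary {
    /** The tracer writes as much as fits into the provided buffer. */
    void inject(ByteBuffer target);
}
```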

None of these problems occurs with a stream-based approach (a rough sketch follows the list):

  1. Streams can read and write one byte at a time, which avoids additional allocation of byte arrays. (Streams can be directed to write to or read from a direct byte buffer without the mentioned need for intermediaries.)
  2. Streams can easily be decorated, allowing for escaping of the mentioned marker bytes.
  3. Streams are synchronous, but since this is also true for the baggage injection and extraction APIs, this is not a problem here.
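
A rough sketch of what such stream-based carriers could look like (again, the names are hypothetical and not part of this PR):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stream-based carriers, for illustration only.

/** Carrier for Tracer.inject(): the tracer writes the context to the stream
 *  supplied by the caller, who decides where the bytes actually go. */
interface StreamInject {
    OutputStream injectionStream() throws IOException;
}

/** Carrier for Tracer.extract(): the tracer reads the context from the stream. */
interface StreamExtract {
    InputStream extractionStream() throws IOException;
}
```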

But even with streams: if the server uses a different tracer implementation than the client, none of this will work to begin with. There is no serialization format defined, and if I write the data as BSON but the server expects protobuf, this would result in garbage data being reported. Even worse, this could perhaps be turned into an attack vector if I find a way to exploit the server's deserialization approach, for example by forcing it to over-consume data from the buffer, which corrupts the remainder of the stream and can have unwanted effects.

tylerbenson commented 6 years ago

@raphw What would you then suggest? Maybe I'm missing something, but writing instrumentation to support propagation over a generic binary interface without breaking the protocol when only one side is instrumented seems to be a significant challenge. Perhaps it would help if we had some specific use cases. What protocols/frameworks are expected to use this binary format? (perhaps protobuf, thrift, grpc?) Maybe it would be better to define those propagation interfaces specifically like we do for http, instead of lumping everything into a binary format? Which formats/protocols are more common when working with NIO? What frameworks should be explored that would highlight these problems?

(Forgive me for my ignorance. I've written a fair bit of instrumentation, but relatively little for systems without generic headers/metadata/parameters that can be used for propagation details.)

tedsuo commented 6 years ago

@raphw @tylerbenson I'd love to see the requirements for the binary format expressed as a set of tests/examples. This worked very well for designing context propagation, where the requirements were similarly nuanced. If we keep discussing in English, we'll probably never resolve this. :)

raphw commented 6 years ago

Binary protocols typically include such optional sections in one of two ways:

  1. Disclosing the size of the segment with some metadata, e.g. a fixed-length name and a fixed-length size indication followed by the disclosed number of bytes. This way, if the server receives this segment but does not know a dispatcher for such a named area, it just discards the bytes.
  2. Starting the segment with a name and indicating the end of the section with a marker byte. All payload bytes that equal this marker byte are then escaped; e.g. any null byte is replaced by two null bytes within the payload. If a null byte is not followed by another null byte, it marks the end of the segment. Again, if no dispatcher is found, the payload can be discarded.

The first option requires knowing the binary size in advance; the second option requires a way to manipulate any written byte before it reaches the buffer. My original suggestion using streams would allow for that.
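
As an illustration of the second option, a decorating stream along these lines could escape a 0x00 marker byte on the way out (a sketch only, not part of any concrete proposal); the framing code would then write a single, un-escaped 0x00 to terminate the segment:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Escapes the 0x00 marker byte by doubling it, as described in option 2. */
class NullEscapingOutputStream extends FilterOutputStream {
    NullEscapingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        if ((b & 0xFF) == 0x00) {
            // Double the null byte so it is not read as the segment terminator.
            out.write(0x00);
        }
    }
}
```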

carlosalberto commented 6 years ago

Hey everybody - sorry for the delay. Recently I took up testing the current BINARY proposal to do tracing for Cassandra on the server side (through a plugin), and the experimental code consuming this specific BINARY support lives in https://github.com/carlosalberto/java-cassandra-server

Please feel free to test it out. Cassandra itself takes a ByteBuffer as payload for outgoing query requests, and receives a ByteBuffer from the server side as well (described in the mentioned repository containing the Cassandra plugin).

I have the impression that most of the time, as the SpanContext tends to be small, the payload will be buffered and transferred in a single step.

That being said, I do realize that @raphw's usage may not be supported with the current approach, and that it may need an additional format (maybe something like BINARY_STREAMING).

Let me know ;)

@tedsuo @tylerbenson @yurishkuro

carlosalberto commented 6 years ago

Trying to restart the discussion: I took a look at how Netty works, and ended up gathering some of the overall issues that have been floating around during the design of this format. I'm listing them here to help things get moving.

One thing I wonder is how much time will be spent writing instrumentation for low-level transports (such as the ones exposed by Netty), versus how often it will be written for HTTP or for frameworks abstracting the transmission (for those we don't need the stream-based API).

Thoughts? Waiting for your feedback @raphw

raphw commented 6 years ago

The more I think of it, the more I believe it would actually be easiest to extract a TextMap and push it into the stream, depending on the underlying binary protocol. Until there is an actual use case, maybe it makes sense to table the binary extraction?
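
A rough sketch of that idea: take the entries produced by a TEXT_MAP inject (for example via a map-backed carrier like the one sketched earlier in this thread) and write them with whatever framing the underlying binary protocol already provides. The length-prefixed framing below is an arbitrary example, not a defined OpenTracing format:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

class TextMapOverBinary {
    /** Writes TEXT_MAP entries with a simple length-prefixed framing;
     *  the framing is an example only, chosen by the instrumented protocol. */
    static void write(Map<String, String> textMap, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(textMap.size());
        for (Map.Entry<String, String> entry : textMap.entrySet()) {
            data.writeUTF(entry.getKey());
            data.writeUTF(entry.getValue());
        }
        data.flush();
    }
}
```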

carlosalberto commented 6 years ago

@raphw Thanks for the feedback. Well, I'd say using OT to instrument Cassandra on the server side is already an actual use case (albeit a simple one ;) ).

@yurishkuro Any opinion on this?

yurishkuro commented 6 years ago

I agree that this is not a pressing matter (strictly speaking, "binary" can be just a text_map serialized in a way specific to the protocol being instrumented, like Cassandra), but it would be nice to resolve.

carlosalberto commented 6 years ago

Closing this PR in favor of #276 (which was just merged).