opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0

Avoiding `Binary` format to be based on byte buffers #243

Closed raphw closed 6 years ago

raphw commented 6 years ago

Just a small side-line note: I do not think it is a good idea to bind the Binary format to ByteBuffers. Frameworks such as Netty are explicitly built around avoiding ByteBuffer usage, as it can be rather performance heavy, and use their own ByteBuf abstraction instead. It would be advantageous to allow hooking any form of byte-consumer / byte-producer into this API.

Also, I do not think that the average user would want to write a binary format to a buffer but rather to a stream. And even when using NIO, I cannot see how an implementor of the API could push or poll bytes asynchronously, since the implementation needs to block on the tracer's inject and extract methods. I therefore think that Binary should rather be some form of callback carrier such as an input or output stream, which are at least interfaces that can delegate to a byte buffer if needed without any practical performance impact.

Finally, for the current API, it is not clear to me how the return values of Binary are supposed to be used. A ByteBuffer already has internal offsets which are altered during read/write. This only opens a possibility for inconsistencies.

Thanks for clarifying and considering my suggestion.

yurishkuro commented 6 years ago

@raphw are you referring to the current API in master, or to the next version here: https://github.com/opentracing/opentracing-java/blob/v0.31.0/opentracing-api/src/main/java/io/opentracing/propagation/Binary.java#L34 ?

raphw commented 6 years ago

I am referring to the next version that you linked.

yurishkuro commented 6 years ago

The next version mirrors the NIO channel interface. Why wouldn't it work?

For example, for inject() flow

How would you like the API to look?

raphw commented 6 years ago

The ReadableByteChannel and WritableByteChannel abstractions are a concept from Java NIO for implementing asynchronous I/O. The idea is that a single thread can read/write from multiple such channels and suspend a channel when its buffer is exhausted in order to proceed with another channel in the meantime. This comes with the advantage that a thread does not need to block on a single I/O transaction waiting for further bytes but can handle multiple such transactions.

For the use case of OpenTracing, the asynchronous part of the I/O cannot be applied, as the inject and extract methods must complete their read/write before returning. This means that there is at most one channel to read from or to write to. Using NIO channels therefore does not bring any advantage. If this should be an option, the extract/inject methods would need to be asynchronous, too. You are left with the overhead of asynchronous processing but without any of its advantages. This is why I would advise against it.

Also, I would argue that most NIO users are actually using Netty as the de-facto standard. Netty can wrap ByteBuffers in its ByteBuf implementation and unwrap its own representation, but this typically comes with a small overhead, so it would be better to allow coding against a more generic interface, which ByteBuffer does not offer. As the extract/inject methods are already blocking, you can achieve all of that by programming against a simple OutputStream/InputStream, which are easy to integrate with a standard socket implementation and which can easily push/pull bytes into/from a ByteBuffer/ByteBuf if any such processing is required.
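
For illustration, a minimal sketch of the wrap/unwrap round trip mentioned above (assuming Netty's Unpooled and ByteBuf APIs on the classpath; the class name is made up and this is not part of the OpenTracing API):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import java.nio.ByteBuffer;

public class NettyWrapSketch {
    public static void main(String[] args) {
        ByteBuffer nioBuffer = ByteBuffer.allocate(64);
        // Wrap the NIO buffer in Netty's abstraction (shares the underlying memory).
        ByteBuf byteBuf = Unpooled.wrappedBuffer(nioBuffer);
        // Unwrap back to a ByteBuffer view when an API insists on java.nio types.
        ByteBuffer view = byteBuf.nioBuffer();
        System.out.println(view.remaining());
    }
}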

Finally, ByteBuffers are often pooled for performance reasons. Netty maintains such a pool, and it is a complex operation that you cannot expect a tracer implementation to get right in a similar fashion. Also, a Tracer does not have a life-cycle; it is not closed and it might be abandoned and reinstantiated. Of course, byte buffers are not thread-safe either, so this would require a thread-local management mechanism and proper options for configuring such an object pool to avoid an overuse of memory. Alternatively, byte buffers have to be created constantly, which is probably a performance problem in low-level NIO environments. It would therefore be much better if the user were able to read the bytes from a stream into a user-managed buffer that can be pooled within an application life-cycle. In the end, all baggage is available from memory anyway and there is normally no blocking involved when serialising or deserialising the baggage.

yurishkuro commented 6 years ago

As the extract/inject methods are already blocking, you can achieve all of that with programming against a simple OutputStream/InputStream

Isn't this just a matter of adding the respective util factories to BinaryAdapter?

I am still not clear why the Binary interface is problematic. Just because it's using ByteBuffer? Would you prefer it to use byte[] with offsets like OutputStream/InputStream? I think that would introduce a lot more methods in the interface. Using Netty's buffers is not an option, we can't make that a dependency of the API.

carlosalberto commented 6 years ago

Just FYI, io.opentracing.propagation.Adapters has overloads taking InputStream/OutputStream (which internally use Channels.newChannel()).

For performance, I'm not sure this is really a problem (as the context should be rather small). Sounds like a case of premature optimization for now (unless we have clear evidence already).
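
For reference, a minimal sketch of the Channels.newChannel bridging mentioned above (not the actual Adapters source; the class name is made up), showing how a blocking stream is adapted to the channel interface:

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class StreamToChannelSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Channels.newChannel adapts the blocking stream to the channel interface.
        WritableByteChannel channel = Channels.newChannel(out);
        channel.write(ByteBuffer.wrap("span-context".getBytes(StandardCharsets.UTF_8)));
        System.out.println(out.size()); // 12 bytes written through the adapter
    }
}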

raphw commented 6 years ago

I would change the interface to:

public interface Binary {
    OutputStream getOutputStream();
    InputStream getInputStream();
}

This way, a Tracer could directly read or write bytes. A tracer does its I/O synchronously by design, so it does not make sense to use the NIO channel API. This has nothing to do with premature optimization; I argue that Binary is using the wrong API to begin with. The channels are simply not meant for this use case; they are meant for asynchronous I/O. If the I/O is synchronous, Java typically offers streams, not channels.

yurishkuro commented 6 years ago

-1 on using input/output streams directly; they are very heavy interfaces, with semantically unclear (in this context) methods like close(), when all we need is a single read and write method.

The existing interface does not use channels, and makes no claims about async behavior:

public interface Binary {
    int write(ByteBuffer buffer) throws IOException;
    int read(ByteBuffer buffer) throws IOException;
}
carlosalberto commented 6 years ago

@raphw In addition to what Yuri said, I will mention that even though we offer adapters for Channels (through Adapters), they are not the only way to go (especially if your implementation underneath doesn't need a Channel at all).

raphw commented 6 years ago

My implementation will have to allocate a ByteBuffer for every call or maintain a pool. If I could just write the bytes that I already have in memory directly to a target buffer, I would not need to allocate an intermediary. A stream would allow that if the user pumps the bytes from the stream into their buffer. By requiring the tracer to provide such a buffer, this is no longer possible. From a performance perspective, I argue that it currently makes more sense to extract a TextMap and push the strings to the buffer. In the end it is of course your call, but from a Java perspective, this is not using the standard library in the way intended and I predict that users will not enjoy working with this.

raphw commented 6 years ago

Another reason to avoid the API is that it is complex and easy to get wrong. There is a bigger chance that OpenTracing implementations will not work as expected. For example, the mock implementation does not check the number of bytes read if the target is not able to consume the entire buffer: https://github.com/opentracing/opentracing-java/blob/v0.31.0/opentracing-mock/src/main/java/io/opentracing/mock/MockTracer.java#L142

carlosalberto commented 6 years ago

The MockTracer will simply throw a BufferOverflowException, which is what we want for inject(). Also observe this is not a production implementation. But in any case, if you have any feedback regarding what happens there (i.e. what to do if there's not enough space in the passed Binary) we can definitely include it (in MockTracer, that is).

Other than that, I don't think using ByteBuffers is too complex, as it has been properly documented (probably takes more time to grasp than streams, but that's about it). Or at least that's my POV ;)

raphw commented 6 years ago

It should look like this:

ByteBuffer buffer = ...
int totalBytes = 0, limitBytes = buffer.remaining(), writeBytes;
do {
  writeBytes = binary.write(buffer);
  totalBytes += writeBytes;
} while (totalBytes < limitBytes);

since you mimic the asynchronous API. The Binary, which is basically a buffered channel, is free to consume only those bytes from the buffer that currently fit into the channel's buffer.

carlosalberto commented 6 years ago

Oh I see where you are coming from. Will definitely add that soon, to properly mimic this API style.

The rest of the comment still stands, though (IMHO).

raphw commented 6 years ago

Well, in the end I can only give you my advice. But I promise you that people that work with NIO in the Java space will probably not enjoy working with this API. (I am one of these people.)

yurishkuro commented 6 years ago

The mock implementation does for example miss that a target buffer might not be able to consume the whole source buffer

Why is it a MockTracer's problem? All it does is call write(buffer). Whoever implements that method can check if they have enough capacity to store the content of the buffer.

but from a Java perspective, this is not using the standard library in the way intended

I am all in favor of providing an idiomatic API, but do you consider this one non-standard if it's exactly what's used by the standard library, namely WritableByteChannel?

Btw, there are plenty of object pool libraries, and it can also be easily done with thread-locals.
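
As a rough sketch of the thread-local approach (the class name and the 4 KiB size are arbitrary assumptions, not a recommendation):

import java.nio.ByteBuffer;

public final class BufferPool {
    // One reusable buffer per thread, sized generously for a serialized span context.
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(4096));

    public static ByteBuffer acquire() {
        ByteBuffer buffer = BUFFER.get();
        buffer.clear(); // reset position and limit before reuse
        return buffer;
    }
}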

carlosalberto commented 6 years ago

Why is it a MockTracer's problem? All it does is call write(buffer). Whoever implements that method can check if they have enough capacity to store the content of the buffer.

Oh, good point

raphw commented 6 years ago

Why would the implementation need to return the number of bytes read then? Also, the javadoc explicitly mentions the readable byte channel, and that is how the latter works.

carlosalberto commented 6 years ago

Oh, that's a fair point (about the implementation). As you said, we currently state so in the docs (excerpt):

public interface Binary {
    /**
     * Writes a sequence of bytes to this channel from the given buffer.
     * The internal buffer is expected to grow as more data is written.
     *
     * The behavior of this method is expected to be the same as WritableByteChannel.write().
     */
    int write(ByteBuffer buffer) throws IOException;

Wondering if we should change the contract, so we don't state it has to mimic the Channel way? @yurishkuro

(Observe this point should be left out of the consideration of using ByteArray vs streams ;) )

yurishkuro commented 6 years ago

Yes, we can do better with the docs, like "Writes the contents of the given buffer to the carrier."

The number of bytes returned is an interesting point. If we allow it to be less than buffer.remaining(), then the tracer must loop and repeat the writes until all content of the buffer is written out, or an IO exception is thrown.

In contrast, the io.Writer in Go only returns n < len(b) if there was an error preventing the full content from being written.
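
As an illustration of that alternative contract, a hedged sketch of a carrier that absorbs partial channel writes internally so the tracer never has to loop (the class name is hypothetical):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class LoopingChannelCarrier {
    private final WritableByteChannel channel;

    public LoopingChannelCarrier(WritableByteChannel channel) {
        this.channel = channel;
    }

    // Keeps writing until the whole buffer is drained, io.Writer-style:
    // a short count is only possible if an exception is thrown.
    public int write(ByteBuffer buffer) throws IOException {
        int total = 0;
        while (buffer.hasRemaining()) {
            total += channel.write(buffer);
        }
        return total;
    }
}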

tedsuo commented 6 years ago

My understanding is that ByteBuffer is the new standard, and since Java 6 APIs have preferred it. So, in the long run, ByteBuffer is the most efficient choice, because most APIs going forward will have already put the data in a ByteBuffer. Is that incorrect?

But, separate from ByteBuffer efficiency, I am now concerned that WritableByteChannel may not be the interface we should be matching. "Bytes returned" should not be an indication that the caller should loop. More than bytes returned, I am concerned that it throws IOException. No other OT API method throws an exception, because a tracing client should not handle errors in-band (there is simply nothing for the caller to do, and calling a method on a tracer should never be dangerous). InputStream (and all standard IO interfaces) will have this issue.

The requirements for this API should be the following:

1) Efficiently hand off an unknown number of bytes (practically, it will always be less than 1024 bytes).

2) Don't throw an exception.

So, really the API should perhaps look like the following:

public interface Binary {
    void write(ByteBuffer buffer);
    void read(ByteBuffer buffer);
}
tedsuo commented 6 years ago

Sorry, just realized I was basically recreating the old API, with the same problems regarding allocations. So there does need to be looping and returned bytes in case the given ByteBuffer is too small, and an IOException makes sense in this case to tell you that there are no more bytes but you should discard the result.

carlosalberto commented 6 years ago

So I'd propose we (in case of keeping this API):

1) Keep the looping nature of read/write() (that should let us keep the returned int) and handle the old allocation problem we had before.

2) Improve/polish the documentation for BINARY, mentioning that it behaves in a Channel-like fashion (see point 1), BUT without the requirement to be an actual Channel.

3) Apply the loop Rafael pointed out in MockTracer.

carlosalberto commented 6 years ago

So, lets say we define it this way:

public interface Binary {
    void write (byte[] buffer, int offset, int length) throws IOException;
    int read(byte[] buffer, int offset, int length) throws IOException;
}

Would this be enough? This way we wouldn't be exposing the actual stream objects (along with all their methods), only a single method each for writing and reading, while avoiding the usage of ByteBuffer. Minimalistic, and slightly conservative, but probably something that could work for everybody.

yurishkuro commented 6 years ago

why return int in this case? Essentially this is replicating the method of OutputStream, which returns void (https://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html#write(byte[],%20int,%20int)).

carlosalberto commented 6 years ago

@yurishkuro Heh, was about to edit that :) But yes, we would be replicating the signature from OutputStream and InputStream, respectively.

yurishkuro commented 6 years ago

you do need to return int from read() though

bhs commented 6 years ago

@tedsuo pointed me at this thread... I slightly prefer @carlosalberto's suggested API above to the proposed v0.31 API since it makes fewer assumptions (yet still "supports" ByteBuffer for everyday use via an adapter).

raphw commented 6 years ago

My understanding is that ByteBuffer is the new standard, and since Java 6 APIs have preferred it.

Not in general. Exposing ByteBuffers is often preferred over byte[] arrays as they make it possible to avoid a heap allocation if a target buffer is already available. Byte buffers might live off the heap, i.e. they can be direct, which allows Java applications to write to some I/O storage directly, avoiding an intermediate allocation that causes garbage collection. As a matter of fact, a ByteBuffer might internally represent either a byte[] array or a block of native memory (for example used for server I/O).
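
A small sketch of the two flavors described above (the class name is only for illustration):

import java.nio.ByteBuffer;

public class BufferKindsSketch {
    public static void main(String[] args) {
        byte[] bytes = new byte[] {1, 2, 3};
        // Heap buffer: a view over an ordinary byte[] array.
        ByteBuffer heap = ByteBuffer.wrap(bytes);
        // Direct buffer: native memory outside the Java heap, often used for socket I/O.
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        System.out.println(heap.hasArray() + " " + direct.isDirect()); // true true
    }
}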

In this context, relying on byte arrays would be a step in the wrong direction and yield a less generic API than using ByteBuffers.

Of course, this variability only makes sense if the implementation wants to offer variability. With the current API, it can make sense to offer a buffer in Tracer::read where the buffer allocation is controlled by the user application, so that a server request can be read directly from an I/O buffer. In contrast, it makes much less sense to use a ByteBuffer in the Tracer::write API, as I simply cannot imagine a tracer implementation that stores its baggage off-heap. Off-heap storage requires explicit management, and the Tracer API would require an explicit life-cycle for that, such as a close method. In the context of the OpenTracing API, ByteBuffers do not make any sense here.

Additionally, ByteBuffers are built primarily for supporting asynchronous I/O. But as I have pointed out, the Tracer methods are synchronous in their nature. And for this purpose, it is more appropriate to allow an implementation to stream bytes, which opens up much more flexibility, as streams do not make an assumption about how bytes are processed besides that it is done synchronously.

I can therefore only repeat myself: channels and buffers do not supersede streams, they do a different thing. Java streams are interfaces, and when I use ByteBuffers in my I/O processing, I can implement my own stream to redirect all bytes into this buffer without intermediate storage. If I use Netty, I could write a simple stream adapter for writing to my ByteBuf, and so on. And for the general use case where a user is applying synchronous I/O, which covers a fair amount of users, I can just directly expose my stream. And since the inject and extract methods are synchronous to begin with, no asynchronous property is lost by modelling streams in this place.

That you can close an InputStream or OutputStream is of course unfortunate, but you can make it part of the method contract that this method is never called by a tracer implementation. You can of course offer your own types that mimic streams without the method, but I would prefer relying on a standard API.

As there seems to be a general misconception about what ByteBuffers do and what they are good for, I wanted to also link this tutorial for a good introduction: http://mindprod.com/jgloss/bytebuffer.html

carlosalberto commented 6 years ago

@raphw Thanks for the explanation.

So my take, after all feedback, is that providing the suggested methods using byte[] instead of ByteBuffer is a step in the middle, and the proper compromise for everybody's needs (you could even wrap your own stream around such methods).
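
For illustration, a minimal sketch of that stream-wrapping idea against the proposed byte[] methods (the adapter class is hypothetical, not part of the API):

import java.io.IOException;
import java.io.OutputStream;

class BinaryOutputStreamAdapter extends OutputStream {

    // The proposed carrier interface with byte[] methods, repeated here for a self-contained sketch.
    interface Binary {
        void write(byte[] buffer, int offset, int length) throws IOException;
        int read(byte[] buffer, int offset, int length) throws IOException;
    }

    private final Binary binary;

    BinaryOutputStreamAdapter(Binary binary) {
        this.binary = binary;
    }

    @Override
    public void write(int b) throws IOException {
        binary.write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        binary.write(b, off, len);
    }
}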

Would you be fine with that? (That being said, I already have (locally) a patched version of MockTracer supporting this, so I will clean it up and will create a PR ;) )

carlosalberto commented 6 years ago

@raphw @yurishkuro See #245 whenever you can. Hopefully that gets us moving on this front ;)

raphw commented 6 years ago

So my take, after all feedback, is that providing the suggested methods using byte[] instead of ByteBuffer is a step in the middle [...]

No, I would argue that it is a step in the wrong direction. A ByteBuffer can be seen as a generified view of a byte array where the mapped bytes do not need to live on the heap but can be a memory-mapped file or a TCP buffer. If you want to choose between buffers and (heap-allocated) byte arrays, choose byte buffers, as they are more NIO friendly.

A step in the right direction would be the use of streams, as I suggested originally. Streams do not make an assumption about any storage format. Their limitation (and thus the need for channels) is their blocking behavior, which makes single-thread selectors unfeasible when I/O is based upon such streams.

The Opentracing tracer is however not doing asynchronous I/O. It is doing synchronous reading and writing of bytes. In Java, this can be best modeled with InputStream and OutputStream.

Maybe my point of view is better understood when pointing out a sample usage in combination with the current MockTracer:

Assume that you are using NIO to send data from a client to a server. Both sides operate on TCP where the tracer is writing and reading data from TCP buffers. These buffers are expected to be accessible via socket channels and thus have limited capacity.

When the client is sending its data, the tracer is asked to write its baggage to the TCP buffer:

WritableByteChannel tcpBuffer = ...;
tracer.inject(context, Format.BINARY, Adapters.injectBinary(tcpBuffer));

As we discussed before, I cannot expect that the TCP buffer can hold all baggage items. In many use cases, I would expect that the TCP buffer is too small if the baggage is big enough. But if I only lack the last three bytes of a baggage item foobar, the server would incorrectly read foo. Therefore, the tracer has to block the call until the buffer can accept all data. In Java, one would therefore typically choose an OutputStream.

Second, there is no way to communicate how many bytes were actually written to the server without explicitly marking the channel.

Let us look at how this would play out at the server:

ReadableByteChannel tcpBuffer = ...;
tracer.extract(Format.BINARY, Adapters.extractBinary(tcpBuffer));

There are multiple problems:

  1. The server tracer has no standardized way to know how much data was written. The MockTracer would for example read all data until the channel is exhausted or until a read hits the channel before more bytes are available. (The condition being read > 0, which makes no sense; I assume this was only tested with appropriately sized byte array-backed buffers where this condition is never triggered.) This will probably read any user data sent after the trace baggage and misinterpret it as UTF strings.
  2. If the server is not using a tracer, it cannot know how much data was written and how many bytes it should skip.

Even though it is not mentioned in the javadoc, it seems to be the responsibility of the server to know how many bytes were written by the client tracer.

The only way to use this API for an NIO server would be the following:

  1. Let the client write all data into an intermediate buffer. Count the amount of bytes in the buffer and prefix the data with a mark indicating the length of the data being sent.
  2. From the server, pick up the mark and resolve the amount of bytes that were written by the client tracer. Slice the data into an appropriately sized byte buffer, which might require multiple fillings of the TCP buffer only to limit the area readable by the tracer.

Without specifying how many bytes the client has written, I would currently not even know how to implement this API with NIO. I do not think it is possible. Has anybody tried a proof of concept?
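
For what it's worth, a hedged sketch of that two-step workaround with the current API (the helper names are made up): the client buffers the injected bytes and prefixes their length, and the server slices accordingly.

import java.nio.ByteBuffer;

public class LengthPrefixFramingSketch {

    // Client side: buffer the injected context first, then prefix it with its length.
    static ByteBuffer frame(byte[] injectedContext) {
        ByteBuffer framed = ByteBuffer.allocate(4 + injectedContext.length);
        framed.putInt(injectedContext.length);
        framed.put(injectedContext);
        framed.flip();
        return framed;
    }

    // Server side: read the length header, hand the tracer exactly that many bytes,
    // and advance past them so application data is never misread as baggage.
    static ByteBuffer unframe(ByteBuffer incoming) {
        int length = incoming.getInt();
        ByteBuffer tracerBytes = incoming.slice();
        tracerBytes.limit(length);
        incoming.position(incoming.position() + length);
        return tracerBytes;
    }
}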

carlosalberto commented 6 years ago

The condition being read > 0 which makes no sense, I assume this was only tested with appropriately sized byte array-backed buffers where this condition is never triggered.

From the java docs: "the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached." So, > -1 should do it then.

The point about exposing the full streams still stands: they are too heavy and too ambiguous in this context.

Another thing: it would seem that, as long as synchronous behavior is expected, we shouldn't use ByteArray.

Going back to streams: I still don't see how we couldn't use the current methods to play along streams, i.e.

1) When injecting, what else do you need from an OutputStream? Other than flush() there doesn't seem to be anything else around you'd need.

2) When extracting, what else do you need besides the exposed read() method? available()?

I'm genuinely asking, to be clear on which parts of streams you think we are missing, such that you cannot be satisfied with the latest Binary format and its byte[] methods.

raphw commented 6 years ago

To solve the problem with the unknown byte size, I suggest the following:

interface Binary {
  OutputStream write(int size) throws IOException;  
  InputStream read() throws IOException;
}

A server can implement this somewhat like the following to stream data directly to a byte buffer. Using a stream, it is also rather simple to limit the amount of data that can be read. For the sake of simplicity, I ignore buffer constraints in the example. But given the API, an NIO server can flush the byte buffer when it becomes necessary, the tracer is not forced to allocate that data in an intermediate buffer, and an NIO server that does not rely on ByteBuffer can simply delegate to its own interfaces:

static class BinarySample implements Binary {

  ByteBuffer buffer;

  @Override
  public OutputStream write(int size) throws IOException {
    buffer.putInt(size);
    return new OutputStream() {
      @Override
      public void write(int b) throws IOException {
        buffer.put((byte) b);
      }
    };
  }

  @Override
  public InputStream read() throws IOException {
    // If no tracer is available, this is the amount of bytes to skip.
    int[] size = new int[]{buffer.getInt()};
    return new InputStream() {
      @Override
      public int read() throws IOException {
        if (size[0] > 0) {
          size[0]--;
          return buffer.get() & 0xFF; // InputStream expects an unsigned byte or -1
        } else {
          return -1;
        }
      }
      @Override
      public void close() throws IOException { // Skip bytes not read by the tracer.
        buffer.position(buffer.position() + size[0]);
        size[0] = 0;
      }
    };
  }
}

This is of course just a blueprint. I would not see how this can be achieved by an NIO server with the current APIs. For some NIO servers, it might also be meaningful to get a callback once all bytes are written; the close methods therefore make sense. For example, a server might want to buffer a specific amount of bytes and send those as fixed-sized chunks. I would therefore require the tracers to close their streams to offer a richer interaction space.

carlosalberto commented 6 years ago
OutputStream write(int size) throws IOException;  

We don't want to have to know in advance how much data we will be sending/writing (this was the original discussion that led to the current format). If some authors want to provide it (like you did in the example), that's fine, but we still want to support the variable size thing.

The only thing I can imagine, down the road, is to have something like:

interface Binary {
    void write(byte[] b, int off, int len); // remove?
    int read(byte[] b, int off, int len); // remove as well?
    void write(int b);
    int read();
}

(Since allocation seems to be 'another' problem here?)

raphw commented 6 years ago

But without this information, how can the server decide how many bytes belong to the tracer? The tracer must know how many bytes it can read, and if no tracer is available on the server, it must know how many bytes to skip. The tracer does not control the exchange format.

I have worked with NIO quite a bit and I would not know how to use this. You cannot consume an entire channel unless the tracer data is sent at the very end of a request, which seems unlikely for most binary protocols. Did you do a proof of concept for this?

yurishkuro commented 6 years ago

My preference is still to use ByteBuffer. I don't understand the sentiment of "ByteBuffers are built primarily for supporting asynchronous I/O." To me, ByteBuffer is simply a more convenient way of passing byte arrays around, by allowing operations on the slices of those arrays without passing the additional offset+length, plus abstracting the heap allocations when using direct buffers.

The async discussion keeps going over my head. Unless we're suggesting returning a Future from the write/read methods of the carrier, we are always doing a sync operation. It doesn't matter if the implementation of the carrier internally uses async; all it is told is write(buffer). If it takes more than one write to the underlying TCP channel - so be it, that's not the tracer's concern.

interface Binary {
    // * The carrier must write full content of the buffer to the outbound request.
    // * The tracer may call this method multiple times, e.g. if it uses buffers that do not fit all baggage
    void write(ByteBuffer buf) throws IOException;

    // * The carrier must read up to buf.length() from the inbound request.
    // * It must return the number of bytes actually read, in case the payload.length < buffer.length
    // * The tracer may call this method multiple times.
    int read(ByteBuffer buf) throws IOException;
}

What specifically cannot be accomplished with the above interface?
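
To make the multi-call semantics concrete, here's a small sketch (with hypothetical names) of a tracer-side inject that serializes into fixed-size chunks and calls write() repeatedly, as the comments in the interface above allow:

import java.io.IOException;
import java.nio.ByteBuffer;

public class ChunkedInjectSketch {

    interface Binary {
        void write(ByteBuffer buf) throws IOException;
    }

    // Splits the serialized context into chunks no larger than chunkSize.
    static void inject(Binary carrier, byte[] payload, int chunkSize) throws IOException {
        int offset = 0;
        while (offset < payload.length) {
            int length = Math.min(chunkSize, payload.length - offset);
            carrier.write(ByteBuffer.wrap(payload, offset, length));
            offset += length;
        }
    }
}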

yurishkuro commented 6 years ago

how can the server decide how many bytes belong to the tracer?

I think I see where you're coming from. Let me think about this.

carlosalberto commented 6 years ago

The tracer must know how many bytes it can read and if no tracer is available on the server, it must know how many bytes to skip. The tracer does not control the exchange format.

I think that currently we are taking for granted that we buffer the result of inject() in memory and then send it over (like we do for TextMap/HttpHeaders), not as-you-go, which seems to be what you want/need to do - thus the need for providing the size beforehand?

raphw commented 6 years ago

Remember that the server and client are two different machines. Neither side knows if the other side is running a tracer. The client needs to mark the segment that contains the tracer bytes, and the server needs to detect these bytes or detect that no such bytes were sent. In order to mark the segment, the client needs to somehow communicate this information to the server, and for this to be possible, the client needs to know from the tracer how many bytes are written. This usually needs to happen as a header, not as a footer, as the client cannot use a marker byte because the same byte could be written by the client. Some escaping mechanism could be used, but efficiency- and reliability-wise, I would not recommend that.

carlosalberto commented 6 years ago

Some escaping mechanism could be used but efficiency and reliability-wise, I would not recommend that.

Yeah, I'd be against it as well, as that's more of an implementation detail.

yurishkuro commented 6 years ago

When the tracer in the server reads the bytes, it has no way of knowing how much it needs to read. So I'd argue that writing the header with the total size on the client and reading that header on the server should be the responsibility of the protocol (or carrier), not the tracer.

To allow the client to know the size, void write(ByteBuffer buf) throws IOException; API still works, but the tracer is not allowed to call it more than once.

On the server side, the tracer may either allocate a buffer large enough so that the protocol can store the incoming bytes in one go, in which case int read(ByteBuffer buf) throws IOException; works provided it's called only once and an exception is thrown by the carrier if the payload size is larger than the buffer, or the semantics can be as I described before, with the tracer allowed to call it multiple times until the returned value is < buffer.capacity().

carlosalberto commented 6 years ago

... I'd argue that writing the header with the total size on the client and reading that header on the server should be the responsibility of the protocol (or carrier), not the tracer.

I was thinking the same, that it should be responsibility of the protocol, not the tracer.

To allow the client to know the size, void write(ByteBuffer buf) throws IOException; API still works, but the tracer is not allowed to call it more than once.

I'd go for more flexibility, but I understand this need. If we go for the single call scenario, I'd stick to the same for read(), for uniformity purposes.

yurishkuro commented 6 years ago

I agree that having a single call in each direction would be ideal, but in the case of read the tracer really has no idea how big the buffer should be. That's not to say all is lost, because a sensible encoding would be to write the trace context first (usually of known or bounded length), which is the most important thing to preserve on the receiving side. Then, depending on the encoding, the baggage can also be written with size prefixes so that foo=bar will not get truncated to foo=b due to buffer size, but rather dropped completely by the tracer. Typically the same thing will happen on injection as well, since if the tracer has to pre-allocate the buffer, a reasonable implementation would be to make the size user-configurable and drop baggage that doesn't fit.
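
A rough sketch of that per-entry size-prefix idea (names are illustrative; an entry that doesn't fit is dropped whole rather than truncated):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BaggageEncodingSketch {

    // Writes one key=value entry preceded by its length; returns false if it would not fit.
    static boolean writeEntry(ByteBuffer buffer, String key, String value) {
        byte[] bytes = (key + "=" + value).getBytes(StandardCharsets.UTF_8);
        if (buffer.remaining() < 4 + bytes.length) {
            return false; // drop the whole entry instead of truncating foo=bar to foo=b
        }
        buffer.putInt(bytes.length);
        buffer.put(bytes);
        return true;
    }
}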

Another option to keep a single read but flexible length is to have another method that returns the size which would be called first.

Yet another option: ByteBuffer read()

raphw commented 6 years ago

Giving the opportunity to drop data based on some tracer-implementation-specific buffer size sounds dodgy. And why would the amount of baggage need to be fixed-size? This way, if no baggage is sent, which I assume is the default case, the client and server still have to stream an empty space of null bytes to fulfill the protocol contract. If a lot of baggage is sent, some of it is lost. That is the kind of thing that works in a test environment but not in production. That does not sound practical.

Did you try implementing a proof of concept with for example a Netty client and server? I did and I am still not convinced that the suggested API can be used properly.

yurishkuro commented 6 years ago

I was not suggesting reserving space on the wire, just memory buffers. Limiting the total overhead of baggage on the wire is not an unreasonable design, especially if it's configurable.

I have not tried implementing this with Netty. Can you share your implementation?

carlosalberto commented 6 years ago

I'm wondering if we should, after 'fixing' the documentation (and making clear whether one or multiple calls are allowed), ship this format and mark it as experimental. Once that happens, we can get concrete feedback and update it as/if needed.

yurishkuro commented 6 years ago

I'd be ok with that. To be frank, the need for binary format is a bit theoretical to begin with. Most Jaeger clients don't even support it yet, since nobody asked for it. E.g. the only use case I know of is the way Envoy encodes trace context, but it's not in Java. Would be very interesting to list the protocols that would need binary format, i.e. protocols that do not support string key-value application headers, yet allow an arbitrary binary blob (???).

carlosalberto commented 6 years ago

I'd be ok with that. To be frank, the need for binary format is a bit theoretical to begin with.

Good point.

So since Rafael's needs are very specific (needing to know the size up front, streaming over the wire instead of buffering into memory first, etc.), it sounds like they may well need a different format altogether, such as BINARY_STREAM (which could be added eventually and as/if needed).

To me it sounds, then, we could keep the Binary support we have in-right now (with a few fixes/clarifications) and merge #246 (or similar changes ;) )

tedsuo commented 6 years ago

Thanks @carlosalberto.

@raphw we do need to get v0.31 out the door. The assumption with the current ByteBuffer approach is that the data is already buffered in some manner. Given that caveat, what do you think about it? A number of people have looked at it and approved it for this use case; I'd rather not change it unless it's genuinely broken or there's a clear improvement that could be made.

The assumption is that streaming use cases work fine with the current API via an adapter, just with a small amount of overhead. I'd rather have a longer conversation about a zero-cost streaming API, which could show up in the next version as a backwards-compatible change, such as an additional set of methods or another format.