Implications so far:
Implication no. 2, re: transformational vs non-transformational. If that optimization is to be taken into account, then attempts to define some general meta-interface to implement common proto handlers for both sync and async probably won't work. Instead, each one may need to be written in a more or less ad hoc way, which will however allow optimizing it for the traits of a particular protocol.
Thoughts re: handling handshake-like miniprotos. It's unclear what the best way to deal with these is.
One good way would be to have detailed status codes like NEED_READ and NEED_WRITE. I.e. if the user calls write(), they may get NEED_READ instead, meaning the write goes to a backlog, and the user may need to queue a read operation on the underlying stream (the data from which will likely be consumed internally by the proto). Likewise, if read() is called, NEED_WRITE may be returned instead, notifying that the read should be backlogged, and that the proto handler has instead prepared a write record to spool to the underlying stream.
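A minimal sketch of how such codes could look in a poll-based client (the NEED_READ/NEED_WRITE constants and the try_write() helper are hypothetical names for illustration, not an existing API):

```python
import select

NEED_READ = -1   # hypothetical status: proto must read before op can proceed
NEED_WRITE = -2  # hypothetical status: proto must write before op can proceed

def try_write(wrapper_stream, underlying_stream, poller, data):
    # Optimistically attempt the write; on a NEED_* code, backlog it and
    # register the underlying stream for the direction the proto needs.
    res = wrapper_stream.write(data)
    if res == NEED_READ:
        # The proto needs input first (e.g. a handshake reply), which it
        # will consume internally before the backlogged write can proceed.
        poller.register(underlying_stream, select.POLLIN)
    elif res == NEED_WRITE:
        # read() is symmetric: it may return NEED_WRITE when the proto has
        # prepared a record that must be spooled out first.
        poller.register(underlying_stream, select.POLLOUT)
    return res
```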
But in the current Python stream API, there are no separate NOT_READY vs NEED_READ vs NEED_WRITE return codes. There's only one - None, which originally means NOT_READY, and which would need to be reused for the 2 other codes. Then one would need additional means to query the more detailed state. For read, that could be:
```python
res = wrapper_stream.read()
if res is None:
    # See if the wrapper actually asks us to send out something on its behalf
    if wrapper_stream.write_pending():
        ...
```
Obviously, this approach is less efficient than directly returning the needed code from read(), and it's not clear how to apply it to .write() and NEED_READ, except in a fully symmetrical way, by adding a .read_pending() or similar predicate.
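For completeness, the symmetric write-path sketch (.read_pending() being the same kind of hypothetical predicate as .write_pending() above):

```python
res = wrapper_stream.write(data)
if res is None:
    # None is overloaded; check whether the wrapper is actually blocked
    # waiting for input (NEED_READ) rather than merely not ready.
    if wrapper_stream.read_pending():
        # Poll the underlying stream for input; the proto will consume it
        # internally, after which the backlogged write can be retried.
        ...
```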
Otherwise, an obvious way to deal with the async write path would be to construct the complete protocol record on the wrapper side and then spool it to the underlying stream.
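Something like this sketch (the Wrapper class and the make_record() encoder are hypothetical names, assuming the usual None-on-EAGAIN stream convention):

```python
class Wrapper:
    def __init__(self, underlying):
        self.underlying = underlying
        self.pending = b""  # fully constructed record awaiting spooling

    def write(self, data):
        if not self.pending:
            # Encode the entire protocol record upfront (copies user data).
            self.pending = self.make_record(data)
        n = self.underlying.write(self.pending)  # None means EAGAIN
        if n:
            self.pending = self.pending[n:]
        # None = record not fully spooled yet; caller should poll and retry.
        return len(data) if not self.pending else None
```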
Given the comment in https://github.com/pfalcon/micropython/issues/20#issuecomment-445541359 , this can't be made a total pattern: for non-transform miniprotos, a protocol record is to be constructed/sent piecewise from verbatim user data.
Also, it's worth stating explicitly that all operations on async wrapper streams should be optimistic, i.e. a client should always first try to perform a read/write operation, and only if EAGAIN is received should it poll the underlying stream. That's because the wrapper may have its own buffer, which may have data (or space for new data) regardless of the readiness state of the underlying stream.
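A sketch of that optimistic pattern (names are illustrative; None is assumed as the usual would-block return):

```python
import select

def optimistic_read(wrapper_stream, underlying_stream, n):
    # Always try the operation first: the wrapper's internal buffer may
    # satisfy it even if the underlying stream has no new data. Only on
    # would-block do we poll the *underlying* stream and retry.
    poller = select.poll()
    poller.register(underlying_stream, select.POLLIN)
    while True:
        res = wrapper_stream.read(n)
        if res is not None:
            return res
        poller.poll()
```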
This is a continuation of https://github.com/micropython/micropython/issues/3396 .
Definition: a miniprotocol is a simplified, "single-behavior" protocol. A real-world protocol is usually a combination/composition of miniprotocols.
Classification of miniprotocols:

- By framing: separator-delimited (e.g. line-based; "SepProto" below) vs length-delimited ("LenProto").
- By payload handling: transformational vs non-transformational (user data passed verbatim).
Example: the full WebSocket protocol starts with the separator-delimited (line-based) HTTP protocol, then switches to a length-delimited protocol.
The idea is that length-delimited protocols are more efficient. But it's actually better to say that they compose better: we always know how much is left to read for the current miniprotocol, and thus can avoid over-read. The only way to have the same guarantee for a separator-delimited miniproto is to read byte by byte, which is known to be inefficient. Thus, the only performance-efficient way to deal with SepProto is to buffer the underlying stream (on our side). But if this buffering happens on the miniprotocol parser's side, then once we need to switch to another protocol, we may have over-read data, and it becomes a new problem how to pass that data forward to the next miniproto parser.

Also, while LenProto is not susceptible to the over-read problem, that doesn't mean it doesn't need buffering. For example, there can be many small records, and e.g. reading "5 bytes by 5 bytes" isn't much better than SepProto's "1 byte by 1 byte". These 2 issues: a) the over-read issue if buffering happens on the miniproto parser's side; b) the need for buffering for performance in general, lead to the obvious and well-known solution: buffering should happen on the input stream's side. Even that is still not ideal if we have an efficient, large-block LenProto. Consider for example a proto which starts with a line-based miniproto, to handle which we allocate a 256-byte buffer, and then switches to a LenProto with 1K blocks. Those 256 bytes are then wasted beyond the initial line-based "handshake" phase.
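To make the over-read asymmetry concrete, here is a sketch of both read strategies without stream-side buffering (assuming a blocking stream with the record fully available):

```python
def read_len_record(stream, length):
    # LenProto: we always know how much is left, so arbitrarily large
    # reads can never consume bytes belonging to the next miniproto.
    buf = bytearray(length)
    view = memoryview(buf)
    pos = 0
    while pos < length:
        pos += stream.readinto(view[pos:])
    return bytes(buf)

def read_sep_record(stream, sep=b"\n"):
    # SepProto: without buffering, the only way to guarantee no over-read
    # is to read byte by byte, which is known to be inefficient.
    buf = bytearray()
    while not buf.endswith(sep):
        c = stream.read(1)
        if not c:
            break  # EOF
        buf += c
    return bytes(buf)
```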
Example: TLS clearly consists of an initial handshake phase which requires bidirectional exchange, but once that is done, a single user write leads to a single TLS record sent to the peer, and a single user read is backed by a single record received from the peer. One way to deal with this scenario is to perform the complete handshake in response to the user's "open connection" request, so that we have one-directional exchanges afterwards. It's more complex if we postpone the handshake until the first actual read/write operation from the user. And actually, it's more complex on the TLS side too: at any time, renegotiation may be requested, which leads to the same issue that a single user operation may need to be postponed while a series of exchanges is performed.
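For reference, CPython's ssl module exposes exactly this situation, with exceptions instead of return codes; the standard non-blocking handshake loop looks like:

```python
import select
import socket
import ssl

sock = socket.create_connection(("example.com", 443))
sock.setblocking(False)
ctx = ssl.create_default_context()
ssock = ctx.wrap_socket(sock, server_hostname="example.com",
                        do_handshake_on_connect=False)
while True:
    try:
        ssock.do_handshake()
        break
    except ssl.SSLWantReadError:
        select.select([ssock], [], [])   # the NEED_READ case
    except ssl.SSLWantWriteError:
        select.select([], [ssock], [])   # the NEED_WRITE case
```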
The push approach has another face of the over-read problem: a parser might not accept all the data we feed into it, yet already produce some output. This leads to an "all hands are full" situation: we need to keep track of/store the input data which the parser has not yet accepted, and also need to deal with the parser's output, which we may not be able to accept in one go on our side either. The pull approach simplifies that somewhat.

In reality, the suitability of push vs pull depends on the nature and direction of the data passing. For application data -> protocol data, the push approach is obvious: the app already has that data (for classically written apps), so we can only push it, hold the remaining part which the miniproto didn't consume, and iterate on sending out the miniproto output until it's done and we can continue with feeding the rest of the app data. For protocol data -> application data, the pull approach is beneficial: the miniproto parser knows either exactly how much it needs, or otherwise its internal buffer requirements, and can request data so as to avoid over-read.
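A generator makes a nice pull-mode parser sketch: the parser yields exactly how many bytes it needs next, so the driver can never over-read. The frame layout here is a simplified, WebSocket-like 2-byte header plus payload, for illustration only:

```python
import io

def frame_parser():
    # Pull approach: yield the byte count we need, receive exactly that.
    while True:
        header = yield 2              # request exactly the 2-byte header
        length = header[1] & 0x7F     # simplified: no extended lengths
        payload = yield length        # then request exactly the payload
        print("frame:", bytes(payload))

stream = io.BytesIO(b"\x81\x05hello")
parser = frame_parser()
need = parser.send(None)              # prime; parser asks for first chunk
while True:
    data = stream.read(need)
    if len(data) < need:
        break                         # EOF
    need = parser.send(data)
```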
For example, WebSocket "masks" the data in one direction, but in the other direction it's verbatim. The distinction here is of an optimization nature: for the verbatim direction we may use a "zero copy" operation, i.e. write user data directly to the underlying stream, instead of copying it into a miniproto buffer for additional processing.
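A sketch of that asymmetry for small (under 126 bytes) WebSocket-like frames:

```python
def ws_send(underlying, payload, mask_key=None):
    # FIN + text opcode, then length byte with the MASK bit if masking.
    underlying.write(bytes([0x81, (0x80 if mask_key else 0) | len(payload)]))
    if mask_key is None:
        # Non-transform direction: write user data verbatim, zero copy.
        underlying.write(payload)
    else:
        # Transform direction: data must be copied and XOR-masked first.
        underlying.write(mask_key)
        underlying.write(bytes(b ^ mask_key[i & 3]
                               for i, b in enumerate(payload)))
```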