raedwulf opened 8 years ago
Let me take a step back.
One thing that has been troubling me over the years is the lack of progress in the network protocol space. When you look closely at what's going on, it looks like only big companies are capable of implementing new protocols. There may be many reasons why random hobby developers aren't developing new protocols the same way they are developing new JavaScript libraries, but one of them is definitely the high cost of developing a network protocol. libmill/dill is ultimately meant to address that problem.
The end goal is thus to allow random Joe developer to hack for a day and come up with a new protocol implementation that's relatively bug-free and has relatively good performance.
There are two subgoals required to achieve the above:
W.r.t. composability, it should be noted that two different kinds are needed:
Now let's have a look at the API design. Some thoughts, in no particular order:
Thanks for the clarifications - I did overlook a number of things in my idealised protocol world. I was on roughly the same wavelength, but it seems I assumed that composing would be helped by some degree of uniformity.
Thanks for the feedback! Hopefully that synchronises our wavelengths a bit better.
Although I posted that I think a SSL/TLS implementation in libmill is probably not the best idea, I think my code could potentially be useful for dsock in the future.
The issues I encountered with my TLS implementation was that the API initially had:
MILL_EXPORT struct mill_tlssock *mill_tlslisten_(
    struct mill_ipaddr addr,
    const char *cafile, const char *capath,
    void *camem, size_t calen,
    const char *certfile,
    void *certmem, size_t certlen,
    const char *keyfile, void *keymem, size_t keylen,
    const char *password,
    int backlog);
This was quite burdensome, as many fields were optional.
I ended up having a new setup-context structure:
MILL_EXPORT struct mill_tlsctx *mill_tlsserver_(uint32_t flags);
MILL_EXPORT struct mill_tlsctx *mill_tlsclient_(uint32_t flags);
MILL_EXPORT int mill_tlscafile_(struct mill_tlsctx *c, const char *file, const char *path);
MILL_EXPORT int mill_tlscamem_(struct mill_tlsctx *c, void *mem, size_t len);
MILL_EXPORT int mill_tlscertfile_(struct mill_tlsctx *c, const char *file);
MILL_EXPORT int mill_tlscertmem_(struct mill_tlsctx *c, void *mem, size_t len);
MILL_EXPORT int mill_tlskeyfile_(struct mill_tlsctx *c, const char *file, const char *password);
MILL_EXPORT int mill_tlskeymem_(struct mill_tlsctx *c, void *mem, size_t len, const char *password);
MILL_EXPORT const char *mill_tlserror_(struct mill_tlsctx *c);
MILL_EXPORT void mill_tlsfreectx_(struct mill_tlsctx *c);
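For illustration, a hypothetical usage sketch of this setup-context API (the file names and the ctx-aware listen step are assumptions, not part of the API above):

struct mill_tlsctx *ctx = mill_tlsserver_(0);
if(!ctx) { /* handle allocation failure */ }
if(mill_tlscertfile_(ctx, "server.crt") != 0 ||
   mill_tlskeyfile_(ctx, "server.key", NULL) != 0) {
    /* mill_tlserror_(ctx) describes what went wrong */
}
/* ... hand ctx to a ctx-aware listen call, then ... */
mill_tlsfreectx_(ctx);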
I was wondering whether there was a more uniform way to do this. For instance, maybe providing a structure-based setup, with each relevant transport having a custom setup struct rather than a custom set of setup functions.
struct mill_tls_config {
    const char *ca_file;
    const char *ca_path;
    void *ca_mem;
    size_t ca_len;
    const char *cert_file;
    void *cert_mem;
    size_t cert_len;
    const char *key_file;
    void *key_mem;
    size_t key_len;
    const char *key_password;
    uint32_t flags;
};
MILL_EXPORT struct mill_tlssock *mill_tlslisten_(struct mill_tls_config *c, struct mill_ipaddr addr, int backlog);
The structure mill_tls_config is large, so passing it by value would be very non-standard. Here, mill_tlslisten_ would copy the mill_tls_config fields or hand the values off directly to the internal implementation to be processed.
EDIT: I just noticed that libdill, unlike libmill, does something similar with ipaddr, but it uses ipremote as a constructor? Would that be the recommended way in libdill/dsock-based protocols? Could there be multiple constructor functions for the structure?
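For illustration, hypothetical constructor functions in the ipremote style, each populating the same mill_tls_config from a different source (all names here are made up):

#include <string.h>

/* Hypothetical constructor: configure TLS from files on disk. */
int mill_tls_config_file(struct mill_tls_config *c, const char *ca_file,
    const char *cert_file, const char *key_file, const char *key_password) {
    memset(c, 0, sizeof(*c));
    c->ca_file = ca_file;
    c->cert_file = cert_file;
    c->key_file = key_file;
    c->key_password = key_password;
    return 0;
}

/* Hypothetical constructor: configure TLS from in-memory buffers. */
int mill_tls_config_mem(struct mill_tls_config *c, void *cert_mem,
    size_t cert_len, void *key_mem, size_t key_len, const char *key_password) {
    memset(c, 0, sizeof(*c));
    c->cert_mem = cert_mem; c->cert_len = cert_len;
    c->key_mem = key_mem; c->key_len = key_len;
    c->key_password = key_password;
    return 0;
}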
This does make me think dsock should just deal with the micro-protocols and not the actual composition between the protocols (apart from some utility functions).
I am still wondering whether it's possible to make a very tidy API that can compose these protocols, perhaps in a separate optional library on top, e.g. dplumbing. This is for the Joe who isn't interested in implementing his own protocol but wants to stick two or more protocols together with minimal API contact with the lower-level transport protocols, and to only have the high-level FTP/SMTP/HTTP protocols to worry about.
In terms of composability, would it be correct to think that vertical composability is possible and uniform until it reaches a protocol with protocol-specific ancillary data (not including error conditions, and also under the constraint that the composed protocol types are compatible)?
In other words, composability is viewed from the on-the-wire standpoint. From the protocol standpoint, vertical composability has all the protocols active at any one point, whereas horizontal composability means that the horizontally composed protocols can only be switched between one another. The only exception is multiplexing - a special case of horizontal composability that behaves like vertical composability. A diagram would probably be really useful right now...
I just remembered what was bugging me. The current SSL and wsock implementations with libmill wrap the underlying protocols so you can't choose what protocol is used underneath. I was wondering whether it was possible to parameterise what the underlying protocol is. So you could have SSL over unix sockets, pipes etc. without having to substantially change the SSL implementation or other protocol implementation. Is this what the attach and detach functions are for?
Ok, example is better than words. Here's my idea how to do vertically-composed connection setup:
int h1 = tcp(...); // open tcp connection
int h2 = throttler(h1, ...); // limit the throughput to say 100kB/s
int h3 = crlf(h2, ...); // split lines (turns bytes into messages)
int h4 = multiplexer(h3, ...); // allows for multiple channels within the connection
int h5 = encryptor(h4, ...); // encrypts the communication
// use h5 to send/recv now
Connection tear-down would be done in a similar way but in the opposite direction:
int h4 = encryptor_detach(h5);
int h3 = multiplexer_detach(h4);
...
Horizontal composability:
int h1 = tcp(...);
int h2 = wsock(h1, ...);
// exchange of wsock messages
int h1 = wsock_detach(h2);
int h3 = ssl(h1, ...);
// send/recv data
int h1 = ssl_detach(h3);
tcp_close(h1);
Some minor comments:
Ok, got a few hours free, done some work.
And for good measure, the PFX protocol (messages prefixed by a 64-bit size in network byte order).
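For illustration, a minimal sketch of the PFX header encoding (pfx_put_len is a made-up helper, not the dsock API):

#include <stdint.h>

/* Encode a 64-bit message length in network byte order (big-endian),
   to be sent on the wire immediately before the payload. */
static void pfx_put_len(uint8_t hdr[8], uint64_t len) {
    int i;
    for(i = 7; i >= 0; --i) {
        hdr[i] = (uint8_t)(len & 0xff);
        len >>= 8;
    }
}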
And an attempt at an API RFC: https://raw.githubusercontent.com/sustrik/dsock/master/rfc/sock-api-revamp-01.txt
I've realised that there are two issues with not allowing partial sends/recvs.
Firstly, there's the minor issue where protocols like CRLF are inefficient because they do not know lengths beforehand, so a parsing algorithm cannot scan the buffer without invoking several layers of abstraction per byte. This might not be too bad as HTTP header sizes are quite small, but it will add up if someone wants to implement an HTTP/1.1 server for thousands of connections.
Secondly, if the protocol is handled by an external library, e.g. OpenSSL, it assumes that the wrapped functions behave in the same way as the UNIX read/write, which do allow partial sends/reads... Of course, this could be emulated, but the number of layers of abstraction traversed per byte would be overwhelming. OpenSSL has around 2 layers of abstraction for its I/O, and then you have the extra layer of abstraction in dsock.
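To make the mismatch concrete, here is a hypothetical custom OpenSSL BIO read callback over dsock's brecv; OpenSSL expects "up to len" semantics, while brecv blocks until the full len bytes arrive:

#include <openssl/bio.h>

/* Hypothetical glue: OpenSSL asks for up to len bytes (a partial read is
   fine), but brecv only returns once all len bytes have arrived, which
   can stall the TLS state machine. */
static int dsock_bio_read(BIO *b, char *buf, int len) {
    int s = *(int*)BIO_get_data(b); /* dsock handle stashed in the BIO */
    int rc = brecv(s, buf, (size_t)len, -1);
    return rc == 0 ? len : -1;
}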
I've almost completed a preliminary, full-featured implementation that wraps libtls in libdill; it still needs testing and a way around this problem. It uses libdill networking as opposed to the linked Libre/OpenSSL library. I'll push a branch to my fork later today, as I'm rebasing the patch on the latest changes.
I have two approaches in mind that avoid breaking the semantics of bsend and brecv:
My first approach is a new 'oracle' function:
DSOCK_EXPORT int bwait(int s, size_t *len, int64_t deadline);
This waits on the socket until there is something to read, then returns the amount of data waiting in the buffer. A subsequent brecv can then use the value of len from bwait.
There is still buffer juggling when layering byte protocols; if this function is used, some byte protocols will need to keep their own buffer whenever the number of bytes they consume differs from that of the underlying protocol.
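For illustration, a receive loop using the proposed bwait together with dsock's existing brecv might look like this:

size_t len;
int rc = bwait(s, &len, -1);      /* wait until data is buffered */
if(rc == 0) {
    char buf[4096];
    if(len > sizeof(buf)) len = sizeof(buf);
    rc = brecv(s, buf, len, -1);  /* read exactly what is available */
    /* parse the len bytes in buf without per-byte virtual calls */
}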
Another approach would be to allow byte protocols to be chained and the processing done as new data is being read. The chain terminates when it gets converted into a message-based protocol.
This would introduce mandatory (EDIT: optional) fields in the bsock_type:
struct bsock_vfs {
    struct bsock_vfs *next;
    int (*bprocessv)(struct bsock_vfs *vfs, const struct iovec *iov,
        size_t iovlen, int64_t deadline);
    int (*bsendv)(struct bsock_vfs *vfs, const struct iovec *iov,
        size_t iovlen, int64_t deadline);
    int (*brecvv)(struct bsock_vfs *vfs, const struct iovec *iov,
        size_t iovlen, int64_t deadline);
};
bprocessv for the first link in the chain is NULL. For subsequent links, bprocessv gets called whenever new data appears; in this case, fd_read would just pass the data it has read to the next layer of protocol without buffering it. Thus, only end links will buffer data, which can then be exposed using bsendv/brecvv. A possible driving loop is sketched below.
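A hypothetical driver for this chaining, assuming fd_read hands the freshly read bytes to the chain; per-link data transformation and buffer management are elided:

#include <stdint.h>
#include <sys/uio.h>

/* Walk the chain, handing the new bytes to each link's bprocessv. The
   first link (the fd itself) has bprocessv == NULL; an end link buffers
   the data inside its bprocessv for later brecvv. */
static int chain_deliver(struct bsock_vfs *first, const struct iovec *iov,
        size_t iovlen, int64_t deadline) {
    struct bsock_vfs *link;
    for(link = first->next; link != NULL; link = link->next) {
        int rc = link->bprocessv(link, iov, iovlen, deadline);
        if(rc != 0) return rc;
    }
    return 0;
}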
The latter case would likely have better performance...
EDIT: After thinking a bit, bprocessv would not be mandatory; if a protocol does not support it, it can mark bprocessv as NULL and revert to the previous brecvv internal behaviour without any external semantic difference.
EDIT: A better way would be to have bprocessv as part of an interface, with an hnext function implemented in libdill to get the next handle in the list.
I've been down both of these paths.
The bprocess way leads to callback hell - that's what happened in ZeroMQ - and bwait leads to state machine hell - see nanomsg.
All in all, it seems that the only manageable way to write network code is to be purely imperative and never do "save state and return to this unfinished operation later" stuff by hand. That's what the scheduler is for, after all.
Compare, for example, the implementation of the PFX protocol in dsock:
https://github.com/sustrik/dsock/blob/master/pfx.c
with implementation of the same protocol in nanomsg:
https://github.com/nanomsg/nanomsg/blob/master/src/transports/tcp/stcp.c
As for the two problems you've described:
Actually, the technique of conflating protocols can be used whenever there's a need for a super-efficient implementation; e.g. TCP+CRLF can avoid the cost of the extra function call.
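As a hypothetical illustration of such conflation, a combined TCP+CRLF receive could scan the TCP receive buffer directly, with no virtual call per byte (the tcpconn layout here is made up):

#include <stddef.h>
#include <sys/types.h>

struct tcpconn {
    int fd;
    char rxbuf[4096];
    size_t rxpos;
    size_t rxremaining;
};

/* Hypothetical conflated TCP+CRLF receive: pull bytes straight out of
   the TCP rx buffer instead of making one brecvv call per byte. */
ssize_t tcpcrlf_recv(struct tcpconn *c, char *buf, size_t len) {
    size_t i = 0;
    while(i < len) {
        if(c->rxremaining == 0) {
            /* refill c->rxbuf from the underlying fd (elided) */
        }
        char ch = c->rxbuf[c->rxpos++];
        c->rxremaining--;
        if(ch == '\n') return (ssize_t)i; /* terminator found; CRLF stripped */
        if(ch != '\r') buf[i++] = ch;
    }
    return -1; /* message longer than the buffer */
}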
Thanks! Ouch, yes that does introduce a lot of complexities.
Ok, I've added a slight optimisation to the CRLF protocol. It used to do two virtual function calls per byte -- hquery and brecvv -- and now it does only brecvv. Probably an unmeasurable improvement, but still.
If you want to push it even further, you may consider adding a special optimised code path for reading 1 byte in fd_recv().
Would there be any downside to introducing a function like libmill's
size_t tcprecvuntil(tcpsock s, void *buf, size_t len, const char *delims, size_t delimcount, int64_t deadline);
All the non-framed protocols I can think of use text, which is always delimited. So simply an optional optimised implementation of recvuntil would solve most of the potential performance issues. It's optional because it is trivially implemented using recv, which can serve as the default when recvuntil is not implemented.
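For illustration, the recv-based default could look like this, assuming dsock's brecv semantics (fills the buffer fully, returns 0 on success); recvuntil_default is a made-up name:

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Sketch of the fallback: read one byte at a time with brecv and stop
   after the first delimiter. Returns bytes read, or -1 on error. */
static ssize_t recvuntil_default(int s, char *buf, size_t len,
        const char *delims, size_t delimcount, int64_t deadline) {
    size_t i, j;
    for(i = 0; i != len; ++i) {
        if(brecv(s, &buf[i], 1, deadline) != 0) return -1;
        for(j = 0; j != delimcount; ++j)
            if(buf[i] == delims[j]) return (ssize_t)(i + 1);
    }
    return -1; /* no delimiter within len bytes */
}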
I am still not sure about it. The only difference would be whether the function call is done inside the loop or outside of it. It would mean fewer function calls, but given how tight the loop is and that probably all the code and data is in the L1 cache, it is hard to tell whether the performance impact would even be measurable.
On the other hand, if you introduce tcprecvuntil(), you should make it generic, so it should deal with multibyte terminators (crlf) and multiple terminators (like 0x01 and '|' in the FIX protocol). As a thought experiment, imagine that we wanted to be fully generic and specified the terminator as a regexp. Surely, the cost of the regexp would outweigh the cost of the extra function call... My point is that allowing for a specialised delimiter-checking algorithm in the protocol on top is not only more flexible, but can also be more efficient.
Finally, consider how cheap receiving one byte can be:
int tcprecvv(...) {
    if(iovlen == 1 && iov[0].iov_len == 1 && rxbuf->remaining > 0) {
        ((uint8_t*)iov[0].iov_base)[0] = rxbuf->data[rxbuf->pos];
        rxbuf->pos++;
        rxbuf->remaining--;
        return 0;
    }
    ...
}
Your point does make sense. I'll use the existing interface and see if I encounter any difficulties. Thanks!
Hello Martin!
I'm back from my holiday, so I have time (lots of time) to work on libmill/dill related stuff.
While thinking about the layering and composability of protocols, I sketched out some tables which I think would be useful to keep in mind for API design.
dsock/dplumbing? API
Pipeline Object
The API will orient itself around a new object called pipeline. These pipeline objects represent a series of composed protocols that form the end-to-end communication for a server and a client. Based on the OSI model, we have up to 4 different layers in the pipeline. For example:
This is the basic form. All layers are optional; see Multiplexing. Layer 7 is a special case: the structure for pipeline can have more than one layer 7 for horizontal composability (protocol switches), for example the old-style switching from HTTP/1.1 to HTTP/2.
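A hypothetical shape for such an object (handles and field names are made up):

#include <stddef.h>

/* Hypothetical pipeline object: one protocol handle per layer, plus a
   set of interchangeable layer-7 protocols for horizontal switching. */
struct pipeline {
    int l4;          /* transport, e.g. tcp */
    int l5;          /* e.g. multiplexer; -1 if absent */
    int l6;          /* e.g. crlf framing; -1 if absent */
    int l7[4];       /* alternative application protocols */
    size_t l7count;  /* number of layer-7 alternatives */
    size_t l7active; /* index of the currently active layer 7 */
};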
Micro-protocol API
Layer 4, Layer 5, and Layer 6 protocols, when composed, have a similar API. Layer 6 dictates the exact API that Layer 7 has to implement, which will take one of two forms:
Connection-based
- listen / connect
- send
- recv
- close

Connection-less
- listen / not applicable
- send
- recv
- close / not applicable

Capabilities of Micro-Protocols
Each micro-protocol will require a number of capabilities from the previous protocol in the pipeline and exhibit a set of new capabilities to the next protocol in the pipeline. A protocol may change some of the capabilities, e.g. Conn-less to Conn-based, Stream to Datagram, Unreliable to Reliable.
The endpoint field refers to the type of endpoint, i.e. what it can implement/provide:
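For illustration, a hypothetical capability descriptor a micro-protocol could declare (all names here are made up):

#include <stdint.h>

/* Capabilities a micro-protocol requires from the previous protocol in
   the pipeline and exhibits to the next one. */
enum proto_caps {
    CAP_CONNECTION = 1u << 0, /* connection-based (vs connection-less) */
    CAP_STREAM     = 1u << 1, /* byte stream (vs datagram) */
    CAP_RELIABLE   = 1u << 2, /* reliable delivery */
};

struct proto_desc {
    uint32_t requires; /* capabilities needed from the layer below */
    uint32_t exhibits; /* capabilities offered to the layer above */
};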
IPCs don't really fit into the connection-based or connection-less paradigm. There is an address which multiple writers/producers can write to; this is what I mean when I say they are connection-less. However, it is reasonably trivial to implement a connection-based system over these IPC primitives as well.
Shm - Shared Memory
Custom shared memory implementations for cycle squeezing. It can, in theory, implement all the other protocols to varying degrees. Some implementations are available here. Some results are here.
One benefit of using shared memory is that multicasting can be implemented with very low overhead to other processes.
Multiplexing and IP/IPC conversion
As a special Layer 5/6/7 protocol, multiplexed protocols can be mux'd or demux'd and connected to other pipeline objects. This allows composing across Net IP and IPC protocols.

TODO
I'll update this document as the discussion continues. This isn't complete... needed lunch.