UDP chunking - GitHub issues

Kagami commented 9 years ago

Neither the general chunking docs nor the UDP transport docs explicitly mention chunking for UDP. Also, the telehash-udp4 transport implementation doesn't do any checks on packet size. How will it work for big (>1500 B) packets?

(I remember you said that chunking is currently a work in progress; I just can't tell from the docs whether this is intentional design or simply not ready yet.)

quartzjer commented 9 years ago

I got chunking working successfully over TCP between C and JS and made another pass at documenting it all. I also moved the transport definitions up to a higher level and updated most of them.

With UDP, a packet == a datagram, so there is no chunking. Chunking is only used for stream-based transports and other special cases.

Feel free to re-open this and ask more questions here if I can make it any more clear, thanks!

Kagami commented 9 years ago

Hmm, maybe I don't understand something. A UDP datagram can't be larger than 65535 bytes, right? And under IPv4 the payload maximum is 65507 bytes. A packet encoded with lob-enc can have any size (the docs don't mention a limit). So how could one send a packet larger than 65507 B over the UDP4 transport?
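
The 65507 figure can be checked with simple arithmetic. This is a minimal sketch that assumes a 20-byte IPv4 header with no options; the constant names are made up for illustration.

```python
# Maximum UDP payload over IPv4, assuming an IPv4 header without options.
IPV4_MAX_TOTAL_LENGTH = 0xFFFF  # 16-bit total-length field: 65535 bytes
IPV4_MIN_HEADER = 20            # IPv4 header size when no options are present
UDP_HEADER = 8                  # fixed UDP header size

max_udp_payload = IPV4_MAX_TOTAL_LENGTH - IPV4_MIN_HEADER - UDP_HEADER
print(max_udp_payload)  # 65507
```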

quartzjer commented 9 years ago

Heh, apparently the newer e3x docs don't mention anywhere that the max message/channel wire packet is ~1400 bytes. Sorry about that, it's pretty important! I'll get it updated later today :)

Kagami commented 9 years ago

Thanks, now it's much clearer! One more question right away: does it mean that for UDP transports some application-specific chunking has to be implemented, like it is done in the chat channel, for example? That looks very low-level for an application protocol, in my opinion.

quartzjer commented 9 years ago

That's what the reliable channels handle for apps: they do all the necessary reassembly automatically :)

Kagami commented 9 years ago

Again, I don't completely understand, sorry. Docs for reliable channels don't mention chunking/segmenting at all.

For example, if I set up a reliable channel over UDP4 and send this message:

{
  "c":1,
  "seq":1,
  "data":"2000 bytes of data"
}

the data won't be split into chunks automatically. I need to do it manually in the application code on one side:

{
  "c":1,
  "seq":1,
  "data":"first 1000 bytes"
}
{
  "c":1,
  "seq":2,
  "data":"last 1000 bytes",
  "done":true
}

and then manually glue them together on the receiver. I mean, why can't we add some sort of splitting/reassembling logic right into the transport, so that applications don't have to bother with that low-level stuff?
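
The manual splitting and gluing described above could look roughly like the following. This is only an illustrative sketch, not telehash code: the field names `c`, `seq`, `data` and `done` are taken from the example packets, while the function names, the use of `bytes` for `data`, and the 1000-byte chunk size are assumptions.

```python
def split_data(data: bytes, channel: int, chunk_size: int = 1000) -> list:
    """Split data into channel packets of at most chunk_size bytes each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty payload still produces one (final) packet.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    packets = []
    for seq, chunk in enumerate(chunks, start=1):
        packet = {"c": channel, "seq": seq, "data": chunk}
        if seq == len(chunks):
            packet["done"] = True  # mark the last packet, as in the example
        packets.append(packet)
    return packets


def reassemble(packets: list) -> bytes:
    """Glue packets back together; fails if any packet is still missing."""
    ordered = sorted(packets, key=lambda p: p["seq"])
    if not ordered or not ordered[-1].get("done"):
        raise ValueError("final packet has not arrived yet")
    if [p["seq"] for p in ordered] != list(range(1, len(ordered) + 1)):
        raise ValueError("gap or duplicate in sequence numbers")
    return b"".join(p["data"] for p in ordered)


payload = bytes(2000)
packets = split_data(payload, channel=1)
restored = reassemble(list(reversed(packets)))  # arrival order does not matter
```

Note that `reassemble` relies on `seq` to restore order, which is why this kind of logic depends on sequencing information rather than on the transport alone.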

quartzjer commented 9 years ago

I've started updating the docs to make it clearer where this happens. It's one of the more subtle points, as multiple layers of the stack are involved with the size.

The e3x definition now documents more clearly that handshakes and channel packets should by default be 1400 bytes or less. The actual reliability logic doesn't need to know what fragment size is being used, though; that is still up to a higher layer to manage outside of e3x. The current implementations have been providing a "quota" per channel packet to the app layer based on this limit.
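
One way to picture the per-packet "quota" is as the size limit minus whatever the lower layers add. In the rough sketch below only the 1400-byte limit comes from this thread; the function name `payload_quota`, the `overhead` parameter, and the way the header is measured are hypothetical and do not reflect how any telehash implementation actually computes it.

```python
import json

MAX_WIRE_PACKET = 1400  # default limit for handshakes and channel packets


def payload_quota(header: dict, overhead: int, limit: int = MAX_WIRE_PACKET) -> int:
    """Bytes left for application data in one channel packet.

    overhead is a hypothetical per-packet cost (framing, encryption, etc.);
    the real value depends on the implementation and cipher set in use.
    """
    header_bytes = len(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    return max(0, limit - overhead - header_bytes)


quota = payload_quota({"c": 1, "seq": 1}, overhead=50)
print(quota)  # 1335
```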

The telehash channels like thtp, stream, sock, etc. are where the actual data chunking/reassembly is implemented. I expect almost all app use cases to use these higher-level constructs; the low-level e3x is just a tool to enable the higher-level mappings.

Where do you think it would be best to document these patterns? I'm going to re-open this issue until it's more clear :)

Kagami commented 9 years ago

Thanks, now it's clear. As for the patterns: I think it would be fine to document them in the channels subsection.

dchote commented 9 years ago

I agree with defining it as a subsection. I've never been a fan of the possibility of using done in some places and end in others; it would be really useful to define a general use case.

Kagami commented 9 years ago

Hm, I think I understand now why we can't add chunking logic to the transports. Chunking also requires reliability (because we must know the order to reassemble the packets properly), and reliability sits at a higher level of the hierarchy than the transports (because we may want to use reliable delivery over unreliable transports like UDP, see also #96).
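
The ordering requirement can be illustrated with a small reorder buffer that releases payloads only once every earlier `seq` has arrived. This is a sketch under simplifying assumptions: the class `ReorderBuffer` is hypothetical, sequence numbers are assumed to start at 1, and acknowledgements and retransmission of lost packets are left out entirely.

```python
class ReorderBuffer:
    """Deliver payloads in seq order, holding back anything that arrives early."""

    def __init__(self, first_seq: int = 1):
        self.next_seq = first_seq  # next sequence number we can deliver
        self.pending = {}          # seq -> payload, received out of order

    def receive(self, seq: int, data: bytes) -> list:
        """Accept one packet and return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate, ignore it
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
early = buf.receive(2, b"last 1000 bytes")     # held back, seq 1 is missing
caught_up = buf.receive(1, b"first 1000 bytes")  # releases both in order
```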

Though there is another idea: why can't we make all channels reliable by default? We require encryption for everything and reliability is also a good thing. Many of the real-world application protocols use reliable channels (like HTTP, WebSocket, XMPP, SMTP, etc.). Any thoughts?

fd commented 9 years ago

@Kagami Not all applications benefit from reliability, for example live video/audio streaming, game servers, and (near) realtime telemetry. These applications usually deal with packet loss by filling in the gaps using statistical methods.

quartzjer commented 9 years ago

Reliability is handled inside the encryption, so the transports can't see or know anything about it. That's part of the design goal of removing all metadata from the wire :)

Supporting unreliable transports is also required: it's intended to work well on the internet and also seamlessly on local (mesh) networks like BLE and Zigbee.

telehash / telehash.github.io

UDP chunking #100