katzenpost / docs

specification and design documents
Creative Commons Attribution Share Alike 4.0 International

Sphinx Packet Delay tolerance not well defined #67

Closed cloehle closed 4 years ago

cloehle commented 4 years ago

The Mix Network Spec includes a section about Sphinx packet processing that defines the delay:

NodeDelayCommand, specifying the delay in milliseconds to be applied to the packet, prior to forwarding it to the Node specified by the NextNodeHopCommand, as measured from the time of reception.

and

Nodes MUST discard packets that have been delayed for more time than specified by the NodeDelayCommand.

Since exact millisecond processing is impractical, the mix will give itself a time window around that NodeDelayCommand. It is my understanding that this is configured by SchedulerSlack (Default: 10ms) in the current implementation.

I think this delay tolerance should be either a specified constant or be included in the PKI document (so it can be adjusted to the other parameters of the mixnet). Either way this would have to be included in the specification.

Why? Only if that tolerance is well-defined for a particular mix network can mixes be tested for conformance to it. Authority operators (or anyone, for that matter) could then randomly probe mixes with test packets and check whether the actual delay of a mix falls within the tolerance window.

Or am I missing something?

david415 commented 4 years ago

Your understanding of the mix server is correct as far as I know. I'm not sure if I agree that this should be manipulated by the PKI.

What are the changes you would like to see? Do you want a more detailed specification in the design documents? There are many nuanced details omitted from the specs, and this decision is intentional.

I understand your argument for having the delay tolerance be well defined. This cannot be a priority right now since we don't even have a mix network. But we can certainly revisit this in the future if and when it becomes an issue.

It seems like the testing of mixes should be specified as well... and furthermore we should write code to do this. But again, not a priority for this project right now.

cloehle commented 4 years ago

I agree that how to handle the delay can be thought about in the future, but I found it confusing that the spec (conveniently) omits any mention of that time window. It would be weird to write that a mix can tolerate a delay without defining it further. If the window came before the delay specified by NodeDelayCommand, one could add

Nodes SHOULD NOT forward packets with less delay than the delay specified in NodeDelayCommand

(footnote: Such a phrase should probably be added anyway; the current wording is oddly passive (see OP). Probably intentionally, to sidestep this problem for now?)

but the implementation allows additional delay after that time, therefore currently violating the MUST quoted in the OP. I wouldn't change the implementation, because that design is the easier one anyway. I don't have a concrete suggestion or a good idea about how to resolve this. Maybe this issue alone is enough for now.

david415 commented 4 years ago

Yes you are right. The specification doesn't match the implementation... and we can change the spec to match the code. ;) But I wouldn't mind if we also put a note in the spec about how we've currently implemented it and some options for future implementations. This isn't Yawning's specification style. I rather like putting more details in the spec.

Tor uses a bandwidth authority to test new relays. Katzenpost doesn't have anything like that at all. Also adding new mixes to Katzenpost is a very manual process requiring all the authority servers to have the public identity key of the new mix. I'd also like to hear your thoughts on how we can make adding mixes automatic... although we don't need that right now.

Another thing we are missing is performance metrics on the mix processing pipeline, because the big question about Yawning's mix design is this:

Should the mix implementation have a single thread pool of workers that process the packets? The SEDA design used here instead has several stages of thread pools, where each packet processing stage has the opportunity to drop packets if the pipeline has been delayed for too long. This supposedly allows the quality of service to degrade more gracefully when the mixes are low on system resources.

cloehle commented 4 years ago

I think the authority question will need to be decided by real-world experience. Compared to Tor, none of the current clients look high-throughput. Web traffic will (probably) never happen (at least not the interactive Web we have today). Therefore it might turn out better (and actually be feasible) to have very few, very stable mixes with high bandwidth. Then it might be preferable to have no automatic way to add them at all. Also, considering that we have no means to check whether a mix is honest, manually adding nodes might be okay for now. We will see, I guess.

cloehle commented 4 years ago

I think we can close this since #69