tailhook / rotor

The mio-based framework for Rust for doing I/O in a simple and composable way (ABANDONED)
MIT License

A way to identify Timeouts #12

Closed seanmonstar closed 8 years ago

seanmonstar commented 8 years ago

A way is needed to identify which timeout is occurring in Machine::timeout. The only way to attempt to match timeouts currently is by checking the clock and comparing with timeouts that have been set. However, this isn't precise, and is possibly an unnecessary check of the clock.

Scope::timeout_ms could either return a TimeoutId or similar that is passed to Machine::timeout, so the two can be matched, or, probably the better option, allow timeouts to be generic over an associated type: scope.timeout_ms(Next::Read, 5_000); scope.timeout_ms(Next::Write, 1_000), for instance.
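
For illustration, the associated-type variant could look roughly like the sketch below. The Machine trait, Next enum, and method signature here are simplified stand-ins invented for this example, not rotor's actual API.

// Hypothetical sketch: the machine declares what a timeout "means" for it
// and gets that value back when the timeout fires, so no clock comparison
// is needed to tell timeouts apart.
trait Machine: Sized {
    type Timeout;
    // Called by the loop with the value the timeout was registered with.
    fn timeout(self, which: Self::Timeout) -> Option<Self>;
}

enum Next { Read, Write }

struct Conn;

impl Machine for Conn {
    type Timeout = Next;
    fn timeout(self, which: Next) -> Option<Conn> {
        match which {
            Next::Read => { /* the read side timed out */ Some(self) }
            Next::Write => { /* the write side timed out */ Some(self) }
        }
    }
}

fn main() {
    // The loop would deliver the same value that was passed to timeout_ms.
    let conn = Conn;
    let _still_alive = conn.timeout(Next::Read);
}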

tailhook commented 8 years ago

Well, in one of the first versions of rotor there were generic timeouts. The problem is that when protocols are layered, there are timeouts at each layer. For example, in rotor-http we have:

rotor::Machine -> rotor_stream::Stream/Protocol -> rotor_http::client::Parser/Protocol

Each of them has timeouts, and there will be timeouts in the next layers too.

So at each layer, you will need to define a Timeout enum that may hold either this layer's timeout or one from upstream (an associated type), and when implementing the trait you will often need a Void type. That's a little bit ugly. We have the same issue with rotor::Machine::Seed, but unlike timers, seed propagation terminates at the next layer of abstraction.
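
To make that concrete, here is a standalone illustration of the "Void at the top" pattern just described. The type names are invented for this example; they are not taken from rotor, rotor-stream, or rotor-http.

// Hypothetical illustration: every layer wraps the upstream timeout type,
// and the topmost protocol, which has no upstream, plugs in an uninhabited
// type to say "no timeouts come from above me".
enum Void {} // no value of this type can ever be constructed

enum StreamTimeout<U> {
    ConnectTimeout,  // this layer's own timeout
    Upstream(U),     // a timeout belonging to the layer above
}

enum HttpTimeout<U> {
    HeaderTimeout,
    KeepAliveTimeout,
    Upstream(U),
}

// A full stack: the application on top needs no timeouts of its own.
type ClientTimeout = StreamTimeout<HttpTimeout<Void>>;

fn main() {
    let t: ClientTimeout = StreamTimeout::Upstream(HttpTimeout::HeaderTimeout);
    match t {
        StreamTimeout::ConnectTimeout => {}
        StreamTimeout::Upstream(HttpTimeout::HeaderTimeout) => {}
        StreamTimeout::Upstream(HttpTimeout::KeepAliveTimeout) => {}
        // the Void arm can never be reached, but it must still be written
        StreamTimeout::Upstream(HttpTimeout::Upstream(v)) => match v {},
    }
}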

Another problem is that when you create multiple timeouts per state machine, it's unclear how to configure the main loop. mio has a limited number of slots for timeouts, which is set at loop initialization and is not growable. This means that if you can't calculate the maximum number of timers your application will allocate, it may crash under load.

So I'm leaning towards collapsing the timers at each layer into a single timeout value and checking against the clock when the timeout happens.

Another issue is that when you don't clear a timeout because of a bug, another state machine may receive it. A bug in one protocol could result in a timeout being delivered to another protocol, which would be really hard to debug (such cross-protocol bugs can happen only if a TimeoutId is used, not the associated type).

(I'm even thinking that I should go back to the strict one-timeout-per-machine model, to make working with timers safer.)

Thoughts?

seanmonstar commented 8 years ago

How would you have 1 timeout per machine? Panicking if you try to timeout_ms before the first one is used?

I appreciate that multiple timeouts per state machine may exceed the size of the timer queue. However, the size of that queue can be adjusted by the mio config.

As for needing an enum at each layer, that seems typical if you're going to be composing machines into the same loop anyways. For the chain of 'machine -> stream -> parser', you could have the timeouts at those layers be generic, or transform the types between each other.

tailhook commented 8 years ago

How would you have 1 timeout per machine? Panicking if you try to timeout_ms before the first one is used?

What I'm thinking of is:

Response::ok(self).with_timeout(x)

I.e. an event handler returning when to wake up next time. This completely resolves the problem of spurious timeouts.

I appreciate that multiple timeouts per state machine may exceed the size of the timer queue. However, the size of that queue can be adjusted by the mio config.

Sure, but how would you configure it? Will you document the number of timers that hyper's state machine needs?

As for needing an enum at each layer, that seems typical if you're going to be composing machines into the same loop anyways.

Yes, currently we have 2 types when doing composition; we would need three. Not a big deal, though.

For the chain of 'machine -> stream -> parser', you could have the timeouts at those layers be generic, or transform the types between each other.

Sure. That may need:

enum Timeout<U> {
  MyTimeout1,
  MyTimeout2,
  UserTimeout(U),
}

impl<P: Machine> Machine for Parser<P> {
  // wrap the upstream protocol's timeout type in this layer's enum
  type Timeout = Timeout<P::Timeout>;
}

Not that it's impossible; it just complicates the interface more.

What I'm worried about are spurious timeouts. When you forget to clear a timeout, either this state machine or even some other one will be notified at the wrong time. And debugging the issue will be a nightmare.

A side note: discovering such a bug may help an attacker do two bad things: (a) drop other parties' connections by connecting to the service, doing something legitimate, and then dropping their own connection in some known state where the timer is not going to be cleared, or (b) simply crash the server by filling up the timer table.

So all this runs against the other properties of rotor: the state machines are passed by value, so it's very hard to update a state machine partially.

Returning the timeout along with the state looks more like how all the other state machine changes work.

tailhook commented 8 years ago

And to follow up with some code, derived from your example:


fn ready(..) {
  ...
  if eventset.is_writable() {
    // .. do the work, then update the timestamp
    write_deadline = SteadyTime::now() + Duration::seconds(1);
  }
  if eventset.is_readable() {
    // .. do the work, then update the timestamp
    read_deadline = SteadyTime::now() + Duration::seconds(5);
  }
  Response::ok(ReadWriting(..., read_deadline, write_deadline))
    .with_timeout(min(read_deadline, write_deadline))
}

fn timeout(..) {
  let now = SteadyTime::now();
  match self {
    ReadWriting(..., read_deadline, write_deadline) => {
      // the nearer deadline is the one that fired
      if now >= read_deadline {
        // read timeout
      } else if now >= write_deadline {
        // write timeout
      }
    }
  }
}

What I see here:

  1. min(x, y) is simpler than keeping two mio::Timeouts in sync
  2. if now >= x { .. } else if now >= y { .. } is not substantially more complex than match t { Read => { .. }, Write => { .. } }

And to address the other two comments:

The only way to attempt to match timeouts currently is by checking the clock and comparing with timeouts that have been set. However, this isn't precise,

I don't understand what "precise" means here. Current mio rounds the timer value up to the timer tick size (which is 200ms by default). You can never compare timers with ==, but you can definitely compare with <=, and that's not imprecise in any way.

and is possibly an unnecessary check of the clock.

Well, I'm thinking of putting the current "loop iteration time" into the Scope object, for two reasons:

  1. (minor) to have a smaller number of clock checks (though checking the time is fast on Linux; it isn't even a system call, right?)
  2. More importantly, it allows "time travel" in unit tests (see the sketch below)
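
For example (a standalone sketch, using std's Instant in place of SteadyTime and a made-up Scope stand-in rather than rotor's real type), caching the loop-iteration time makes deadline checks trivially testable:

use std::time::{Duration, Instant};

// Made-up stand-in for a scope that caches "now" once per loop iteration.
struct Scope {
    now: Instant,
}

impl Scope {
    fn now(&self) -> Instant { self.now }
}

fn is_expired(deadline: Instant, scope: &Scope) -> bool {
    // Handlers compare against scope.now() rather than calling the clock.
    scope.now() >= deadline
}

fn main() {
    let start = Instant::now();
    let deadline = start + Duration::from_secs(5);

    // A test can "time travel" by constructing the scope with any instant.
    let before = Scope { now: start };
    let after = Scope { now: start + Duration::from_secs(10) };
    assert!(!is_expired(deadline, &before));
    assert!(is_expired(deadline, &after));
}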

tailhook commented 8 years ago

Ping @pyfisch and @brson, maybe you have some opinion?

seanmonstar commented 8 years ago

I see, that may work elegantly. And so if a machine is ready, the timeout is always cleared before calling ready?


tailhook commented 8 years ago

Sure.

(Technically the timeout will be cleared after ready, and only if ready returns either no timeout or a different timeout as a result, but that's an optimization; the effect is the same.)
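
As a tiny standalone model of that optimization (the function name and types are made up, this is not rotor's internal code):

use std::time::{Duration, Instant};

// The loop only needs to touch the timer when the deadline returned by the
// handler differs from the one that is already registered.
fn needs_rescheduling(registered: Option<Instant>, returned: Option<Instant>) -> bool {
    registered != returned
}

fn main() {
    let now = Instant::now();
    let d = now + Duration::from_secs(30);
    assert!(!needs_rescheduling(Some(d), Some(d))); // unchanged: leave the timer alone
    assert!(needs_rescheduling(Some(d), None));     // dropped: clear the timer
    assert!(needs_rescheduling(None, Some(d)));     // new: register the timer
}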

tailhook commented 8 years ago

Okay, it's now in master. It's mostly untested, exercised only by rotor's own examples. Currently working on getting the other parts of the ecosystem up to date.

seanmonstar commented 8 years ago

I tried looking through the examples for usage, but neither the echo server nor telnet seem to use timeouts. It looks like it'd be used like Response::ok(fsm).deadline(scope.now() + Duration::from_secs(30))?

tailhook commented 8 years ago

@seanmonstar yes
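
Pulling the thread together, here is a rough standalone model of the final shape. Response, Scope, and Conn below are simplified stand-ins for illustration, not rotor's actual definitions: the handler stores its logical deadlines in its own state, hands the nearest one back with the response, and the timeout handler tells them apart by comparing against the cached loop time.

use std::cmp::min;
use std::time::{Duration, Instant};

// Simplified stand-ins for rotor's Response and Scope.
struct Response<M> { machine: M, deadline: Option<Instant> }

impl<M> Response<M> {
    fn ok(machine: M) -> Response<M> { Response { machine: machine, deadline: None } }
    fn deadline(mut self, d: Instant) -> Response<M> { self.deadline = Some(d); self }
}

struct Scope { now: Instant }
impl Scope { fn now(&self) -> Instant { self.now } }

struct Conn { read_deadline: Instant, write_deadline: Instant }

impl Conn {
    // After doing the I/O work, push the deadlines forward and register the nearest one.
    fn ready(mut self, scope: &Scope) -> Response<Conn> {
        self.read_deadline = scope.now() + Duration::from_secs(5);
        self.write_deadline = scope.now() + Duration::from_secs(1);
        let next = min(self.read_deadline, self.write_deadline);
        Response::ok(self).deadline(next)
    }

    // When the single timeout fires, compare against the stored deadlines.
    fn timeout(self, scope: &Scope) -> Response<Conn> {
        if scope.now() >= self.read_deadline {
            // handle the read timeout
        } else if scope.now() >= self.write_deadline {
            // handle the write timeout
        }
        let next = min(self.read_deadline, self.write_deadline);
        Response::ok(self).deadline(next)
    }
}

fn main() {
    let scope = Scope { now: Instant::now() };
    let conn = Conn { read_deadline: scope.now(), write_deadline: scope.now() };
    let resp = conn.ready(&scope);
    let _ = resp.machine.timeout(&Scope { now: scope.now() + Duration::from_secs(2) });
}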

tailhook commented 8 years ago

@seanmonstar and you should upgrade to at least 9b48003, because the previous version contains a bug.

tailhook commented 8 years ago

I think this is solved for now. Feel free to open another issue if you feel uncomfortable with timers.