Clarify CyclicTimeouts API

jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more

https://eclipse.dev/jetty

Other

3.86k stars 1.91k forks source link

Clarify CyclicTimeouts API #12308

Open gregw opened 1 month ago

gregw commented 1 month ago

Jetty version(s) 12.1.x

Enhancement Description

Currently the API for CyclicTimeouts is used in two different ways.

Some usages are modelled on the CyclicTimeout class, where the schedule method needs to be called to cancel the current timeout and set a new one into the future:

Other usages just call schedule once and then mutate the value returned from getExpireNanoTime in the knowledge that by moving it into the future, it can cancel the current expire, or schedule a new one even if it has expired.

Both usages are problematic. The first usage is most similar to the style of CyclicTimeout and feels natural to use. However, because the Iterator<Expirable> is not controlled by the CyclicTimeouts class itself (other than to call remove), it is entirely possible that an Expirable to be "scheduled" without ever being passed in a call to schedule(Expirable). If it is returned by the iterator, then it can be scheduled by returning a value from getExpireNanoTime.

The second style of usage is also strange, because a call to schedule(Expirable) also doesn't really need the Expirable to be passed. That will provoke a first look at that Expirable, but it was just added to the iterator, then it would eventually be seen and scheduled regardless of any call to schedule .

Perhaps schedule(Expirable) should be renamed add(Expirable)? Or maybe more extensive API changes are necessary?

sbordet commented 1 month ago

The 3 examples you link are examples of a new entity being scheduled, so they are not about cancelling timeouts.

The new entry is scheduled not added because the collection is managed elsewhere, and the scheduling has the only purpose of letting CyclicTimeouts know when this new entity expires -- it may be sooner than any other existing entity.

So the semantic of "schedule when created", with subsequent recalculation of the earliest wakeup based on pushed-to-the-future expireNanoTime is really a schedule semantic, not add.

The other usage is when an existing entity changes its expiration, possibly earlier than expected. In this case the entity is scheduled again to allow for the recalculation of the earliest wakeup. This is done, for example, with HTTP2Session.StreamTimeouts when HTTP2Stream.setIdleTimeout() is called: we do not know when the earliest wakeup is, so we schedule again the entity to allow recalculation.

The collection of entities is managed elsewhere, and CyclicTimeouts just manages their expiration. If an entity goes away, it is removed from the collection, and won't be seen expiring. If it expires, then it should be removed from the collection, which could be done directly (by returning true from onExpired()) or indirectly (by returning false from onExpired() but then arranging for the entity to be removed from the collection).

Since the collection of entities is managed elsewhere, CyclicTimeouts needs to be informed of the entities the first time, and again only if the entity expiration could change to be earlier -- if the entity's expireNanoTime gets pushed into the future there is nothing more to do.

In summary, I think the "add" semantic is not the right one, it is really a "schedule" semantic, and I would not make changes to the APIs.

gregw commented 1 month ago

@sbordet sorry but I totally disagree. I wrote the original CyclicTimeout class and I'm confused by this API. What hope do other have.

The ability for a new entity to be scheduled via inclusion in the iterator, without schedule ever being called is non clear, nor documented nor intuitive. In HttpConnection example, it appears that the scheduled HttpChannel is recycled on the connection, so it is scheduled on every send. The scheduling is in no way associated with the iteration (which is itself creating a singleton list on every call for HTTP/1, which is a worry).

I agree add is not the right name, but neither is schedule. Perhaps reschedule?

At the very least, this class needs more javadoc.

gregw commented 1 month ago

@sbordet @lorban

the following "test" illustrates the confusion with the API:

    {
        long now = NanoTime.now();

        List<CyclicTimeouts.Expirable> entities = new ArrayList<>(List.of(
            new SimpleExpirable("one", now + 1),
            new SimpleExpirable("two", now + 2),
            new SimpleExpirable("three", now + 3)
        ));

        CountDownLatch latch = new CountDownLatch(3);
        AtomicReference<Runnable> update = new AtomicReference<>();

        CyclicTimeouts<CyclicTimeouts.Expirable> timeouts = new CyclicTimeouts<>(scheduler)
        {
            {
                update.set(this::onTimeoutExpired);
            }

            @Override
            protected Iterator<Expirable> iterator()
            {
                return entities.iterator();
            }

            @Override
            protected boolean onExpired(Expirable expirable)
            {
                System.err.println(expirable);
                latch.countDown();
                return true;
            }
        };

        //0
        //1 timeouts.schedule(entities.get(0));
        //2 timeouts.schedule(new SimpleExpirable("fake", now));
        //3 timeouts.schedule(new SimpleExpirable("fake", now + TimeUnit.SECONDS.toNanos(10)));
        //4 update.get().run();

        assertTrue(latch.await(1, TimeUnit.SECONDS));
    }

If schedule is:

not called (//0), then the test fails as nothing is scheduled.
called for only the first of the expirables (//1), then the test passes and all three expirables expire.
called for a fake expirable that expires soon (//2), then the test passes and all three expirables expire, but the fake one does not.
called for a fake expirable into the future (//3), then the test fails and nothing expires.

So this illustrates that while the class is working, the schedule call is not really correctly named nor formulated:

tasks can expire that were never "schedule" if they just appear in the iterator
task that were scheduled will not expire if they are not in the iterator
the only state taken from the passed expirable, is its expiry time, and the instance itself is ignored.

So it is really hard to know when you should call schedule:

you have to call it at least once to start the mechanism
you can often get away with never calling schedule again (as @sbordet has advised me to do in DoSHandler), but you have to "know" that:
- the iterator never becomes empty
- that a new expirable is never added with a shorter time that existing expirables.

That is hard to explain. Perhaps the method should be called "update(Expirable)" and we say that it should be called whenever an expirable changes when it is scheduled? But then the passed expirable is still mostly ignored.

Or better yet, should we just add a new method called "update()" (//4), that scans the iterator and sets the next expiry time. This method should be called whenever there is a substantive change to the collection that was not done by expiry.

lorban commented 1 month ago

@gregw I agree with you, I find this API definitely confusing and error-prone.

The fact that CyclicTimeouts doesn't own the Expirables (like the java.util collections do) is surprising. And the schedule(Expirable) method has a confusing name, as well as a convoluted contract has too many side effects IMHO.

Personally, I'd rather have an API where once Expirables have been handed to CyclicTimeouts, I'm not supposed to modify them directly. But honestly, I thought I understood what problem this class was solving and how until this issue was raised; now I'm uncertain about both.

sbordet commented 1 month ago

The collection is managed externally, that is the use case of CyclicTimeouts, e.g. exchanges in a connection, etc.

CyclicTimeouts manages the timeouts of those entities.

A reschedule() would be forced to re-iterate the whole collection to find the entity that expire earliest, verify that the current wakeup is up-to-date.

Instead, schedule(entity) tells that there is a new entity, and CyclicTimeouts only has to check that one to verify that it is not the earliest.

What would the same code do with a plain Scheduler?

It would: A) add the entity to the collection (so yes, the collection is external and not managed by the scheduler), and B) call scheduler.schedule(entityJustAdded). That is where the API came from. Just CyclicTimeouts is more efficient than a Scheduler.

Do you get confused by Scheduler too? If not, why not? Just think of CyclicTimeouts as a Scheduler and there is no confusion, it's exactly the same.

sbordet commented 1 month ago

@lorban and I discussed this:

Can CyclicTimeouts own the collection? Not really because throughout the code the collection is either derived from a Map or from a List, so we would need a MapCyclicTimeouts with put/remove and a ListCyclicTimeouts with add/remove.
Can we add a reschedule() method? We could, but it would be expensive, as it would need to go through the whole collection to find the earliest entity, while schedule(newEntity) only operates on the new entity to figure out whether it's the earliest.
Can the iterator be sorted? It could, and would make the earliest entity evident, but the cost of maintaining a sorted iterator, or the cost of sorting it is again to scan the whole collection.
Can the entity passed to schedule(newEntity) be checked if it belongs to the iterator? It could be checked, but again at the expense of a full scan of the collection. It would be racy (the entity could already be removed from the collection), but that race would be ok -- the scheduling won't be changed by such a short-lived entity.

We could rename schedule(entity) to scheduleIfEarliest(entity) or similar, but I am not sure if it's better than just improved javadocs.

CyclicTimeouts is optimized for the use cases we have, hence not that generic.

gregw commented 1 month ago

@sbordet

Do you get confused by Scheduler too? If not, why not?

Because a Scheduler doesn't:

schedule entities that have never been passed to schedule(Runnable, Duration)
not schedule some entities that have been passed to schedule(Runnable, Duration)
automagically reschedule some entities after they have expired.

Just think of CyclicTimeouts as a Scheduler and there is no confusion, it's exactly the same. It is absolutely not the same. The current CTs.schedule(Expirable) and S.schedule(Runnable, Duration) have very different semantics. It is also very different to the CyclicTimeout.schedule(Duration) method, as there is one and only one expiry for each non-cancelled call to schedule

The problem is that is very hard to write the javadoc of exactly when CT.schedule(Expirable) must be called.

Currently CT.schedule(Expirable) MUST be called when:

you know the collection was empty and you have just added an element to it; OR
you just added an element to the collection and it may be the earliest; OR
you have just updated the expiry time of an existing element so that it may expire earlier than all other elements

This can be simplified to CT.schedule(Expirable) SHOULD be called when:

you have just added an element to the collection; OR
you have just updated the expiry time of an existing element

But that results in calling CT.schedule(Expirable) from within onExpired(Expirable), which I was doing in my PR and you pointed out that it is not necessary. So perhaps it can be documented as CT.schedule(Expirable) SHOULD be called when:

you have just added an element to the collection; OR
you have just updated the expiry time of an existing element outside of the scope of a call to onExpired(Expirable)

@sbordet @lorban

What about renaming schedule(Expirable) to update(Expirable) and include an assert in the implementation:


    /**
     * Update the internal schedule of this {@code CyclicTimeouts} if the passed {@link Expirable} is the earliest expiring element of the {@link #iterator()}.
     * Should be called whenever adding a new element or mutating the expiry time outside of the scope of {@link Expirable#onExpired(Expirable)}.
     */
    public void update(T expirable)
    {
        assert StreamSupport.stream(((Iterable<T>)this::iterator).spliterator(), false).anyMatch(e -> e.equals(expirable));