rust-lang / rls

Repository for the Rust Language Server (aka RLS)

Abandon long-running & invalidated client requests #540

Closed alexheretic closed 6 years ago

alexheretic commented 6 years ago

Looking at my atom logs today (and cpu monitor), it looks like we could do a little more in rls to handle different client behaviours.

[atom log screenshot]

Rust (RLS) rpc.sendRequest textDocument/documentHighlight received (1386549ms) []
Rust (RLS) rpc.sendRequest textDocument/documentHighlight received (1373672ms) []
... etc

Firstly, it may well be the case that the atom client is spamming rls unnecessarily, and I believe I've heard rumblings about improving this client-side. Regardless, I believe we can also do better server-side.

Two improvements jump to mind, but please correct me if I'm misinterpreting:

- rls should cancel user-is-waiting type tasks after a certain max-time has passed (configurable?)
- rls should be aware of atomic requests, ie on receiving a second textDocument/completion we should cancel any previous textDocument/completion request (unless identical)

I'm assuming we're actually able to cancel these tasks.

nrc commented 6 years ago

rls should cancel user-is-waiting type tasks after a certain max-time has passed (configurable?)

This should be possible, and not too hard. We already time-limit some tasks in fact. E.g., https://github.com/rust-lang-nursery/rls/blob/master/src/actions/requests.rs#L254

rls should be aware of atomic requests, ie on receiving a second textDocument/completion we should cancel any previous textDocument/completion request (unless identical).

This is probably more difficult. I'm not sure how we would implement it.

Xanewok commented 6 years ago

In general, according to the LSP protocol, unless explicitly asked to via $/cancelRequest, the server cannot just cancel a request and not reply (protocol); we'd have to respond with a custom error code (JSON-RPC error codes). I don't think that's particularly good either, as an error code should signify an actual error, and the UX in some cases might be even worse if the server decides to cancel multiple subsequent requests and the client loudly notifies about all those 'errors'.
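
For illustration, a cancellation reply might look something like this sketch (using serde_json to build the JSON-RPC message; -32800 is the code the LSP spec reserves for RequestCancelled, and none of this is rls's actual response path):

// hypothetical sketch, not rls code: a JSON-RPC error reply for a
// request the server decided to abandon
use serde_json::json;

fn cancelled_response(id: u64) -> serde_json::Value {
    json!({
        "jsonrpc": "2.0",
        "id": id,
        "error": {
            // -32800 = RequestCancelled in the LSP spec
            "code": -32800,
            "message": "request cancelled by the server"
        }
    })
}

fn main() {
    println!("{}", cancelled_response(7));
}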

In this particular example, what probably needs fixing is the internal request processing order, along with improving responsiveness for non-mutating requests.

alexheretic commented 6 years ago

Yes, by cancelling a request task I mean we return an error or empty response.

I would have thought that receiving, for example, completion request a and then shortly afterwards a different completion request b means the client no longer needs the result of a.

In this case I would suggest cancelling rls task a and responding immediately with an error or empty response (an empty or incomplete response seems a good fit for a redundant completion request); I'd expect the client to discard the result anyway. This means rls can start b immediately, unaffected by already-running similar tasks.
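
A minimal sketch of the bookkeeping this needs, assuming a shared map from method name to the newest request id (LatestIds, record and is_stale are hypothetical names, not rls API, and numeric ids are a simplification):

use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Default)]
struct LatestIds(Mutex<HashMap<String, u64>>);

impl LatestIds {
    // called by the reader as each request arrives
    fn record(&self, method: &str, id: u64) {
        self.0.lock().unwrap().insert(method.to_string(), id);
    }

    // called by the worker before doing any expensive work
    fn is_stale(&self, method: &str, id: u64) -> bool {
        self.0.lock().unwrap().get(method).map_or(false, |&latest| latest > id)
    }
}

fn main() {
    let latest = LatestIds::default();
    latest.record("textDocument/completion", 1); // request a
    latest.record("textDocument/completion", 2); // request b supersedes a
    assert!(latest.is_stale("textDocument/completion", 1)); // respond empty for a
    assert!(!latest.is_stale("textDocument/completion", 2)); // run b for real
}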

The principle seems solid, so I'll investigate a little to see whether this kind of change could make a positive impact in rls.

alexheretic commented 6 years ago

I think our threading usage in requests is a little misleading.

// stripped down threading from src/actions/requests.rs
fn handle(...) {
    let t = thread::current();

    let rustw_handle = thread::spawn(move || {
        let work = POTENTIALLY_EXPENSIVE_WORK();
        // wake the parked request thread once the work is done
        t.unpark();
        work
    });

    // wakes up when the work finishes or the timeout elapses...
    thread::park_timeout(Duration::from_millis(::COMPILER_TIMEOUT));

    // ...but join() blocks until the work finishes anyway, so the
    // timeout has no effect on how long we wait
    let result = rustw_handle.join().unwrap_or_else(|_| DEFAULT_RESULT);
    Ok(result)
}

I read this as: spawn the potentially expensive work on a second thread, park the current thread until the work finishes or the timeout elapses, then join. But join blocks until the work completes regardless, so the timeout changes nothing. This is a strange thing to do: we're just blocking until the task is done and using the extra thread as a panic catcher, which is essentially equivalent to:

fn handle(...) {
    let result = panic::catch_unwind(move || POTENTIALLY_EXPENSIVE_WORK());
    Ok(result.unwrap_or_else(|_| DEFAULT_RESULT))
}

Importantly, the threading does not allow an early return if the work takes too long. If we want that, we should use a channel.
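
Something like the following, assuming std::sync::mpsc (expensive_work and the hard-coded values are placeholders):

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn handle_with_timeout() -> String {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let work = expensive_work();
        // the receiver may already have given up; ignore the send error
        let _ = tx.send(work);
    });

    // unlike park_timeout + join, recv_timeout really does return early,
    // though the worker thread keeps running in the background
    rx.recv_timeout(Duration::from_millis(1500))
        .unwrap_or_else(|_| String::from("default result"))
}

fn expensive_work() -> String {
    thread::sleep(Duration::from_secs(10)); // simulate a slow task
    String::from("real result")
}

fn main() {
    println!("{}", handle_with_timeout()); // prints the default after ~1.5s
}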

nrc commented 6 years ago

Yeah, that looks like a mistake. We sometimes use both the compiler and Racer on different threads, which is why we use threads for this, but I think mostly now we only use threads for the timeout, which, as you point out, doesn't actually work :-s

I think the first thing to do is to make the timeout work properly.

alexheretic commented 6 years ago

Right, #552 should make the timeout actually work.

However, it doesn't address the 100-requests-in-a-short-time problem.

As the stdin-reading main thread is blocked during requests, it picks up request 2, say, 0.5 seconds later (after finishing request 1) but cannot distinguish it from a freshly made request. So it processes it, then 0.5 seconds later the next, etc. In this scenario rls blocks all requests for 50 seconds while it works through the spam, and the timeouts don't help (they merely cap each request at 1.5s).

I think we need to unblock a stdin-handling thread so it can dispatch and timestamp all requests coming in without delay. That way even a sequential, blocking request handler can check a request's timestamp to enforce timeouts. So in the above scenario, after three or so requests are processed, we'll see that we are already past the total timeout measured from request arrival, and can just immediately return default results for the rest.

And indeed we can now go further: at the start of a request we could look at the request queue, notice there is a more recent "completion" request, respond with a default for this one, and skip to the latest ('skip' here meaning outputting default responses for all matching requests sent before the latest one).

With the great tools for fearless concurrency that Rust gives us, I just think we can do a lot better than a blocking main-thread dispatcher. Would you be interested in a PR to that effect?

nrc commented 6 years ago

I think we need to unblock a stdin-handling thread so it can dispatch and timestamp all requests coming in without delay

So, we used to do something like this: basically spawn a new thread for each request, but hold them in a queue so they didn't all run at once. It was kind of complicated, and the single-threaded model is much nicer. I think we can probably do better than both, but I want to be conscious of any added complexity.

I think what you're proposing (and it could work) is a two-threads model, where there is one request-processing thread (reads from stdin, parses requests, enqueues them, manages the queue) and one working thread (dequeues requests and processes them). Queue management might include 'cancelling'/'skipping' requests (by responding with a default or whatever). However, we must be careful about request/response ordering: requests must always be handled in order, though I don't think we need to respond in order (but we should check the LSP). If both threads can respond, we also need to lock stdout so the responses don't clash. Does that sound like what you're proposing?

If so, I would be interested in a PR :-) However, I have found this work subtle and tricky in the past, so please test carefully.
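
For concreteness, a rough sketch of the two-thread shape under discussion; Request, handle and the 1.5s deadline are placeholders, not actual rls code:

use std::io::{self, BufRead, Write};
use std::sync::{mpsc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

struct Request {
    raw: String,
    received: Instant,
}

const TIMEOUT: Duration = Duration::from_millis(1500);

fn main() {
    let (tx, rx) = mpsc::channel::<Request>();
    // a lock keeps responses from interleaving on stdout once more than
    // one thread is allowed to reply
    let stdout = Mutex::new(io::stdout());

    // reader thread: timestamps and enqueues requests without ever being
    // blocked by request handling
    thread::spawn(move || {
        let stdin = io::stdin();
        for line in stdin.lock().lines() {
            let raw = line.expect("failed to read from stdin");
            let _ = tx.send(Request { raw, received: Instant::now() });
        }
    });

    // worker: drains the queue sequentially in arrival order, so
    // responses naturally go out in request order
    for req in rx {
        let response = if req.received.elapsed() > TIMEOUT {
            // stale before we even started: skip the work entirely
            format!("default (stale): {}", req.raw)
        } else {
            handle(&req)
        };
        writeln!(stdout.lock().unwrap(), "{}", response).unwrap();
    }
}

// stand-in for real request handling
fn handle(req: &Request) -> String {
    format!("handled: {}", req.raw)
}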

alexheretic commented 6 years ago

Yes, I agree the two-thread model is a good start, as it can fix my scenarios by providing an accurate received Instant and perhaps access to the queue of pending requests.

Actually the LSP wants responses to be returned in the same order as requests, though it also sheepishly suggests "The server is also allowed to reorder requests and notification if the reordering doesn't affect correctness".

requests must always be handled in order

I thought we'd actually be ok to execute requests concurrently / out of order as long as we ensured we responded in order. Why do requests need to be handled in order?

In any case, the 2-thread design would handle requests sequentially and respond in order, so it seems to be a win. I'll take a look at that approach.

nrc commented 6 years ago

Why do requests need to be handled in order?

One example is two 'did change' events inserting a then b; if we handle them out of order we might get ba rather than ab. If we get a mutation then a query and handle them out of order, the query won't take the mutation into account. That might be OK (just a minor quality degradation) or it might cause an error (e.g., if we miss a reference during a rename). There are plenty of reorderings which are OK, but it would take some effort to identify them and it seems risky.
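
A toy illustration of that first case, with nothing rls-specific (two appends standing in for the 'did change' edits):

fn main() {
    let apply = |doc: &mut String, ch: char| doc.push(ch);

    let mut in_order = String::new();
    apply(&mut in_order, 'a');
    apply(&mut in_order, 'b');
    assert_eq!(in_order, "ab"); // what the client expects

    let mut reordered = String::new();
    apply(&mut reordered, 'b');
    apply(&mut reordered, 'a');
    assert_eq!(reordered, "ba"); // silently diverged from the client's view
}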