One more concurrency discussion

matklad commented 6 years ago

I am looking at this function and I am having a hard time convincing myself it has any particular strong semantics:

https://github.com/rust-lang-nursery/rls/blob/fbec2de58e727b3b614001650d8d80a8d2b2f85b/src/build/mod.rs#L305-L316

My understanding is that block_on_build should guarantee that we have an up-to-date analysis so that write operations like rename work correctly. I think that block_on_build fails to achieve this goal:

The loop spins on building atomic, which is reset in request_build. However, the actual analysis results are prepared later, in analysis queue. So there's a window of opportunity, when the build per se is finished, but analysis results are stale
Looks like analysis job might be cancelled which would unpark the waiting threads and cause them to read stale data.

Am I correct that we might after all, in theory, get a not so fresh analysis after block_on_build?

More generally though, I feel the core of the problem here is that AnalysisHost is a mutable object which holds a current version of analysis, where current is not well-defined. In particular, the data stored in analysis can change in the middle of some request, and you can observe a violation of repeatable read.

I wonder if we can improve the situation in a principled way? One approach which might work is to split all RLS data into mutable State part and immutable Snapshot part. One part of the state is the current snapshot. All RLS operations then would at the begning read the current snapshot from the state, process it (potentially in parallel) and then respond with a result, unless the current snaphot have changed (i.e, we store an autoincrement version in the snapshot). RLS notifications will schedule the work for updating the mutable state in a serialized manner. That is, I want to guarantee that operations internally see a consistent snapshot of the world, and that all advances of the current state are centralized and serialized. In this world, the api for block_for_build might look like this:

/// Returns the results of the latest successful build. 
/// This never blocks, but the analysis might not correspond to the latest state of VFS,
/// or it might be absent altogether
fn latest_available_build() -> Option<Arc<AnalysisSnapshot>> { 
    state.lock().current_snapshot().clone()
}

/// Blocks the thread until all pending builds are finished and returns a fresh snaphsot
/// Note that snapshot might become obsolete by the time the response is ready,
/// so verify that snapshot's versions matches the current version on the dispatch thread before sending /// a response
fn block_until_fresh_build() -> Arc<AnalysisSnapshot> { 
   while state.build_in_progress.load(SeqCst) {
       state.waiters.enqueue(this_thread);
       thread::park();
   }
   state.lock().current_snapshot().clone()
}

Does this sounds reasonable? These all is pretty hand-wavy as written but if we want to pursue a Clojuresque "mutable state singleton + readonly snapshots + versions to check for stale snapshots" approach I can try to sketch some code :)

matklad commented 6 years ago

It might also be interesting to compare with IntelliJ approach.

Unlike RLS, IntelliJ owns the text files, so it's internal data structures are mutable. However, there's only a single RW lock for all IntelliJ data, all request processing happens within a read lock (which guarantees you a consistent snapshot) unless it is a write request, in which case a write lock is held in the same way for the duration of the whole request (additionally, only a dedicated thread can hold a write lock).

matklad commented 6 years ago

FWIW, I am experimenting with "mutable state + readonly snapshots" approach in libsyntax2, and everything looks good so far. All the state data is mutably owned by the main loop, handlers receive consistent read only snapshots and are dispatched to the threadpool, and that all works without a single lock.

EDIT: to clarify, libsyntax2 has grown its own language server.

kngwyu commented 6 years ago

State & Snapshot approach sounds reasonable for me, and I'm really interested in improving concurrency. But, if we changed our concurrent architecture, how could we test and check that it's surely improved? @matklad Any idea?

matklad commented 6 years ago

I think the only way to test architecture is by trial and error :) In this particular case, the specific improvement I am looking for is the repeatable read invariant, and it's pretty clear that current approach with hidden mutexes does not yield one naturally.

There's a question of "do we really need strong semantics here in the first place?", and I'd like to argue that it indeed would be beneficial for long-term stability and maintainability.

rust-lang / rls

One more concurrency discussion #962