informalsystems / verification

Specifications of the protocols and the experiments on their verification

Ilina/fastsync concurrency #6

Closed. istoilkovska closed this 4 years ago

istoilkovska commented 4 years ago

Modeling of bounded and blocking buffers.

ancazamfir commented 4 years ago

This is one counterexample given by TLC that I tried to illustrate here. The little square is the memory where we read a message from the inbound queue. Queue size is 1 in this case, so when we read a message the queue becomes empty. Circles are routines; the greyed one is the one currently running, and a red outline means it is blocked. The D routine injects messages (requests), and R responds. It's a simple example, but I believe the more complex fastsync architecture, modified with blocking queues, would show the same problem.

Untitled Diagram
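To make the blocking concrete, here is a minimal Go sketch of the scenario above (queue and message names are illustrative, not taken from the spec). The D and R routines are connected by bounded blocking queues of size 1, modeled as buffered channels. Starting from a reachable state where both queues already hold one undrained message, each routine has read a message into its local memory (the "little square") and tries to push a reply before draining its own inbound queue; neither send can complete, so both routines stay blocked.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	dToR := make(chan string, 1) // D -> R queue, size 1
	rToD := make(chan string, 1) // R -> D queue, size 1

	// Reachable state: one undrained message sits in each queue.
	dToR <- "request#1"
	rToD <- "response#1"

	go func() { // D: injects requests, blocks on the full D -> R queue
		dToR <- "request#2"
		fmt.Println("D unblocked (never printed)")
	}()

	go func() { // R: responds, blocks on the full R -> D queue
		rToD <- "response#2"
		fmt.Println("R unblocked (never printed)")
	}()

	time.Sleep(500 * time.Millisecond)
	fmt.Println("D and R are still blocked on full queues")
}
```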

ancazamfir commented 4 years ago

An update on this: after talking to Ilina and thinking about this more, we made a variation of the model where a timer-triggered event (e.g. sending a statusRequest every 10 seconds) doesn't run as a separate routine but rather in the same thread of execution as the routine handling the incoming requests (this is the case in v2, where we get a ticker signal into the demux).

Concretely, this means that a timer-based request cannot go into the outgoing buffer D->R if the demux message handling had previously blocked, even if the outgoing buffer is not full (like in the last two little drawings above). With this new model, the blocking doesn't occur in the simple case. In general, we now understand a bit better how to model different concurrency architectures :)
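A sketch of that variation in Go, assuming a hypothetical demuxLoop and illustrative message names (this is not the fastsync API): the ticker signal is delivered into the same loop that handles incoming messages, so a timer-based request can only be produced between message handlings. If the loop is blocked on a full outgoing queue, it is not sitting at the select and therefore cannot inject a timer-based request, even when that request's queue has room.

```go
package main

import (
	"fmt"
	"time"
)

// demuxLoop handles both incoming messages and ticker signals in one
// thread of execution, as in the v2-style arrangement described above.
func demuxLoop(inbound <-chan string, outgoing chan<- string, tick <-chan time.Time) {
	for {
		select {
		case msg := <-inbound:
			outgoing <- "handled:" + msg // may block here on a full outgoing queue
		case <-tick:
			outgoing <- "statusRequest" // reached only when not mid-handling
		}
	}
}

func main() {
	inbound := make(chan string, 1)
	outgoing := make(chan string, 1)
	go demuxLoop(inbound, outgoing, time.Tick(10*time.Second))

	inbound <- "blockResponse#1"
	fmt.Println(<-outgoing) // prints "handled:blockResponse#1"
}
```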

konnov commented 4 years ago

If we don't find a way to check the spec for interesting parameters with TLC and Apalache, we can also try to encode this spec in Promela and check it with Spin. Spin was designed for the analysis of concurrent systems where processes communicate via bounded channels or shared memory.

istoilkovska commented 4 years ago

I've done cleanup and refactoring of the Fastsync concurrency spec, and I've added a README file. In the version of the spec with bounded and blocking buffers, TLC is able to reach a state where all buffers are full and all routines are blocked, at diameter 38, in 35 minutes, for buffer length 1 and one peer.

ancazamfir commented 4 years ago

I looked through the counterexample for GoodState for FastsyncConcurrencyBoundedBlocking.tla, and it shows a counterexample similar to the one I described in https://github.com/informalsystems/verification/pull/6#issuecomment-593154773. But as mentioned in https://github.com/informalsystems/verification/pull/6#issuecomment-593536807, this is possible only when the timer-based actions happen from a different context, and this is not the case in the fastsync implementation. Concretely, I believe the problem is with the BroadcastStatusRequest (state 22 in the trace) and SendRProcessBlock (state 23 in the trace) actions, which are enabled while the Demux is "blocked", i.e. it had previously read a message from an inbound queue and blocked while processing that message because some outgoing queue was full. In this case (bounded blocking queues) the demux cannot run again until that particular outgoing queue is drained by the receiver, and that includes acting on any timeout events.

(*) Not sure if runs will consistently show the same counterexample, but check the state where demuxInboundMessage is not noMsg and a demux action happens.
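To illustrate the model/implementation gap questioned above, here is a small Go sketch (queue and message names are illustrative, not taken from the spec or the fastsync code). In the model, the timer runs in its own routine, so an action like BroadcastStatusRequest stays enabled even while the demux is blocked on a full outgoing queue; in the implementation the tick only reaches the demux loop, so once the demux is blocked, no timeout event is acted on until the receiver drains that queue.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	toScheduler := make(chan string, 1) // outgoing queue that is already full
	toPeers := make(chan string, 2)     // another outgoing queue with room

	toScheduler <- "scheduleBlock#1"

	go func() { // demux: blocked forwarding a previously read message
		toScheduler <- "scheduleBlock#2"
		fmt.Println("demux unblocked (never printed)")
	}()

	go func() { // timer as a separate routine, as in the model
		toPeers <- "statusRequest" // fires even though the demux is stuck
	}()

	time.Sleep(500 * time.Millisecond)
	fmt.Println("while the demux is blocked, the model still emitted:", <-toPeers)
}
```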

konnov commented 4 years ago

We could model integer timers with Apalache, but a depth of 38 is usually a bit too much for Apalache. It depends on your spec, of course.