penumbra-zone / tower-abci

Tower-based ABCI interface.
MIT License
75 stars 25 forks source link

Out-of-order transaction execution #44

Closed aakoshh closed 6 months ago

aakoshh commented 6 months ago

This is not a problem with tower-abci per se, but I thought it would be a good place to at least ask the question and raise awareness.

We had an issue where it appeared that in our service deliver_tx and end_block was called concurrently, that is, end_block was called before the last deliver_tx was finished, and later a consensus failure because different nodes applied transactions in different order, which was another sign that multiple deliver_tx calls were happening simultaneously.

The reason I think is that our implementation of the Service, like the kvstore example, always returned Ready from poll_ready, and this is the only thing the Worker awaits on. If it's always ready, it calls the service, which creates another Future (but doesn't wait on it), and those can thus execute concurrently and out of order.

I'm hoping to solve this by surrounding the consensus service after the split like here with a ServiceBuilder::new().buffer(1000).concurrency_limit(1).service(consensus), which hopefully will not report itself as ready until the previous call is finished.

I can see that penumbra doesn't use this split mechanism, so it can make more intelligent decisions about when it's ready to receive more requests, but I'm wondering if the examples could draw attention to this potential issue. It's possible that the documentation mentions it and I just missed its significance.

hdevalence commented 6 months ago

Ouch, that is an unfortunate footgun indeed. If you want to send a docs PR that would be helpful!