This is not a problem with tower-abci per se, but I thought it would be a good place to at least ask the question and raise awareness.
We had an issue where it appeared that in our service deliver_tx and end_block was called concurrently, that is, end_block was called before the last deliver_tx was finished, and later a consensus failure because different nodes applied transactions in different order, which was another sign that multiple deliver_tx calls were happening simultaneously.
The reason I think is that our implementation of the Service, like the kvstoreexample, always returned Ready from poll_ready, and this is the only thing the Workerawaits on. If it's always ready, it calls the service, which creates another Future (but doesn't wait on it), and those can thus execute concurrently and out of order.
I'm hoping to solve this by surrounding the consensus service after the split like here with a ServiceBuilder::new().buffer(1000).concurrency_limit(1).service(consensus), which hopefully will not report itself as ready until the previous call is finished.
I can see that penumbra doesn't use this split mechanism, so it can make more intelligent decisions about when it's ready to receive more requests, but I'm wondering if the examples could draw attention to this potential issue. It's possible that the documentation mentions it and I just missed its significance.
This is not a problem with
tower-abci
per se, but I thought it would be a good place to at least ask the question and raise awareness.We had an issue where it appeared that in our service
deliver_tx
andend_block
was called concurrently, that is,end_block
was called before the lastdeliver_tx
was finished, and later a consensus failure because different nodes applied transactions in different order, which was another sign that multipledeliver_tx
calls were happening simultaneously.The reason I think is that our implementation of the
Service
, like thekvstore
example, always returnedReady
frompoll_ready
, and this is the only thing theWorker
awaits on. If it's always ready, it calls the service, which creates anotherFuture
(but doesn't wait on it), and those can thus execute concurrently and out of order.I'm hoping to solve this by surrounding the
consensus
service after the split like here with aServiceBuilder::new().buffer(1000).concurrency_limit(1).service(consensus)
, which hopefully will not report itself as ready until the previous call is finished.I can see that penumbra doesn't use this split mechanism, so it can make more intelligent decisions about when it's ready to receive more requests, but I'm wondering if the examples could draw attention to this potential issue. It's possible that the documentation mentions it and I just missed its significance.