paritytech / subxt

Interact with Substrate based nodes in Rust or WebAssembly

Transaction stuck waiting for finalized in test environment. #1650

Open gianfra-t opened 3 months ago

gianfra-t commented 3 months ago

We encountered strange behavior while running some tests using tokio: sometimes (randomly) the transactions sent to a local test node get stuck endlessly waiting to be finalized, even though the local node keeps generating blocks.

In our tests, a node is instantiated with manual seal in the same test function in which we later send some transactions and expect them to be finalized by awaiting wait_for_finalized_success on the returned TxProgress.

After we upgraded the node from 0.9.42 to 1.1.0 or greater and bumped subxt from 0.26.0 to 0.33.0, we noticed that a transaction will sometimes get stuck waiting for a finalized event forever, even though the local testing chain continues to produce blocks and has indeed finalized the block in which the transaction was included.

Adding extra logs shows that the last status event received is Validated, yet InBestBlock or InFinalizedBlock never arrives.
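To make this kind of hang show up as a test failure rather than an endless wait, the wait can be bounded and the last status seen can be recorded. The following is only a minimal sketch using the standard library: the `Status` enum, the `Outcome` enum, and `wait_for_finalized` are hypothetical stand-ins and not part of the subxt API, and a channel simulates the stream of status events. In an async tokio test, the analogous approach would be to wrap the awaited future in `tokio::time::timeout`.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the status events named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Validated,
    InBestBlock,
    InFinalizedBlock,
}

/// How the bounded wait ended (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Finalized,
    /// No finalized event arrived before the deadline.
    TimedOut { last_seen: Option<Status> },
    /// The sender was dropped, i.e. the event stream closed early.
    Closed { last_seen: Option<Status> },
}

/// Drain status events until a finalized event arrives, the deadline
/// passes, or the stream closes, remembering the last non-final status.
fn wait_for_finalized(events: &Receiver<Status>, limit: Duration) -> Outcome {
    let deadline = Instant::now() + limit;
    let mut last_seen = None;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match events.recv_timeout(remaining) {
            Ok(Status::InFinalizedBlock) => return Outcome::Finalized,
            Ok(status) => last_seen = Some(status),
            Err(RecvTimeoutError::Timeout) => return Outcome::TimedOut { last_seen },
            Err(RecvTimeoutError::Disconnected) => return Outcome::Closed { last_seen },
        }
    }
}

fn main() {
    // Healthy case: all three events arrive.
    let (tx, rx) = mpsc::channel();
    for status in [Status::Validated, Status::InBestBlock, Status::InFinalizedBlock] {
        tx.send(status).unwrap();
    }
    println!("{:?}", wait_for_finalized(&rx, Duration::from_millis(50)));

    // Stuck case as described above: only Validated arrives, stream stays open.
    let (tx, rx) = mpsc::channel();
    tx.send(Status::Validated).unwrap();
    println!("{:?}", wait_for_finalized(&rx, Duration::from_millis(50)));
    drop(tx);
}
```

Distinguishing `TimedOut` from `Closed` is a deliberate choice in this sketch: the first corresponds to a stream that stays open but goes quiet, the second to a stream that ends before a finalized event.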

Furthermore, this happens only intermittently, and it does not happen when interacting with a live chain. We have not been able to find a pattern in which transactions get stuck; it seems that it can happen to any transaction sent with this set-up.

Edit: This issue only happens on Linux; everything works as expected on macOS. Perhaps this hints at something.

niklasad1 commented 3 months ago

Hey,

Have you looked at the logs from your local test node, and are you using the "same" tokio runtime for the client and node? You don't see the subscription being dropped on the node side of things?

It may be useful to enable extra logs with "RUST_LOG=txpool,rpc=trace", but to me this looks like a node issue where the subscription is closed silently.

jsdw commented 3 months ago

manual seal

I thought that with manual seal, blocks aren't ever finalized, which is why waiting for finalized would always hang (the transaction would never get back a Finalized event)?

gianfra-t commented 3 months ago

It may be a good idea to start investigating the node also, you are right. Blocks are finalized and we can see this clearly since the transaction (and therefore our test) is finalized correctly "every so often". Regarding the runtime, yes it would be the same tokio runtime, I think, since the node is started on the same test function as where the transaction is sent.

I forgot to mention in the issue that this only happens on Linux; on macOS everything works as expected. I will edit the issue since it's not a small detail.

jsdw commented 3 months ago

Interesting!

I have no good ideas on the Subxt side; my guess is that the node is not emitting the events that we need. Perhaps a lot of tests run in parallel and the node gets overwhelmed with subscriptions and drops some things sometimes, leading to spurious failures? Perhaps the Linux machine is less beefy or something and thus less able to keep up?

gianfra-t commented 3 months ago

Have you ever encountered something like this, the node just not emitting the events? It is true that tests run in parallel, but this also happens when running them in isolation. As for the Linux machine, we tested on a local Linux machine and on the GitHub Actions Ubuntu image. Both have the same problem (although the local Linux machine seems to fail less often).

jsdw commented 3 months ago

I don't believe we have, or at least, I can't think of such a time off the top of my head! We have only hit the issue with "manual seal" not creating finalized blocks or something like that. @niklasad1 does this issue ring any bells for you?