b-yap closed 1 month ago
We need to prevent vaults from silently getting stuck, which may be caused by unhandled zombie tasks that keep running.
One idea is to include the polling of the Stellar messages: https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/oracle/agent.rs#L100-L104 in the monitoring: https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/system.rs#L806-L817
This means that setting the OracleAgent's message_sender is delayed, so the OracleAgent passed to tasks must be mutable; hence **Arc<RwLock<>>** is used instead of Arc<> alone.
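The switch from Arc<> to Arc<RwLock<>> can be illustrated with a minimal sketch. It is not the vault's actual code: the `OracleAgent` below is a hypothetical stand-in with a single field, the `send` and `attach_sender` helpers are invented for illustration, and `std::sync` types are used so the example is self-contained (the real client may well use async equivalents).

```rust
use std::sync::mpsc::Sender;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the vault's `OracleAgent`; the real struct has more fields.
struct OracleAgent {
    /// Stays `None` until the polling task has been started.
    message_sender: Option<Sender<String>>,
}

impl OracleAgent {
    fn new() -> Self {
        OracleAgent { message_sender: None }
    }

    /// Forwards a message, or reports that the sender has not been set yet.
    fn send(&self, msg: &str) -> Result<(), String> {
        match &self.message_sender {
            Some(sender) => sender.send(msg.to_string()).map_err(|e| e.to_string()),
            None => Err("message sender not set yet".to_string()),
        }
    }
}

/// Installs the sender after the agent has already been shared with tasks.
/// This write is what a plain `Arc<OracleAgent>` would not allow.
fn attach_sender(agent: &Arc<RwLock<OracleAgent>>, sender: Sender<String>) {
    agent.write().expect("lock poisoned").message_sender = Some(sender);
}
```

Tasks that only need to send take a read lock, and the one place that sets message_sender takes a write lock, so every clone of the Arc sees the sender once it is attached.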
However, the current tasks cannot start together with the polling task: the stellar-overlay must already be running, and all open requests must finish first. https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/system.rs#L787-L790
An idea is to introduce another variant of ServiceTask that waits for a prerequisite to finish before its task starts; in other words, a precheck is required.
    enum ServiceTask {
        ...
        // Runs a task after a prerequisite check has passed.
        PrecheckRequired(Task),
    }
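How such a variant might be driven can be sketched as follows. This is an illustration under several assumptions, not the implementation in this PR: `Task` is modelled as a boxed closure, the precheck is carried inside the variant as a second closure (the snippet above only shows `PrecheckRequired(Task)`), the `Essential` variant and the `spawn` helper are hypothetical, and OS threads stand in for async tasks to keep the example self-contained.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical task type: a closure that runs once and returns a label.
type Task = Box<dyn FnOnce() -> String + Send + 'static>;
/// Hypothetical precheck: returns true once the prerequisite is met.
type Precheck = Box<dyn Fn() -> bool + Send + 'static>;

enum ServiceTask {
    /// Starts immediately (assumed variant, for contrast).
    Essential(Task),
    /// Runs a task after a prerequisite check has passed.
    PrecheckRequired(Task, Precheck),
}

/// Spawns the task; a `PrecheckRequired` task polls its precheck before running.
fn spawn(task: ServiceTask) -> JoinHandle<String> {
    match task {
        ServiceTask::Essential(run) => thread::spawn(run),
        ServiceTask::PrecheckRequired(run, ready) => thread::spawn(move || {
            // Wait until the prerequisite (e.g. the overlay is running) holds.
            while !ready() {
                thread::sleep(Duration::from_millis(10));
            }
            run()
        }),
    }
}
```

With this shape, the polling task could be spawned as an ordinary task, while the issue and replace listeners would be wrapped in `PrecheckRequired` so that they start only after the precheck reports true.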
To make sure the stellar-overlay and the client are communicating well, the stellar-overlay will also send the client a hello message, which signals the client to prepare itself.
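One way a client could react to such a hello message is sketched below. Everything here is assumed for illustration: the `OverlayMessage` enum, the `Client` struct, and the rule that data arriving before the hello is rejected are not taken from stellar-relay-lib.

```rust
/// Hypothetical messages sent from the overlay to the client.
#[derive(Debug, PartialEq)]
enum OverlayMessage {
    /// Signals that the overlay is up and the client should prepare itself.
    Hello,
    /// Any later payload, reduced to a string for this sketch.
    Data(String),
}

/// Hypothetical client state.
#[derive(Debug, Default)]
struct Client {
    ready: bool,
    received: Vec<String>,
}

impl Client {
    /// Marks the client ready on `Hello`; accepts data only once it is ready.
    fn handle(&mut self, msg: OverlayMessage) -> Result<(), String> {
        match msg {
            OverlayMessage::Hello => {
                self.ready = true;
                Ok(())
            }
            OverlayMessage::Data(body) if self.ready => {
                self.received.push(body);
                Ok(())
            }
            OverlayMessage::Data(_) => Err("received data before hello".to_string()),
        }
    }
}
```

Whether early data should be rejected, buffered, or ignored is a design choice this sketch does not settle; it only shows the hello acting as the readiness signal.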
I will add relevant comments when necessary.
@ebma @gianfra-t CI passed. Merging this.
Summary
We need to prevent vaults from silently getting stuck, which may be caused by unhandled zombie tasks that keep running.
One idea is to include the polling of the Stellar messages: https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/oracle/agent.rs#L100-L104 in the monitoring: https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/system.rs#L806-L817
This means that setting the OracleAgent's message_sender is delayed, so the OracleAgent passed to tasks must be mutable; hence **Arc<RwLock<>>** is used instead of Arc<> alone.
However, the current tasks cannot start together with the polling task: the stellar-overlay must already be running, and all open requests must finish first. https://github.com/pendulum-chain/spacewalk/blob/7c7989875e95e1cfe3b0aeba9bb6af01d9e33e58/clients/vault/src/system.rs#L787-L790
An idea is to introduce another variant of ServiceTask that waits for a prerequisite to finish before its task starts; in other words, a precheck is required.
To make sure the stellar-overlay and the client are communicating well, the stellar-overlay will also send the client the:
- hello message: useful to signal the client to prepare itself

How to start reviewing:
- stellar-relay-lib
  - (&param) to (param)
  - message_reader.rs file
- vault
  - err-derive dependency, and just use the existing thiserror dependency.
  - tokio_spawn for naming tasks. This is only viable for tokio_unstable.
  - OracleAgent adjustment
    - Arc<OracleAgent>, it will be Arc<RwLock<OracleAgent>>
    - message_sender is moved a bit later in the code; hence we need OracleAgent to be mutable.
  - ServiceTask; the PrecheckRequired
    - issue::listen_for_issue_requests
    - issue::listen_for_issue_cancels
    - issue::listen_for_executed_issues
    - CancellationScheduler for Issue
    - CancellationScheduler for Replace
    - listen_for_replace_requests
    - listen_for_accept_replace
    - active_block_listener
I will add relevant comments when necessary.