input-output-hk / cardano-wallet-legacy

Official Wallet Backend & API for Cardano-SL
https://input-output-hk.github.io/cardano-wallet/
MIT License

Restoration Finish Unreachable #291

Open KtorZ opened 5 years ago

KtorZ commented 5 years ago
| Release | Operating System      | Cause   |
|---------|-----------------------|---------|
| 1.4.1   | Windows & OSX & Linux | Unknown |

Context

One of our exchanges has run into the following issue at the very end of a wallet restoration:

```
Exception during restoration of walletHdRootId Ae2tdP../GVTU when starting from BlockContext { slotId 888th slot of 99th epoch, hash 21badd8184a0b27b, prev a4e7eae10efebf6e} with target BlockContext { slotId 10489th slot of 98th epoch, hash bd5eca8ea93b4149, prev 7ca1d6ef5978cbe5}. Exception: RestorationFinishUnreachable bd5eca8ea93b4149 5669397ee61cc285
```

This is rather... exceptional, and it caused the node to hang at `100%` restored without ever becoming `synced`.

Steps to Reproduce

Unclear. To be determined.

Some possible tracks:
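One thing that can be read directly off the exception message in the Context section is the relative position of the two block contexts. The snippet below is a minimal sketch, not code from the wallet: it parses the `slotId N-th slot of M-th epoch` fragments and compares them as `(epoch, slot)` pairs. The regular expression and the helper name `parse_contexts` are assumptions made for illustration, and whether this ordering is related to the failure is not established.

```python
import re

# Matches fragments such as "slotId 888th slot of 99th epoch" from the log excerpt.
# The pattern is an assumption based only on the single message quoted in this issue.
CONTEXT_RE = re.compile(
    r"slotId (\d+)(?:st|nd|rd|th) slot of (\d+)(?:st|nd|rd|th) epoch"
)


def parse_contexts(message):
    """Return (epoch, slot) tuples in the order they appear in the message."""
    return [(int(epoch), int(slot)) for slot, epoch in CONTEXT_RE.findall(message)]


LOG = (
    "Exception during restoration of walletHdRootId Ae2tdP../GVTU when starting from "
    "BlockContext { slotId 888th slot of 99th epoch, hash 21badd8184a0b27b, "
    "prev a4e7eae10efebf6e} with target BlockContext { slotId 10489th slot of "
    "98th epoch, hash bd5eca8ea93b4149, prev 7ca1d6ef5978cbe5}."
)

start, target = parse_contexts(LOG)

# Tuples compare lexicographically, so this orders by epoch first, then by slot.
target_is_behind_start = target < start
print(start, target, target_is_behind_start)
```

For the message above, the starting context is in epoch 99 and the target is in epoch 98, so the target sits behind the starting point.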

Expected behavior

A wallet can finalize its restoration successfully, or, at the very least, give some clearer information about the failure.

Actual behavior

The wallet doesn't finalize the restoration, and a rather unhelpful exception is raised.


Resolution Plan

PR

| Number | Base    |
|--------|---------|
| #?     | develop |

QA

rvl commented 5 years ago

I am so far unable to get the RestorationFinishUnreachable exception.

Restoring wallet while in recovery mode

I did get the wallet to fail by starting a restore from a backup phrase while the node was in recovery mode. If the chain is fully synced, the same restore does not fail.

What happens is that the wallet restore process finishes, but the node never exits recovery mode (the "Recovery mode exited gracefully" message never appears in the log).

Furthermore, the /api/v1/node-info endpoint stops returning data.
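A small probe can make "stops returning data" easier to observe while reproducing. This is a minimal sketch under stated assumptions: only the `/api/v1/node-info` path comes from this report, while the base URL, the timeout, and the helper names are hypothetical, and any TLS or client-certificate setup the node requires is left to the caller.

```python
import json
import socket
import urllib.error
import urllib.request


def fetch_node_info(base_url, timeout=5.0):
    """GET <base_url>/api/v1/node-info and decode the body as JSON.

    base_url is a placeholder; TLS configuration is not handled here.
    """
    url = base_url + "/api/v1/node-info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_info_responds(fetch, base_url, timeout=5.0):
    """Return True if the endpoint answers with JSON before the timeout."""
    try:
        fetch(base_url, timeout)
    except (urllib.error.URLError, socket.timeout, ValueError):
        # Connection failure, timeout, or a body that is not valid JSON.
        return False
    return True
```

Passing the fetch function as an argument keeps the check easy to exercise without a running node, for example `node_info_responds(fetch_node_info, base_url)` in a polling loop during a restore.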

I reproduced this with mainnet and testnet. On testnet, the wallet restore was able to finish. On mainnet, I could not finish the wallet restore because it consumed 23GB of RAM and I had to kill the process.

After triggering this error condition, I killed cardano-node and started it again. The restored wallet was present, but the chain sync% went back to where it was before recovery mode was entered last time.

Upgrading from 1.3.2 -> 2.0.1 and wallet restoration

I synced the testnet chain and created a wallet using the 1.3.2 release. Then I stopped the node.

Then ~80 slots later (~25 mins), I started the 2.0.1 wallet on the same state directory. For good measure, I also tried beginning a wallet restore before it was able to exit recovery mode.

In this case, I could not reproduce any bugs. Both wallets restored successfully (the first restore due to upgrade), and the node exited recovery mode.

Conflicts between epoch consolidation and wallet restoration

An exchange was getting exceptions because the epoch consolidation and wallet restoration processes were trying to `flock` the same files in the blocks db. I think this could be a possible cause of RestorationFinishUnreachable, but I haven't yet been able to test this.
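To illustrate the kind of conflict described here, the sketch below takes an exclusive, non-blocking `flock` on the same file through two separate file descriptors. It is a generic Unix-only illustration using a temporary file, not the node's actual locking code, and the helper name `try_exclusive_lock` is hypothetical.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd):
    """Try to take an exclusive flock without blocking; report success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description already holds a conflicting lock.
        return False
    return True


with tempfile.NamedTemporaryFile() as tmp:
    # Two independent opens of the same path stand in for two competing workers.
    first = os.open(tmp.name, os.O_RDWR)
    second = os.open(tmp.name, os.O_RDWR)
    try:
        first_got_lock = try_exclusive_lock(first)
        second_got_lock = try_exclusive_lock(second)
    finally:
        # Closing a descriptor releases the lock held through it.
        os.close(first)
        os.close(second)

print(first_got_lock, second_got_lock)
```

The second attempt fails while the first lock is held, which is the situation two processes end up in when they both lock the same file.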

KtorZ commented 5 years ago

One thing I'd like to clarify @rvl; were you able to reproduce the restoration finish unreachable with the first case? or is it another case of failure :thinking: ?

rvl commented 5 years ago

> One thing I'd like to clarify @rvl; were you able to reproduce the restoration finish unreachable with the first case? or is it another case of failure ?

In the first case, it was another case of failure, not RestorationFinishUnreachable.

It occurred to me later that this RestorationFinishUnreachable exception is unlikely to be related to upgrading from 1.3.2 -> 2.0.1 (second case) because I asked our user to update by deleting the state directory and restoring the wallet with 2.0.1, rather than migrating the DBs.