mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 990 forks source link

[wallet] lock file contention (various api calls to and from the wallet) #206

Closed antiochp closed 6 years ago

antiochp commented 7 years ago

We have a lock file wallet.lock that prevents multiple processes from writing to the wallet and potentially corrupting it.

For a relatively simple local setup -

We are seeing contention in the wallet as the mining node hit the wallet api to build a new coinbase output while wallet info causes the wallet to refresh its outputs against the node api.

Currently the mining node will panics (and I believe stops mining until restarted) if it fails to obtain the wallet lock.

There are probably opportunities to avoid locking the wallet during expensive api calls. But I'm not sure we can completely eliminate the risk of contention using this global lock file for the wallet.

I think we need to rethink how the wallet persists state to disk - a global lock file is going to become a point of contention if it wraps both these in periods where the lock is required -

See #205 which is related (replaces multiple expensive api calls with a single expensive api call). But note in #205 we still lock the wallet for the duration of this external api call.

ignopeverell commented 7 years ago

We have a shared resource (the wallet data) that can be accessed independently by multiple processes. So no matter what, a lock for all these processes is required, regardless of the representation of that data. Now there are plenty of ways to minimize contention and failures:

  1. Avoid locking for a long time. In your case, the lock should be acquired after the HTTP request and just for the purpose of updating these entries that have a new status.
  2. Retry a few times on failure. The mining node should definitely do that.
  3. Possibly versioning, although I think that's likely overkill at this point. With versioning you can introduce optimistic locking. Maybe something we can look at in the future, once the 2 above are in place (necessary in any case).
antiochp commented 7 years ago

Yeah agreed on (1) and (2). I might take a look at making some progress on this. (3) is kind of where my head was at when I opened this issue but agreed that it is not the best place to start looking for solutions right now.

antiochp commented 7 years ago

Got a pretty decent solution for (1) - PR going up shortly. Going to focus on getting some retry logic in there for (2) also.

We now hold the lock for only short periods of time (and only when we actually update or add outputs in the wallet.

For example when refreshing wallet outputs (lock held for 15ms) -

Oct 25 11:52:12.400 DEBG Refreshing wallet outputs
Oct 25 11:52:12.678 INFO acquired wallet lock ...
Oct 25 11:52:12.693 INFO ... released wallet lock
antiochp commented 6 years ago

(1) resolved via #210. (2) proposed impl is here - #213 (in progress).

antiochp commented 6 years ago

Resolving this. Seems like it is rare to actually see contention now (at least in local testing) from #210. Once we get retry logic in from #213 then I think we're very unlikely to be affected negatively by any contention (and we'll see it sprayed all over the logs indicating what is happening). Specifically - the mining node will happily sit there retrying until the wallet successfully creates a new coinbase output.