seraphis-migration / wallet3

Info and discussions about a hypothetical full 'wallet2' rewrite from scratch

LMDB for managing the wallet cache #4

Open rbrunner7 opened 2 years ago

rbrunner7 commented 2 years ago

A problem that I have seen mentioned every now and then over the years is poor performance with very big Monero wallets at busy exchanges. See e.g. this discussion on IRC or this Monero issue.

Currently the wallet cache uses epee-based serialization, which means reading the whole file content into memory and decrypting everything when opening the wallet, and the reverse when closing it. Both processes can presumably become slow once such a file gets really big.

One proposed solution is managing data like outputs with LMDB and doing away with the need of loading and saving everything at once.

Interestingly, the "wallet3" project over at the Oxen code fork uses SQLite for the wallet cache; it's not clear to me, however, whether the rows are indeed always read and written strictly as needed.

I decided to give it a try. I made a wallet with about 115,000 outputs. With this the wallet cache file has a size of roughly 75 MB. I tested on a modern notebook with an SSD under Debian Linux.

Results were mixed, but as far as load and save times are concerned there was no problem: Both operations took only about 3 seconds.

I don't know whether 115,000 outputs already qualify as a "big wallet at busy exchange". Maybe with 10 times as many, with load times reaching half a minute and file sizes approaching 1 GB, things may start to turn ugly.
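The "10 times as many" estimate can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes that both file size and load time grow linearly with the number of outputs, which was not measured.

```python
# Measured values from the test described above (rounded).
outputs = 115_000
cache_mb = 75
load_seconds = 3

scale = 10  # hypothetical wallet with 10 times as many outputs

# Assumption: file size and load time both scale linearly with output count.
scaled_outputs = outputs * scale
scaled_cache_mb = cache_mb * scale
scaled_load_seconds = load_seconds * scale

# Average cache footprint of a single output in the measured wallet.
bytes_per_output = cache_mb * 1_000_000 / outputs

print(scaled_outputs, scaled_cache_mb, scaled_load_seconds, round(bytes_per_output))
```

Under that assumption a wallet with 1.15 million outputs lands at about 750 MB and 30 seconds, which matches the rough figures given above.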

However, I see a fundamental problem here: Code that works with arrays, vectors and maps in memory, like wallet2 does now, is quite a bit simpler, more direct and easier to read than code that writes and reads an LMDB database. I would guess that 99.99% of all existing Monero wallets stay within a size that has no substantial performance problems. By using LMDB we would therefore optimize and complicate the code for 0.01% of wallets. Important wallets of course, but still.

rbrunner7 commented 2 years ago

There are reports about smartphone wallets more or less unable to deal with Monero wallets that have many thousands of outputs. I am pretty sure that this is not an indication of a general wallet size problem that could be solved with a switch to LMDB for managing the wallet cache, but a much more narrow and specific problem. I think it has to do with the UI controls that are used to display lists of transactions or lists of outputs:

There must be smartphone UI list controls that are simply overburdened if you just load 100'000 transactions into them for display without further adequate configuration. Result would be a display that freezes or at least is so slow that it's unusable.

It seems that the Monero GUI wallet also had this problem until this PR in 2019. It comments "tested on wallet with 3700 txs", so we can assume that before that PR even this number of transactions could lead to noticeable performance problems. The keyword here is pagination, the measure that solved the issue.
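To make the pagination idea concrete, here is a minimal sketch. It is not the GUI wallet's actual code; the helper names `get_page` and `page_count` and the page size of 50 are assumptions made up for this illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(items: Sequence[T], page: int, page_size: int = 50) -> List[T]:
    """Return only the slice of `items` that belongs to the zero-based `page`."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    return list(items[start:start + page_size])


def page_count(total: int, page_size: int = 50) -> int:
    """Number of pages needed to show `total` items (ceiling division)."""
    return (total + page_size - 1) // page_size


# Hypothetical transaction list: the UI control receives 50 rows per page
# instead of all 100,000 entries at once.
transactions = [f"tx-{i}" for i in range(100_000)]
first_page = get_page(transactions, page=0)
last_page = get_page(transactions, page=page_count(len(transactions)) - 1)
```

The full list can still live in memory; what changes is that the list control only ever has to render one page.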

A few days ago I tested the latest release of the GUI wallet app with a testnet wallet that has more than 100,000 outputs, and it worked without any noticeable problems. Only the initial transfer of all that info into the app took a bit longer, maybe 20 seconds.

All in all I wouldn't take reports of problems with large wallets on smartphones as a strong indication that we need to use LMDB for managing wallet caches.

UkoeHB commented 2 years ago

There are some spots in the Seraphis library, mainly input selection and the mock enote store, where I am nervous about efficiency for large amounts of owned enotes. Input selection is at risk when you own many very tiny amounts, because it will take many selections to find a solution. The enote store has a lot of maps for balance recovery, which may or may not cause slowdowns when querying the current balance or scanning new blocks if you need to iterate over the entire set of owned enotes.
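The concern about many tiny amounts can be illustrated with a toy example. This is a naive largest-first selection written only for this illustration, not the input selection algorithm of the Seraphis library; `select_inputs` and the amounts are hypothetical.

```python
from typing import List, Optional


def select_inputs(amounts: List[int], target: int) -> Optional[List[int]]:
    """Naive largest-first selection: pick enotes until the target is covered.

    Returns the chosen amounts, or None if the total balance is insufficient.
    """
    chosen: List[int] = []
    total = 0
    for amount in sorted(amounts, reverse=True):
        if total >= target:
            break
        chosen.append(amount)
        total += amount
    return chosen if total >= target else None


# A wallet holding a few large enotes covers the target with one selection ...
few_large = select_inputs([1_000_000, 500_000, 250_000], target=900_000)

# ... while a wallet holding only tiny enotes needs thousands of selections.
dust = select_inputs([100] * 20_000, target=900_000)

print(len(few_large), len(dust))
```

Even in this simplified form, the number of selection steps grows with the number of tiny enotes needed to reach the target, which is the kind of scaling that should be measured on realistic wallets.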

These perf issues should definitely be tested before launch so no users get blind-sided.

j-berman commented 2 years ago

Some more thoughts to add to the conversation...

Pro: another major benefit of using LMDB is ACID transactions. Writes inside transactions are all-or-nothing, which means the wallet state can't be corrupted if some function fails in the middle of executing.
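A minimal sketch of the all-or-nothing behavior follows. It uses SQLite through Python's standard `sqlite3` module as a stand-in, because it needs no extra dependency; LMDB write transactions give the same kind of guarantee through a different API. The table layout and the `mark_spent` helper are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outputs (key_image TEXT PRIMARY KEY, amount INTEGER, spent INTEGER)"
)
conn.execute("INSERT INTO outputs VALUES ('ki-1', 500, 0)")
conn.execute("INSERT INTO outputs VALUES ('ki-2', 300, 0)")
conn.commit()


def mark_spent(conn, key_images):
    """Mark several outputs as spent in one transaction (all or nothing)."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        for ki in key_images:
            cur = conn.execute(
                "UPDATE outputs SET spent = 1 WHERE key_image = ?", (ki,)
            )
            if cur.rowcount != 1:
                raise KeyError(f"unknown key image: {ki}")


try:
    mark_spent(conn, ["ki-1", "ki-missing"])  # fails on the second update
except KeyError:
    pass

# The first update was rolled back as well, so nothing is half-processed.
spent_count = conn.execute(
    "SELECT COUNT(*) FROM outputs WHERE spent = 1"
).fetchone()[0]
```

With a serialized in-memory cache, the equivalent failure would leave the first output already marked as spent unless the code undoes the change by hand.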

Cons:

rbrunner7 commented 1 year ago

Adding to @j-berman's comments:

After watching @UkoeHB's epic Seraphis library walkthrough and getting some more explanations of how balance recovery works during the last meeting, I am pretty sure the "LMDB for wallet cache" approach is already more or less dead.

As I see it, that approach only really makes sense if you work on the database for your collection of owned enotes and probably a number of other collections, and basically stay there. So instead of accesses by keys as you have them with STD maps, or accesses by index as STD vectors allow them, you would have database accesses throughout. Lots and lots of accesses at that, in a lot of places.

Because that is what in my opinion is the true value of this "database-centric approach": It does not matter whether you have 10 enotes, or 100,000, or many millions. Let the database handle them, however many there are. No worries how long it would take to load them all, you never do that. No worries whether RAM would be large enough to load them all, you got it: You never do that.
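What "staying in the database" looks like in code can be sketched as follows. Again this uses SQLite from Python's standard library as a stand-in for LMDB, and the `enotes` table and both helper functions are hypothetical; the point is only the access pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real wallet would use a file on disk
conn.execute(
    "CREATE TABLE enotes (id INTEGER PRIMARY KEY, amount INTEGER, spent INTEGER)"
)
with conn:
    conn.executemany(
        "INSERT INTO enotes (id, amount, spent) VALUES (?, ?, ?)",
        ((i, 1000 + i, i % 2) for i in range(100_000)),
    )


def unspent_balance(conn):
    """Let the database aggregate; no enote rows are materialized in Python."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM enotes WHERE spent = 0"
    ).fetchone()
    return row[0]


def get_enote(conn, enote_id):
    """Fetch a single record by key instead of indexing into a loaded vector."""
    return conn.execute(
        "SELECT id, amount, spent FROM enotes WHERE id = ?", (enote_id,)
    ).fetchone()


balance = unspent_balance(conn)
one = get_enote(conn, 42)
```

Neither function ever holds the full collection in memory, no matter how many enotes exist. The flip side is visible too: what would be a plain map or vector access becomes a query each time.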

Looking at the Seraphis library, and the sheer number of maps and vectors it deals with, I don't think replacing all that stuff with database accesses is feasible. And I fear that the wallet will turn out in a similar way: It would be an impossible drag to not be able to simply access a map or vector element but needing database accesses for everything.

boogerlad commented 1 year ago

@rbrunner7 what is the relevant timestamp for how balance recovery works?

rbrunner7 commented 1 year ago

> @rbrunner7 what is the relevant timestamp for how balance recovery works?

Sorry, I don't understand what you mean. Which timestamp? And what's the connection of your question with the subject of the issue, using or not using LMDB?

boogerlad commented 1 year ago

timestamp of the Seraphis library walkthrough video. I'm interested in learning more about the structure to see if lmdb, sqlite, or if something else is more appropriate, but 3 hours without chapters is a bit much for me to digest.

rbrunner7 commented 1 year ago

Ah, ok, now I understand.

Balance recovery in particular was discussed during a second meeting that wasn't recorded. And when I say I got a certain impression from watching the walkthrough, it's an overall impression, not from e.g. 15 minutes somewhere within those 3 hours of video that I could point you to.

Anyway, what @UkoeHB built so far isn't a wallet, but just a library that doesn't store anything to disk at all. We speak here about using or not using LMDB in something that does not exist yet, that we only now start to build.