lukechampine / walrus

A wallet server for Sia
https://lukechampine.com/docs/walrus
MIT License

Question about scalability #15

Closed jkawamoto closed 1 year ago

jkawamoto commented 4 years ago

Do you happen to know how many outputs a Walrus instance can handle? If we have, for example, 10000 outputs, would it still work? Bolt DB seems able to handle that much data (https://github.com/etcd-io/bbolt#project-status), so I hope Walrus can also handle a lot of outputs.

lukechampine commented 4 years ago

I haven't run any serious scalability tests. In terms of "what to worry about," though, the order is:

  1. Addresses
  2. Transactions
  3. Outputs

The number of outputs will grow and shrink over time, but addresses and transactions just accumulate forever. So my goal would be to handle 10M addresses, 1M transactions (2KB each), and 100K outputs.

The most pressing problem today is not boltdb, but the API: there is no pagination, so it'll return all of your addresses/transactions/outputs in a single response. That probably won't scale beyond 100K or so.
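For illustration, here is a minimal sketch of what offset-based pagination could look like. The `paginate` helper and its `offset`/`limit` parameters are hypothetical and are not part of walrus's current API; the point is only that each response carries a bounded page plus the offset of the next one.

```go
package main

import "fmt"

// paginate is a hypothetical helper (not part of walrus). It returns at most
// limit items starting at offset, plus the offset of the next page, or -1
// when there are no more items.
func paginate(items []string, offset, limit int) (page []string, next int) {
	if offset < 0 || offset >= len(items) || limit <= 0 {
		return nil, -1
	}
	end := offset + limit
	if end >= len(items) {
		return items[offset:], -1
	}
	return items[offset:end], end
}

func main() {
	addrs := []string{"addr0", "addr1", "addr2", "addr3", "addr4"}
	// Walk the full list two addresses at a time.
	for offset := 0; offset != -1; {
		var page []string
		page, offset = paginate(addrs, offset, 2)
		fmt.Println(page)
	}
}
```

A client would keep requesting pages until the returned next offset is -1, so no single response has to hold the whole address set.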

jkawamoto commented 4 years ago

Agreed. Addresses are more serious. We're running Walrus for a day with a single output and now it has 10512 addresses. If we use only one output, it would take about two and a half years at that rate to reach 10M addresses. But, if we create, for example, 1000 outputs, we might exceed 10M addresses in a day...
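As a rough back-of-the-envelope check of those numbers, the sketch below assumes a constant rate of 10512 new addresses per day and, for the second case, that the rate scales linearly with the number of outputs. Both are simplifying assumptions, and `daysToReach` is just an illustrative helper.

```go
package main

import "fmt"

// daysToReach returns how many days it takes to accumulate target addresses
// at a constant rate of perDay new addresses per day.
func daysToReach(target, perDay float64) float64 {
	return target / perDay
}

func main() {
	const target = 10000000     // the 10M-address goal mentioned earlier in the thread
	const observedPerDay = 10512 // addresses seen after one day with one output

	one := daysToReach(target, observedPerDay)
	fmt.Printf("1 output: %.0f days (%.1f years)\n", one, one/365)

	// Assumption: address creation grows linearly with the output count.
	many := daysToReach(target, observedPerDay*1000)
	fmt.Printf("1000 outputs: %.2f days\n", many)
}
```

Under these assumptions a single output reaches 10M addresses after roughly 951 days, while 1000 outputs would get there in just under one day.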

MeijeSibbel commented 4 years ago

Time to re-open https://gitlab.com/NebulousLabs/Sia/-/issues/1478 ?

jkawamoto commented 4 years ago

This one is for Siad's wallet. I'm not sure if Walrus uses their code, but we cannot run ./siac wallet addresses anymore. So, they also need to fix that problem.

lukechampine commented 4 years ago

> We're running Walrus for a day with a single output and now it has 10512 addresses

Yikes. I suppose I could add a "less privacy" setting that reuses addresses. Or maybe, watch for which addresses appear in the chain, and reuse addresses if they haven't shown up in the blockchain after some period of time? Not sure.

> But, if we create, for example, 1000 outputs, we might exceed 10M addresses in a day...

To be clear, you get lots of addresses because you create a new one every time you need one. So you can accidentally generate a lot if you're doing integration tests or something like that. Also, when you attempt to form a contract, it'll generate a new address, and that address sticks around even if you fail to form the contract. So the number of addresses isn't strongly correlated with the number of outputs you have.

jkawamoto commented 4 years ago

That's true. We fail to form/renew contracts with the non-existing output error, and it increases the number of addresses. So, hopefully, splitting outputs will reduce such unnecessary addresses.

Regarding reusing addresses, I wouldn't mind if a reused address were used as the output of a contract-forming transaction. If I'm not mistaken, neither siad nor us mixes a contract-forming transaction with a transfer transaction. That means everyone already knows the input and output belong to the same owner. So, it might not be a big problem.

MeijeSibbel commented 4 years ago

I asked Chris for some feedback to understand this problem a bit better. His feedback:


TL;DR: Replace keys with a disk-backed DB (e.g. Redis), but instead of storing the secret, store the index used to derive the address from the seed.

Can we join forces and make this happen?

jkawamoto commented 4 years ago

We also need to reuse addresses to keep the number low: https://github.com/lukechampine/us/issues/84.