made two scripts to bring in many claims to regtest

lbryio / lbrycrd

The blockchain that provides the digital content namespace for the LBRY protocol

https://lbry.com

MIT License

made two scripts to bring in many claims to regtest #198

Closed BrannonKing closed 5 years ago

BrannonKing commented 6 years ago

Usage:

Scenario 1:

With mainnet server running: ./lbrycrd-cli getclaimsintrie > intrie.txt
With regtest server running: ../contrib/devtools/import_claims_from_claimsintrie_output.py ./lbrycrd/src/lbrycrd-cli < intrie.txt

Scenario 2:

With regtest server running: ./import_claims_from_name_per_line.py ../../src/lbrycrd-cli < /usr/share/dict/american-english
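
For readers without the scripts at hand, here is a rough sketch of what a name-per-line importer could look like. It is not the script from this PR. It assumes the CLI accepts `claimname <name> <value> <amount>` and `generate <n>`, and that passing `-regtest` selects the regtest server; the placeholder value, the amount, and the batch size are hypothetical.

```python
import subprocess


def claim_command(cli_path, name, value="00", amount="0.001"):
    """Build argv for one claim (assumed RPC: claimname <name> <value> <amount>)."""
    return [cli_path, "-regtest", "claimname", name, value, amount]


def generate_command(cli_path, blocks=1):
    """Build argv that mines `blocks` regtest blocks (assumed RPC: generate <n>)."""
    return [cli_path, "-regtest", "generate", str(blocks)]


def import_names(cli_path, lines, claims_per_block=50, run=subprocess.check_call):
    """Claim every non-empty name, mining a block after each full batch.

    `run` is injectable so the command sequence can be inspected without a
    running server. Returns the number of claims submitted.
    """
    count = 0
    for line in lines:
        name = line.strip()
        if not name:
            continue
        run(claim_command(cli_path, name))
        count += 1
        if count % claims_per_block == 0:
            run(generate_command(cli_path))
    if count % claims_per_block:
        # Confirm the final, partial batch.
        run(generate_command(cli_path))
    return count
```

Calling `import_names("../../src/lbrycrd-cli", sys.stdin)` would mirror the Scenario 2 invocation. Note that one CLI process per claim is itself a cost on top of the wallet overhead discussed below.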

BrannonKing commented 6 years ago

The proposed approach is not a quick copy. I spent quite a bit of time tracking through the performance issues associated with this but could see no obvious wins. See:

[Image: screenshot from 2018-08-27 13-58-46]

bvbfan commented 6 years ago

Based on your screenshot, the functions causing the slowdown look to be CCryptoKeyStore::HaveKey, IsMine, CWallet::AvailableCoins, and CWallet::CreateTransaction.

bvbfan commented 6 years ago

The problem is that CWallet::AvailableCoins can make up to O(N*M) calls to IsMine, which calls CBasicKeyStore::HaveKey, which takes a recursive mutex. That makes the look-ups extremely expensive, and a recursive mutex is even slower than a normal one. HaveKey is also called in CWallet::GetKeyFromPool -> CWallet::ReserveKeyFromKeyPool. Since we don't already own the mutex, every call acquires and releases the recursive mutex. One possible solution is to take cs_KeyStore earlier, before the first loop in AvailableCoins, with some kind of LOCK3 (which does not exist). C++11 atomics would be a great improvement for plain variables, but they don't help with containers. We could use boost::shared_mutex for a multiple-readers / single-writer pattern.
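
To make the "take cs_KeyStore before the first loop" idea concrete, here is a small sketch in Python rather than the wallet's C++. `CountingRLock`, `KeyStore`, and both counting functions are hypothetical stand-ins, not lbrycrd code. The lock wrapper only counts how often the lock is acquired from an unowned state, which is the expensive case described above.

```python
import threading


class CountingRLock:
    """Re-entrant lock that counts acquisitions made while not already held."""

    def __init__(self):
        self._lock = threading.RLock()
        self._depth = 0            # only modified while the lock is held
        self.first_acquires = 0

    def __enter__(self):
        self._lock.acquire()
        if self._depth == 0:
            self.first_acquires += 1
        self._depth += 1
        return self

    def __exit__(self, *exc):
        self._depth -= 1
        self._lock.release()
        return False


class KeyStore:
    """Hypothetical key store; `lock` plays the role of cs_KeyStore."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.lock = CountingRLock()

    def have_key(self, key):
        with self.lock:
            return key in self._keys


def count_mine_per_call(store, outputs):
    # Each have_key call acquires and releases the lock from scratch.
    return sum(1 for key in outputs if store.have_key(key))


def count_mine_hoisted(store, outputs):
    # Take the lock once; the have_key calls inside only re-enter it.
    with store.lock:
        return sum(1 for key in outputs if store.have_key(key))
```

With four outputs, the per-call version performs four fresh acquisitions and the hoisted version performs one. Whether that difference is measurable in the wallet would still need profiling.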

BrannonKing commented 6 years ago

The LOCK2 is just two LOCK calls; it's okay to lock a third right after. If we call LOCK on a recursive mutex that is already owned, is that faster?

What can we do to reduce the HaveKey time? Can we cache the results (per LOCK of cs_main)? Is it getting called multiple times with the same input?

bvbfan commented 6 years ago

> The LOCK2 is just two LOCK calls;

But it shouldn't be. The idea behind it is to lock more than one mutex atomically to avoid deadlock: https://en.cppreference.com/w/cpp/thread/lock

> it's okay to lock a third right after.

You can try it.

> If we call LOCK on a recursive mutex that is already owned, is that faster?

Sure, the slow part is the initial acquisition.

> What can we do to reduce the HaveKey time? Can we cache the results (per LOCK of cs_main)? Is it getting called multiple times with the same input?

Yes, you can minimize calls to HaveKey by building a map <key, result> in AvailableCoins before the first loop. It can even be a local variable that is not guarded by any mutex.
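
A minimal sketch of that caching idea, written in Python for brevity rather than as a patch to the C++ wallet. The function and parameter names are hypothetical. `have_key` stands in for the expensive HaveKey look-up, and the cache lives only for the duration of one call, so it needs no locking of its own.

```python
def available_coins(outputs, have_key):
    """Return the txids of outputs we own, asking have_key once per distinct key.

    outputs:  iterable of (txid, key_id) pairs
    have_key: callable key_id -> bool (the expensive, lock-taking look-up)
    """
    cache = {}          # key_id -> bool, local to this call
    spendable = []
    for txid, key_id in outputs:
        if key_id not in cache:
            cache[key_id] = have_key(key_id)
        if cache[key_id]:
            spendable.append(txid)
    return spendable
```

This only helps if the same key shows up repeatedly within one call, which is the open question asked above.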

If you want help with the implementation, I can give it a try.

BrannonKing commented 6 years ago

I don't think this approach is going to be sufficient. It's just too slow. Last night I exported 350k claims from mainnet, broke the file into quarters, and ran import scripts on all four in parallel. Ten hours later we had 100k blocks and about 100k claims imported. By then the import rate had slowed to about 200/minute, which means we need at least another 21 hours to complete this, and it will probably keep slowing as the tree gets more nodes.
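
The 21-hour figure follows from the numbers above if the rate stayed flat, which it likely won't. A quick check (the function name is just for illustration):

```python
def hours_remaining(total_claims, imported, rate_per_minute):
    """Hours left at a constant import rate; a lower bound if the rate keeps dropping."""
    remaining = total_claims - imported
    return remaining / rate_per_minute / 60


estimate = hours_remaining(350_000, 100_000, 200)  # 250,000 claims at 200/minute
```

That comes to 1,250 minutes, or a little under 21 hours.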

BrannonKing commented 5 years ago

Not only is this approach insufficient from a performance standpoint, it's also insufficient in the data it carries over. It needs to bring in the real values from mainnet. To do that, it has to parse the metadata on mainnet and replace the claimIds with updated inserts.