Add more aggressive caching of previous txouts in CTxIn.

domob1812 commented 10 years ago

Cache previous txouts matching a CTxIn when they have been loaded from disk. Also cache their height when it has been determined for CreateNewBlock. This should speed CreateNewBlock (and thus getwork/getauxblock) up considerably for large numbers of inputs processed in a block. Please review the code carefully to ensure I've not introduced any bugs, and test that all mining related things still work fine. I've only done some casual testing that getwork and getauxblock still run fine, but can't really judge whether everything is fully correct.

ryancdotorg commented 10 years ago

Can we get...

A clearmempoolcache rpc call
An option to disable the cache?

I don't feel familiar enough with the code to sign off on this as correct - would you be willing to add additional comments explaining the changes?

domob1812 commented 10 years ago

The change is that CTxIn now keeps (potentially) a pointer to its previous transaction (which uses up additional memory, hence your wishes). This pointer is filled in when the prev tx is loaded during ConnectInputs (which happens when the tx is added to the mempool), and then used in the future to prevent redundant disk accesses (when calculating the priority and when calling ConnectInputs again in CreateNewBlock). I've also restructured ConnectInputs a bit in a way which seemed clearer to me given the new caching, but it shouldn't have changed its functionality beside that (but please review that if possible).

The risks are these: 1) Possible bugs introduced by the code rewriting. 2) High memory usage due to keeping the previous transactions. I didn't notice any problems with that during my own testing, but it wasn't very exhaustive. So more results are useful. 3) It shouldn't be possible for the CTxIn to change (invalidating the cache) without clearing it - but of course I could have overlooked something.

Your suggestions for disabling/clearing the cache sound good (regarding item 2 above). I'll add them.

phelixbtc commented 10 years ago

There is only little network load but currently the patched version is about six times faster. I expect this to increase with higher load.

The worst case is many inputs each coming from a large transaction themselves, correct? This reminds me, we should consider limiting TX size: https://forum.namecoin.info/viewtopic.php?f=5&t=1952

ryancdotorg commented 10 years ago

Will reread the code in a few hours

On August 24, 2014 7:31:58 AM EDT, phelixbtc notifications@github.com wrote:

There is only little network load but currently the patched version is about six times faster. I expect this to increase with higher load.

The worst case is many inputs each coming from a large transaction themselves, correct? This reminds me, we should consider limiting TX size: https://forum.namecoin.info/viewtopic.php?f=5&t=1952

Reply to this email directly or view it on GitHub: https://github.com/namecoin/namecoin/pull/158#issuecomment-53189143

Sent from my Android device with K-9 Mail. Please excuse my brevity.

phelixbtc commented 10 years ago

Just looked at the RAM usage and Ryan was correct with his concerns: Both with started with default parameters the unpatched version uses 95MB and the patched one 295MB... (There are currently 750 TXs in my mempool with a size of 315kb).

I assume it would be rather complicated to only cache inputs instead of whole txPrev?

Hmm...

ryancdotorg commented 10 years ago

Can we maybe improve this with a better data structure?

On August 24, 2014 8:47:41 AM EDT, phelixbtc notifications@github.com wrote:

Just looked at the RAM usage and Ryan was correct with his concerns: Both with started with default parameters the unpatched version uses 95MB and the patched one 295MB... (There are currently 750 TXs in my mempool with a size of 315kb).

I assume it would be rather complicated to only cache inputs instead of whole transactions?

Hmm...

Reply to this email directly or view it on GitHub: https://github.com/namecoin/namecoin/pull/158#issuecomment-53191526

Sent from my Android device with K-9 Mail. Please excuse my brevity.

domob1812 commented 10 years ago

If an option is added to disable / clear the caching, is 300 MiB really such a big deal nowadays? But yes, it is probably possible to optimise RAM usage. ConnectInputs and priority calculation itself only need the input's value and whether or not is a coinbase transaction - the place where the full prev tx is needed is the ConnectInputs hook. I didn't want to rewrite that, too, in order to make it depend only on the actually necessary information. I could do this, though, but it means a lot more changes to the code.

domob1812 commented 10 years ago

I suggest to add the disable / clear feature (as promised above) and then get this merged. Who doesn't want the caching can disable it, or regularly clear. I can work on optimisations later on in a seperate patch.

ryancdotorg commented 10 years ago

People want to run Namecoin on raspberry pi type devices and VPSs with 512-1024 MB of RAM

On August 24, 2014 9:06:08 AM EDT, Daniel Kraft notifications@github.com wrote:

If an option is added to disable / clear the caching, is 300 MiB really such a big deal nowadays? But yes, it is probably possible to optimise RAM usage. ConnectInputs and priority calculation itself only need the input's value and whether or not is a coinbase transaction - the place where the full prev tx is needed is the ConnectInputs hook. I didn't want to rewrite that, too, in order to make it depend only on the actually necessary information. I could do this, though, but it means a lot more changes to the code.

Reply to this email directly or view it on GitHub: https://github.com/namecoin/namecoin/pull/158#issuecomment-53192143

Sent from my Android device with K-9 Mail. Please excuse my brevity.

ryancdotorg commented 10 years ago

Sounds good.

On August 24, 2014 9:08:18 AM EDT, Daniel Kraft notifications@github.com wrote:

I suggest to add the disable / clear feature (as promised above) and then get this merged. Who doesn't want the caching can disable it, or regularly clear. I can work on optimisations later on in a seperate patch.

Reply to this email directly or view it on GitHub: https://github.com/namecoin/namecoin/pull/158#issuecomment-53192215

Sent from my Android device with K-9 Mail. Please excuse my brevity.

domob1812 commented 10 years ago

The cache can now be enabled/disabled with the "-notxprevcache" option (it is on by default) and by using RPC "settxprevcache true" / "settxprevcache false". Disabling by RPC also clears it. (It makes no sense to have a "cleartxprevcache" call, since it will be filled up again very soon if caching is not disabled at the same time.) This can be used to reduce RAM usage on the running process when it gets close to the limit, for instance. Please test, I've only done some limited testing on test-net so far.

phelixbtc commented 10 years ago

Will test.

300MB is not an issue especially for miners but I am worried it would be relatively easy to flood this cache. Only caching value and priority of a TX would be a completely different approach?

Or maybe two staged: cache value and priority for all mempool TXs, cache prevTX for block candidate TXs.

ryancdotorg commented 10 years ago

Should be able to flood it with crap on testnet. I think the issue we had was related to a massive number of inputs so may take a few rounds of transactions to replicate. If someone wants to send me a bunch of testnet coins I can script up a stress test.

On August 24, 2014 11:27:53 AM EDT, phelixbtc notifications@github.com wrote:

Will test.

300MB is not an issue especially for miners but I am worried it would be relatively easy to flood this cache. Only caching value and priority of a TX would be a completely different approach?

Or maybe two staged: cache value and priority for all mempool TXs, cache prevTX for block candidate TXs.

Reply to this email directly or view it on GitHub: https://github.com/namecoin/namecoin/pull/158#issuecomment-53196962

Sent from my Android device with K-9 Mail. Please excuse my brevity.

domob1812 commented 10 years ago

No, only caching value and priority would basically be the same approach - but implemented differently. I actually even had it done partially, but then found out that the prevtx is also needed for the ConnectInputs hook. So we would need to cache also what is needed there, which is (presumably) at least the name for name operations. I didn't want to touch also the ConnectInputs hook, so I decided to cache the full prev tx for now.

ghost commented 10 years ago

Let me know if you need a lot of testnet coins, I have some spare, just give me an address to send to. I'm running a couple of nodes if you want me to help test something.

phelixbtc commented 10 years ago

@ryancdotorg & @John-Kenney sounds good. @domob1812 the big difference being that ConnectInputs is not run for every TX in the mempool for CreateNewBlock but only for TXs that actually go into the block. With caching only the block candidate prevTX the cache should be limited to 25-40MB. For the mempool TX it should be sufficient to only store priority (or two values to calculate priority from the current height if my math did not let me down).

domob1812 commented 10 years ago

Ah yes, I see what you mean. That's right, in order to optimise "just" the priority calculation, we could store the prev tx' value and height. Then we would have to load it again in ConnectInputs - I wanted to go the full way and avoid also the redundant disk load in the following ConnectInputs call, though.

ryancdotorg commented 10 years ago

@John-Kenney can you make a new address on testnet, email me the private key and send some coins to it? I'm still away from home and don't have access to everything - and my testnet node is currently not running.

ghost commented 10 years ago

emailed to the address on your github profile

ghost commented 10 years ago

Please send any testnet coins you're not using back to the faucet when you're done, I still have 160k left, so feel free to use them all up, I can send more & I've upped payouts if you need some more when I'm not around https://nmctest.net/

phelixbtc commented 10 years ago

For the record: when the aggregator strikes again or somebody else deliberately tries to hurt the network RAM use could go up to a Gigabyte or so with this patch. I assume malled TXs can only go into the mempool once because the inputs are checked?

domob1812 commented 10 years ago

Yes, AcceptToMemoryPool should reject double-spends of transactions that are already in the mempool. This should prevent malled transactions to be accepted (since they double-spend the same outputs).

When this patch is merged, I can also implement the priority-only caching - it could be done also if the txprevcache is disabled via the flag or RPC call (since it increases memory usage only slightly).

phelixbtc commented 10 years ago

OK. But I suggest the two tiered cache system to be default when it is implemented. Note that it is only very slightly slower than this version as most of the block candidate TXs stay the same and only the changed ones have to be loaded. (I would like to finish reviewing the code tonight, then I will ACK)

indolering commented 10 years ago

I can ask BitMinter and CloudHasing to test. A static binary (so they don't have to fuck around with compiling) would be ideal.

phelixbtc commented 10 years ago

ACK expecting we will improve the memory situation.

phelixbtc commented 10 years ago

The Aggregator is aggregating again. My client with this patch and default options just crashed at the 2GB limit. Thanks for the demonstration! :)

ryancdotorg commented 10 years ago

@phelixbtc do you have the fee patch installed?

phelixbtc commented 10 years ago

@ryancdotorg No, I was running Domob's tree. Unfortunately I could not get a profile and need to wait for the next rebroadcast.

phelixbtc commented 10 years ago

I wonder whether the caching works OK because with two GB it could have cached most of the blockchain already. Or is the data structure so much less efficient?

ryancdotorg commented 10 years ago

Memory leak maybe? Do you know how to use valgrind?

domob1812 commented 10 years ago

It is inefficient. The same tx could be hold multiple times in memory if it is the prev tx for multiple mempool inputs. This could be optimised away using, for instance, refcounting, but I don't think it is worth the effort. I'm almost done with the improved patch that doesn't need the prev tx cache but still improves priority calculation performance.

What I'm more concerned about, though, is that even "settxprevcache false" did not make my own node use less memory (as shown in top). Not sure if this was just due to the way memory usage is displayed, but if not, it may indeed be a memory leak. If you can get valgrind data, that would be great.

phelixbtc commented 10 years ago

@ryancdotorg I'm on Windows :) I should have tested "settxprevach false" before, unfortunately I did not think of it.

namecoin / namecoin-legacy

Add more aggressive caching of previous txouts in CTxIn. #158