jtoomim / p2pool

Peer-to-peer Bitcoin mining pool
https://github.com/jtoomim/p2pool/
GNU General Public License v3.0

[Bounty; performance] Remove fast block propagation/transaction processing from p2pool #19

Closed jtoomim closed 5 years ago

jtoomim commented 5 years ago

Introduction

P2pool has a bunch of unnecessary code in it for handling transactions. This code is the main source of p2pool's performance issues: it causes shares to propagate and process slowly, thereby increasing p2pool's share orphan/DOA rates and reducing mining fairness. This transaction-handling code also makes p2pool unable to scale to large block sizes.

The core of the problem is that p2pool encodes transactions that are included in multiple shares using a special 2- or 3-byte scheme, and this encoded list gets included in the share's metadata OP_RETURN commitment. This encoded transaction list is redundant information, as the transactions are already committed via the merkle root hash.

The encoding scheme uses a (share_height_delta, transaction_index) pair to tell p2pool where it can find the TXID for that transaction (by looking it up in a previous share). Using this scheme, p2pool was able to transmit shares (which can be decoded into full blocks) faster than pre-Xthin/Compact-Blocks bitcoind was able to transmit blocks. In the p2p layer, each node keeps a dictionary of txid:transaction mappings and sends all of the transactions it receives to each of its peers, to ensure that the recipient can decode a share into a block as soon as the share is received, without needing to request any additional transactions over the network.
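
For illustration, here is a minimal Python sketch of that encoding and decoding. The helper names and the share dicts (each assumed to carry a flat tx_hashes list) are hypothetical; the real serialization lives in p2pool/data.py and uses p2pool's pack types.

def encode_tx_hashes(txids, previous_shares):
    # previous_shares[0] is the most recent ancestor share. A ref (delta, i) with
    # delta >= 1 points at previous_shares[delta - 1]['tx_hashes'][i]; delta == 0
    # points into this share's own new_transaction_hashes list.
    new_transaction_hashes = []
    transaction_hash_refs = []
    for txid in txids:
        for delta, share in enumerate(previous_shares, start=1):
            if txid in share['tx_hashes']:
                transaction_hash_refs.append((delta, share['tx_hashes'].index(txid)))
                break
        else:
            # Not seen in a recent share: ship the full 32-byte hash once and point at it.
            transaction_hash_refs.append((0, len(new_transaction_hashes)))
            new_transaction_hashes.append(txid)
    return new_transaction_hashes, transaction_hash_refs

def decode_tx_hashes(new_transaction_hashes, transaction_hash_refs, previous_shares):
    # Reverse the mapping: every (delta, index) pair becomes a full txid again.
    return [new_transaction_hashes[i] if delta == 0
            else previous_shares[delta - 1]['tx_hashes'][i]
            for delta, i in transaction_hash_refs]

Both the encode-side search through previous shares and the varint serialization of these (delta, index) pairs are exactly the per-transaction Python work this issue proposes to remove.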

Now that we have schemes like Compact Blocks and Graphene (written in C++), p2pool's Python-based fast-block propagation is entirely obsolete. Having p2pool do processing on a per-transaction basis means that the system's performance is limited by the speed at which Python code can process transactions relative to p2pool's 30-60 second share interval, and that performance constraint is far tighter than Bitcoin's limit of the speed at which C++ code can process transactions relative to Bitcoin's 600 second block interval.

Overview of changes

The changes to be made fall into four main categories: Changes to the share data structure, changes to block reconstruction, changes to the p2p protocol layer, and (optionally) adding a mechanism for block/share reconstruction and validation that is asynchronous and outside the latency-critical path.

Share structure changes

The core of the change is that two fields in the share_info data structure need to be removed: new_transaction_hashes and transaction_hash_refs. There's a lot of code in various places that touches these variables, such as Share.generate_transaction(...), all of which needs to be removed or replaced. Most of it is in p2pool/data.py.

This is a hard forking change. Because of the extent of the changes, and because so much of the change will be code deletion instead of addition, it may be better to roll out this change in a backward-incompatible fashion, and just start a new share chain instead of using the typical hard fork upgrade mechanism. But if you think you can do it as a more typical hard fork, that's cool too.

Block reconstruction

As the list of transactions in a share will not be stored in the same way, some changes might be necessary to the getblocktemplate, stratum, and block reconstruction code to make sure that all of the information needed to reassemble blocks is still retained.

P2P changes

After the share changes have been made, it will no longer be necessary for peers to send transactions to each other. This involves the update_remote_view_of_my_mining_txs(...) function; the update_remote_view_of_my_known_txs(...) function; the have_tx, losing_tx, forget_tx, and remember_tx messages, all in p2p.py; and some code in work.py for keeping track of transactions in getblocktemplate results.

Asynchronous validation

[Not yet written]

Bounty

This description is not yet complete. I'll add more information later.

I am offering a $5000 total bounty for this project. Milestone payments are an option.

dfoderick commented 5 years ago

I don't understand the task yet, so let me ask some questions to get started. Is this the forest data structure? or something else? Currently, p2pool is getting tx from p2pool p2p network or from node getblocktemplate? Is that the source of the inefficiency? What is the proposed solution? To replace the p2pool tx propagation with a call to get compact block from the node instead? Basically, what would I need to look at to understand the problem?

jtoomim commented 5 years ago

Basically, what would I need to look at to understand the problem?

@dfoderick I've updated the original description to answer some of your more general questions.

Is this the forest data structure?

No, not the forest data structure. The forest is for storing the share chain itself. Elements in the forest are individual Share objects. The core modification is in the Share object.

Currently, p2pool is getting tx from p2pool p2p network or from node getblocktemplate?

P2pool nodes get txs from getblocktemplate, and then immediately share those transactions with each of their peers using a remember_tx message. According to the p2pool protocol, all txs must be marked as "remembered" over a TCP peer connection before a share that uses one of those txs as a new_transaction_hash can be transmitted.

Is that the source of the inefficiency?

The inefficiency comes at several levels. A big one is serializing and deserializing the list of varints for the transaction_hash_refs. The transaction_hash_refs and new_transaction_hashes elements also take up a lot of RAM (a few hundred MB on CPython, and a few GB on pypy). Searching previous shares for transactions to see if each transaction can be encoded as a transaction_hash_ref instead of a new_transaction_hash also takes quite a bit of time. Sending transactions to each peer, checking to see which peers each transaction needs to be sent to, and so forth also takes up a lot of both bandwidth and CPU time, though that's generally not in the latency-critical code path.

What is the proposed solution? To replace the p2pool tx propagation with a call to get compact block from the node instead?

Just submit the block to local bitcoind only on the node that actually mined the block, and let bitcoind take care of block propagation.

kr1z1s commented 5 years ago

The inefficiency comes at several levels. A big one is serializing and deserializing the list of varints for the transaction_hash_refs. The transaction_hash_refs and new_transaction_hashes elements also take up a lot of RAM (a few hundred MB on CPython, and a few GB on pypy). Searching previous shares for transactions to see if each transaction can be encoded as a transaction_hash_ref instead of a new_transaction_hash also takes quite a bit of time. Sending transactions to each peer, checking to see which peers each transaction needs to be sent to, and so forth also takes up a lot of both bandwidth and CPU time, though that's generally not in the latency-critical code path.

According to my estimates, one of the most latency-critical pieces of code is receiving (parsing) transactions from GBT. When the mempool is large, the processor core running p2pool is loaded to 100%.

jtoomim commented 5 years ago

I would like to keep this thread focused on how to implement, not why. Toward this end, I created a separate issue for discussing share/block cross-validation issues, and I deleted/moved comments in this thread that are about the validation issue to that issue. Discussion on the lack of block validation in p2pool should be done in https://github.com/jtoomim/p2pool/issues/21.

jtoomim commented 5 years ago

@kr1z1s GBT parsing is pretty fast now most of the time. I added caching of transactions in GBT a while ago, so parsing each transaction in a GBT result is just a dictionary lookup unless that transaction is being seen for the first time. helper.py:getwork() does the GBT parsing:

2018-12-21 04:32:00.856972 5.139 ms for helper.py:getwork(). Cache: 687 hits 90 misses, 776 known_tx 1 unknown 778 cached

Parsing a GBT request with 100% new transactions takes about 300 ms per MB of block size. That's significant, but far from the biggest problem right now. Both ABC and BU have replacement RPCs under development which will eliminate this problem, but using them will require p2pool to not know all transactions, and to only use the merkle path instead of the full transaction list.
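
The cache is essentially memoization keyed by txid. A rough, self-contained sketch of the idea (illustrative names only, not the actual helper.py code; it also assumes the template reports a txid per entry):

import binascii

known_txs_cache = {}  # txid (hex) -> deserialized transaction

def deserialize_tx(tx_hex):
    # Stand-in for p2pool's real transaction deserialization (pack.py),
    # which is the expensive step the cache avoids repeating.
    return binascii.unhexlify(tx_hex)

def parse_gbt_transactions(gbt_result):
    parsed, hits, misses = [], 0, 0
    for tx_entry in gbt_result['transactions']:
        txid = tx_entry['txid']
        if txid in known_txs_cache:
            parsed.append(known_txs_cache[txid])
            hits += 1
        else:
            tx = deserialize_tx(tx_entry['data'])
            known_txs_cache[txid] = tx
            parsed.append(tx)
            misses += 1
    return parsed, hits, misses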

On the other hand, most of the total time for assembling a new stratum job (which is done by work.py:get_work()) comes from calling data.py:Share.generate_transaction(), and the slowest parts of that are creating the transaction_hash_refs stuff (33.4 ms) and doing a bunch of other transaction-related stuff (49.4 ms):

2018-12-21 04:31:59.334079 109.645 ms for work.py:get_work()
2018-12-21 04:31:59.444500 84.052 ms for data.py:generate_transaction(). Parts: 0.405 0.775 33.433 0.065 49.374

jtoomim commented 5 years ago

I posted a profiling logfile from running p2pool on BTC for one day here:

https://toom.im/files/pypyp2poolfeatherlog.txt

Except for the select.epoll call (which is just waiting for an event to process), all of the top 4 time hoggers are calls to pack.py. The #5 hog is p2p.py:208(update_remote_view_of_my_known_txs).

When the mempool is large, the processor core running p2pool is loaded to 100%

That's because p2pool needs to send each one of the transactions in the block template to its peers, and needs to search previous shares for a previous use of each of those transactions, not because parsing the transactions is particularly slow.

rldleblanc commented 5 years ago

I've run across some syntax that I'm not familiar with and google isn't being helpful. Please let me know what this is called or help me understand it a little better.

https://github.com/jtoomim/p2pool/blob/master/p2pool/node.py#L237

There are some functions that are named only _ as in def _(arg):. I understand that _ can mean the last value returned, or discard the result, but I'm not sure what it means here. These always seem to have a decorator attached to them, so I suppose it is some sort of shortcut.

Also, which branch should we be working on? I'm using 1mb_segwit on all my pools, but it doesn't seem to be merged into master.

jtoomim commented 5 years ago

It's basically just an anonymous function. A plain underscore is an acceptable variable name, but it's a variable that gets overwritten after each use. The point here is that forrestv just wanted to run self.bitcoind_work.changed.run_andwatch(); it's a multi-line function, so he couldn't just use a lambda.
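
A self-contained illustration of the idiom (not the actual node.py code; Event here is a stand-in for p2pool's event machinery): the decorator registers the function as a callback, so the name _ is never needed again, and the whole thing acts like a multi-line lambda.

class Event(object):
    def __init__(self):
        self.watchers = []
    def watch(self, func):
        # Register the decorated function as a callback and hand it back.
        self.watchers.append(func)
        return func
    def happened(self, *args):
        for func in self.watchers:
            func(*args)

work_changed = Event()

@work_changed.watch
def _(new_work):
    # Several statements can go here, which a lambda could not hold.
    print('new work received: %r' % (new_work,))

work_changed.happened({'height': 570001})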

I've been keeping the master branch mostly in line with the github.com/p2pool/p2pool master branch with the intent of rebasing and merging later on. The most recent code is in https://github.com/jtoomim/p2pool/tree/with_df_ab, though.

rldleblanc commented 5 years ago

That makes sense now.

Can a miner get lucky enough to solve the same share more than once? Or once a miner submits a share, a new set of work is sent to them (getblocktemplate, etc)? Or do they keep hashing on it hoping to get a better difficulty?

jtoomim commented 5 years ago

As soon as a p2pool share (not a stratum share) is found by a miner, p2pool will attempt to issue new work that extends the share chain and refers to that share as its previous_share_hash. Creating this new share typically takes a bit over 100 ms on BTC, so it is fairly common for the miner to find another share while p2pool is working on creating the updated job. These are known as DOA shares. They do not contribute to the revenue of the miner unless they are blocks.

When a stratum share is found by a miner that does not meet the difficulty threshold for being a p2pool share (i.e. only a pseudoshare), p2pool will update its non-consensus-level hashrate stats for that miner (e.g. for graphs), but will not issue new work to the miner.

rldleblanc commented 5 years ago

So, if a miner finds a share (let's not talk about pseudo shares) at the very beginning of the work, there is no need to issue new work yet, right? The share mechanism I'm working on is kind of a hybrid of a chain and graph, so technically there could be multiple submissions on one stratum work and they all count (given that they are different difficulties). It should also remove any orphans and only DOAs should be left (maybe that is what orphans are, basically shares that arrive after a block has been found). Even those shares could have a lower weight so they still count too. What are your thoughts? It's kind of an overhaul of the share and payout accounting.

jtoomim commented 5 years ago

So, if a miner finds a share (let's not talk about pseudo shares) at the very beginning of the work, there is no need to issue new work yet, right?

No, wrong. A share is like a block, but for the p2pool share chain. As soon as a share is found, new work that references it needs to be issued to miners in order to extend the share chain, unless ...

The share mechanism I'm working on is kind of a hybrid of a chain and graph, so technically there could be multiple submissions on one stratum work and they all count (given that they are different difficulties).

Unless you change the share chain into a share DAG. However, it's tricky to set up the rules for the DAG such that miners have an incentive to always work on the most recent BTC block, and to encourage all miners to always include the shares from other miners. I would be interested in seeing such a proposal. I have a few ideas on this myself. However, I'd rather focus on getting transactions out of share objects before working on DAGs.

It should also remove any orphans and only DOAs should be left (maybe that is what orphans are, basically shares that arrive after a block has been found).

A DAG would get rid of DOAs too.

Even those shares could have a lower weight so they still count too. What are your thoughts? It's kind of an overhaul of the share and payout accounting.

It's a good idea, but now is probably not the time for it.

rldleblanc commented 5 years ago

Unless you change the share chain into a share DAG. However, it's tricky to set up the rules for the DAG such that miners have an incentive to always work on the most recent BTC block, and to encourage all miners to always include the shares from other miners. I would be interested in seeing such a proposal. I have a few ideas on this myself. However, I'd rather focus on getting transactions out of share objects before working on DAGs.

From what I understand, the miners are not including any part of the share chain (one of the reasons it can be easy to poison the chain); it is basically just a linked list of shares that meet the share difficulty, and the first one wins. The stratum server is the boss: it can expire work that the miner is working on, and it can basically force the miners to always work on new work.

Technically a miner could ignore the stratum messages and keep working on the same work, but the nonce and extranonce would run out pretty quickly, and then they would have to submit the block to a local bitcoind, etc. Seems like too much work for people who don't even want to run their own bitcoind, so it seems like incentive enough. I don't see it as a big concern. Please help me see where I'm missing things.

I probably won't work on things tomorrow, so I think I'll have the new data model ready, with some functions to do some benchmarking on, by the middle of next week. If it doesn't have the memory or performance profile I expect, then I'll scrap it and just rip out and band-aid the current stuff. I think I can handle full 32 MB BCH blocks in under 10 ms while staying under 200 MB of RAM. Since I'd have to touch a bunch of code anyway...

jtoomim commented 5 years ago

Stratum messages are sent out by the p2pool node, which is usually owned by the miner. Each node is trusted by the miners that use it, and each node is able to fully verify the work that miners do. Nodes behaving maliciously with respect to other nodes is the main security concern. We're not worried about miners misbehaving.

For this task, we're not worrying about issue #21, which has been present in p2pool since day 1 and is not economically lucrative to exploit.

Technically a miner could ignore the stratum messages and keep working on the same work

If a miner chooses to mine shares that are stale, they don't get rewarded for them. Miners only get rewarded per share that's in the share chain.

but the nonce and extranonce would run out pretty quickly,

No, it wouldn't. IIRC, there are 8 bytes of extranonce2 plus 4 bytes of nonce. That's 2^96 nonces, enough for 5659154465304596 seconds of hashing on an S9 before the miner would need to increment the block's timestamp (which the miner can also do once per second). Even if there are only 4 bytes of extranonce1, an S9 can hash for 15 days before it needs to increment the timestamp.
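
Back-of-the-envelope arithmetic behind those figures, assuming an S9 at roughly 14 TH/s (the exact hashrate is an assumption here):

s9_hashrate = 14e12  # hashes per second, approximate Antminer S9 figure

# 8 bytes of extranonce2 + 4 bytes of nonce = 96 bits of search space
print(2**96 / s9_hashrate)            # ~5.66e15 seconds, matching the number above

# 4 bytes of extranonce + 4 bytes of nonce = 64 bits of search space
print(2**64 / s9_hashrate / 86400.0)  # ~15 days before the timestamp must be bumped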

From what I understand, the miners are not including any part of the share chain

Just the previous share's hash, which commits to the rest of that share, which includes the hash of the share before it, etc. In other words, each share depends on all shares before it.

then they would have to submit the block to a local bitcoind, etc.

Miners do not have enough information to reconstruct a block. Stratum does not send enough information for full block reconstruction. Only the p2pool node (or, with the GBT replacement proposals, only the bitcoind full node) can reconstruct the full block.

Seems like too much work for people who don't even want to run their own bitcoind, so it seems like incentive enough. I don't see it as a big concern. Please help me see where I'm missing things.

All performance incentives for mining non-orphan Bitcoin blocks need to be passed on directly to the nodes and the miners who are responsible for delays.

Let's say a miner is using a satellite internet connection and has 1 second latency. Block height 570001 was just found. The node has a slow internet connection and takes 30 seconds to download the new block and build a GBT/stratum job for it. If the miner finds a block during the first 30 seconds, that block will have a roughly 50% chance of ending up as an orphan block on the Bitcoin network. However, during that 30 seconds, the node will not mark shares submitted by the miner as DOA, since the node has not yet issued new work to the miner. If another node mines a share on block 570001 first, then any shares mined on block 570000 can be marked stale and invalid. (Note: this feature is currently disabled on 1mb_segwit, as the previous implementation would also orphan shares that were entirely valid. This needs to be fixed, but it's a relatively minor issue.) During the last 1 second, new work will be issued, so any shares from the miner will be marked DOA by the local node and will always end up being orphaned unless the node's code has been modified to submit known-stale shares.

The node and the miner both need to be punished for those 30-second and 1-second delays. I should not bear the cost of your setup's poor performance.

However, if you have a share DAG instead of a share chain, you lose some of the timestamping attribute of shares, and it is more difficult to ensure that a share was not working on a stale block. In order for a DAG mechanism to be viable, you need to show that it will punish shares that were mined on stale blocks.

Also, you need to make sure that there's no incentive to intentionally ignore shares mined by other nodes. Selfish mining on the DAG by intentionally excluding the shares of opponents needs to be unviable.

Solving these two problems -- especially the second one -- is not easy, and this is not something that should be wrapped up into the same fork. Let's do one thing at a time, please, and stick to the task.

jtoomim commented 5 years ago

@rldleblanc via email:

We still need to keep all the transaction data so that we can build a block for submitting to bitcoind, even if it's not being used for validation? Or can you just send a block header to submitblock with the work id, and it will just assume the transactions that were sent in the getblocktemplate response? If it's the latter, then we only need to compute the merkle root and can then throw away all the transactions.

Yes, for now we need to keep all the transaction data for our own block templates.

P2pool submits blocks with two methods:

  1. Via the bitcoind p2p protocol
  2. Via the bitcoind rpc protocol

https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/bitcoin/helper.py#L164

IIRC, the p2p protocol submission requires the serialized transactions as a raw string, and the rpc protocol requires the serialized transactions as a hex-encoded string.

Currently, it's set up so that the got_response() function is redefined for each getblocktemplate call in a scope that includes the original candidate share object and the transactions from the getblocktemplate call. This seems like a reasonable approach, so I suggest keeping it.

Something we may want to do when implementing this: instead of keeping the transaction information in the share_info['share_data'] structure (which becomes part of the p2p serialization for the share, and becomes consensus-level data), we may want to just have the list of transactions as an optional regular python attribute of the share object. This way, for our own block templates, we can create share objects that have this share.transactions list, and when we want to submit a block, we can just use that share.transactions list. For shares/blocks we get over the network, we can check to see if hasattr(share, 'transactions') before trying to submit it as a block, and if we don't have that list of transactions, we can either skip it or (in the future) request it from a peer.
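
A short sketch of that pattern (maybe_submit_block and submit_block are hypothetical names, not existing p2pool API; the point is just the hasattr() check and keeping the tx list out of the consensus-level data):

def maybe_submit_block(share, bitcoind):
    # Shares we assembled locally carry a plain (non-consensus) .transactions
    # attribute; shares received from peers do not.
    if not hasattr(share, 'transactions'):
        # We never learned this share's transactions, so we cannot reconstruct
        # the block locally. Skip it (or, later, request the txs from a peer).
        return False
    block = dict(header=share.header, txs=share.transactions)
    bitcoind.submit_block(block)  # hypothetical wrapper over submitblock / p2p submission
    return True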

In the future, we may add support for block submission interfaces that do not require all the transactions. There are three different bitcoind RPC interfaces for getting and returning work:

  1. Bitcoin Core-derived getblocktemplate and submitblock
  2. Bitcoin Unlimited's getminingcandidate and submitminingsolution
  3. Bitcoin ABC's improved getblocktemplate WIP

The first one provides and requires a full list of transactions. The other two methods only provide the coinbase, the header, and the merkle path (same as stratum).

Currently, we're using method 1 in all cases. At some point in the future, we may want to add support for 2 or 3 when they're available. In those cases, we won't need to keep the full list of transactions.

rldleblanc commented 5 years ago

I believe I have all the transactions removed from the share info and have it done with a soft fork so that we don't have to start a new chain. I just need to set up another peer and test that the peers are no longer passing the transaction data.

jtoomim commented 5 years ago

Seriously? That's possible? Doesn't the code check that the share_info transaction list can regenerate the header's merkle root?

jtoomim commented 5 years ago

You can set up multiple nodes on the same machine by changing the worker port and the p2p port:

pypy run_p2pool.py --net bitcoincash -w 9350 --p2pool-port 9351 -n 127.0.0.1:9348

-n will make it connect to the specified peer. -w sets the worker port and web UI port. --p2pool-port sets the p2pool-to-p2pool inter-node communication listening port.

rldleblanc commented 5 years ago

The current transactions are stored in bitcoin_data (I think), and it uses that to generate the merkle root when it submits to bitcoind, so I just had to go through all the places that expect the transaction data in the share chain and branch based on the share version number. I ran into a lot of hidden spots when reading the shares from disk at startup, but I have several blocks on Litecoin testnet that have transactions in them, so they are passing just fine.

https://chain.so/block/LTCTEST/ad00fef2582031e97eb5485f20a2a9b41ef7af5b09edc9631693a6a3985d2890

jtoomim commented 5 years ago

bitcoin_data is a module. It's p2pool/bitcoin/data.py, as opposed to p2pool/data.py. To avoid a namespace conflict, forrestv used

from p2pool.bitcoin import data as bitcoin_data

bitcoin_data.calculate_merkle_link(...) is just a function found in that module.

I'm fairly sure your soft fork attempt is going to fail because of this line: https://github.com/jtoomim/p2pool/blob/1mb_segwit/p2pool/data.py#L498

If my understanding is correct, that line will cause any nodes not running your code to reject shares from nodes that are running your code. That makes this a hard fork, not a soft fork.

so I just had to go through all the places that expect the transaction data in the share chain and branch based on the share version number

Branching on share version number sounds kinda messy. We will eventually want to delete all of the code for old share version numbers once each of the major p2pools has forked.

rldleblanc commented 5 years ago

Yes, nodes not running my code will reject the share, but they would reject it anyway because it is an unsupported share version, right? Once enough nodes are upgraded, then the share version will kick in and the transactions will no longer be included, until that time, it runs as it always has.

As for that line, I changed the code above it so that other_tx_hashes doesn't include transactions from the share if it is the new version, and so far it's passing. I'll add some debug code to make sure that it is actually passing; I'm not hitting the assert a few lines above it. That may change when I get this second node up and running (thanks for the info, I was going through the command-line options when you sent the message).

I don't like all the branching either, and I'd like to remove all that code, even for the older stuff. I added one property to Node that makes it a lot easier. I hope to have the pull request ready for you to look at, and then we can go over it and see what else is left.

rldleblanc commented 5 years ago

Yes, you are right, changing the share version number is a hard fork. I was thinking that was a soft fork. Sorry about that.

jtoomim commented 5 years ago

As for that line, I changed the code above it so that other_tx_hashes doesn't include transactions from the share if it is the new version, and so far it's passing

That does not quite sound correct. In the new code, you still need to check to ensure that the merkle_link plus the coinbase matches the header's merkle root hash value. We just no longer need to verify that the merkle_link matches the other_transaction_hashes.
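
For reference, a self-contained sketch of what that check amounts to (p2pool's real implementation is bitcoin_data.check_merkle_link() in p2pool/bitcoin/data.py; byte-order details are simplified here, and hashes are assumed to be 32-byte strings):

import hashlib

def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root_from_link(coinbase_hash, branch):
    # The coinbase is always the leftmost leaf (index 0), so at every level the
    # running hash sits on the left and the branch element on the right.
    h = coinbase_hash
    for sibling in branch:
        h = double_sha256(h + sibling)
    return h

# A share (or block) is only acceptable if this recomputed root equals the
# merkle root committed in its header.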

rldleblanc commented 5 years ago

When I added the peer node it started failing. I'm working through that chunk of code now.

jtoomim commented 5 years ago

Estimated pool-wide DOA percent on BTC p2pool right now is 33%! That's about 18 PH/s that someone is not getting accurately credited for.

http://ml.toom.im:9332/static/graphs.html?Day

My guess is that someone has a large number of distinct workers on one p2pool node without knowing that p2pool has to do 100-400ms of work for each distinct worker on a node. @rldleblanc et al, if you've been wondering why I've been so adamant about stripping out the tx forwarding code, this is why. This user is probably going to have a bad experience with p2pool after they notice how much revenue they're losing, and they're going to leave p2pool and never come back.

@kr1z1s do you know who this person is? They could probably use some help tuning their setup. Specifically, reducing the number of unique workers per node will help them a lot. (It's okay to have 1,000 S9s on a single node, as long as all of the S9s are mining to the same address. But 10 S9s on different addresses will cause that node's performance to be poor.)

rldleblanc commented 5 years ago

@jtoomim I understand the problem. The issue I'm seeing is not so much in the tx forwarding as it is in building the work for the miners, since that goes through all the TXs and does so very inefficiently. I've gone through all the code and got most of it updated for Python 3. I'm going through the unittests right now, finding new issues and making sure they all pass. Then I need to run the code to find anything not covered by the unittests. Once that is done, I'll refactor the get_work code path, which should bring huge improvements. I'm also implementing slots everywhere so that even if people aren't using Pypy, they will still get some of the benefits. I'm trying to get through it as fast as I can.

jtoomim commented 5 years ago

The tx forwarding is one of the reasons why p2pool goes through all the txes. The merkle link stuff is a second reason.

But yeah, I'm really looking forward to having these issues be resolved. I'd love to be able to mine a 32 MB block on BCH with p2pool, even if it's just on testnet.

rldleblanc commented 5 years ago

One of the optimizations that Pypy does is implement __slots__ for all classes. Instead of using a dict to store instance attributes, which is expensive in terms of CPU and memory, you define all the attributes that will be used by the class in the __slots__ class variable. When all classes in the inheritance chain define __slots__, no per-instance dict is created and you save memory and CPU. It also prevents fat-fingering a variable name and then wondering why things don't work. http://book.pythontips.com/en/latest/__slots__magic.html
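
A minimal illustration of the pattern (the class name and fields are made up for the example):

class ShareStats(object):
    __slots__ = ('hash', 'timestamp', 'difficulty')  # no per-instance __dict__ is allocated

    def __init__(self, share_hash, timestamp, difficulty):
        self.hash = share_hash
        self.timestamp = timestamp
        self.difficulty = difficulty

s = ShareStats('abc123', 1545430320, 1.0)
s.difficulty = 2.0   # fine: declared in __slots__
# s.dificulty = 2.0  # would raise AttributeError instead of silently creating a typo'd attribute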

jtoomim commented 5 years ago

__slots__ is mostly for RAM usage, not CPU. I don't think we need to worry about RAM usage any longer, especially on CPython.

rldleblanc commented 5 years ago

Slots are trivial to add, so I just added them as I went through all the lines of code.

Just as an update for the Python3 upgrade, I've got all the unittests passing in p2pool/test/{bitcoin,utils}. I did have ./run_p2pool.py --help running first to clean up most of the syntax issues (it's not working right now because I had to separate some things into strings and bytes() for the unittests). So, I just need to get the three unittests in p2pool/test/ passing and then find all the bugs not covered by the unittests. I think I'm on the downhill side of this upgrade (~60% complete, I'd guess). I hope to have the Python3 version ready to start testing early next week.

dfoderick commented 5 years ago

Just as an update for the Python3 upgrade,

Python3 will be a great upgrade. Let me know when you have it checked into a branch. I can do some basic testing with it.

rldleblanc commented 5 years ago

So, the Upnp code is really old and crusty. Since it's not a critical component, I'm going to disable it for now so that I can get the Python3 migration done. Do you have any heartburn over that?

Once I get things all settled and the big things done, I can go back and work on it. It just uses SOAP, and there aren't really any good SOAP libraries anymore, so you'd have to install dependencies, which seems like overkill for a feature I'm not sure is even used that much. I could be totally wrong though.

jtoomim commented 5 years ago

I don't think many people are using UPNP. Go ahead and remove it.

rldleblanc commented 5 years ago

@jtoomim Do you want me to temporarily disable it or remove it completely from the code and have it no longer be an option. I just want to make sure I'm clear on your intent.

jtoomim commented 5 years ago

Up to you.

rldleblanc commented 5 years ago

Posted PR #25 for the Python3 upgrade. Sorry it took so long. Some personal issues came up, and P2Pool ate most of the Twisted errors, which made debugging a lot more painful. Please test the code and help me shake out any missed bugs.

jtoomim commented 5 years ago

Bounty has been paid. PR #22 has been rebased, and placed into https://github.com/jtoomim/p2pool/tree/remove_tx_forwarding. I will probably redirect master to this commit soon. Edit: it's now in master.