search or combinations algorithm, packing eth68 `GetPooledTransactions` request

emhane commented 9 months ago

Describe the feature

rn TransactionFetcher, when packing a GetPooledTransactions with eth68 hashes, does this

traverse the argument list of hashes in sequential order and call include_eth68_hash for each hash.
fill request with all hashes that return true.

include_eth68_hash checks the size of the corresponding tx and that way decides if the tx will fit in the response corresponding to the request being assembled - the reason for this is too make sure we don't request too large responses (devp2p spec feature for eth68). so if we have a list of hashes mapping to list of sizes [1, 2, 3] and the size limit for the response is 4, TransactionFecther will fill the request with [sizes[0], sizes[1]] which has total size 3...but we could have better made use of the space if we would have got the list [sizes[0], sizes[2]] with total size 4.

what TransactionFetcher better do is have a new function that takes an iterator of the available hashes as parameter (buffered hashes for which the peer is fallback are available hashes in fill_request_for_peer and argument hashes list in pack_hashes_eth68) and

traverse the argument list of hashes in sequential order and call include_eth68_hash for each hash.
fill request with hashes that return true until the first hash that returns false.
from sizes of remaining hashes, search for the biggest size that fits in the remaining space in the response.
repeat 3. until the corresponding response is full

another option would be calculating sizes of combinations of remaining sizes at step 3. so that if we have remaining sizes [4, 4, 5] and the remaining space is 8, TransactionFetcher fills the request with [remaining_sizes[0], remaining_sizes[1]] instead of [remaining_sizes[2]]. I don't think it will happen often that all announced hashes don't fit into a GetPooledTransactions request, since RLPx is a gossip protocol so healthy peers will announce many of the same hashes, nonetheless, this isn't allowed to be too costly. in comparison with searching for the largest size, calculating and comparing combinations of sizes can be costly if the list of remaining sizes is long.

Additional context

No response

0xSalman commented 9 months ago

Can I take this one?

emhane commented 9 months ago

Can I take this one?

prs are reviewed on a fcfs basis :)

vcshih commented 9 months ago

I was interested in looking into this issue but after some research, it looks like it's harder to implement efficiently than I initially thought. Correct me if I'm wrong, but it turns out to be a subset sum problem and therefore will be pretty inefficient as the list of hashes grows longer.

The proposed solution doesn't seem like it would satisfy the example goal either as it would still fill the response with [sizes[0], sizes[1]]. Did I miss something in the explanation or has something similar been implemented in other ethereum clients to solve this issue?

emhane commented 9 months ago

I was interested in looking into this issue but after some research, it looks like it's harder to implement efficiently than I initially thought. Correct me if I'm wrong, but it turns out to be a subset sum problem and therefore will be pretty inefficient as the list of hashes grows longer.

nice! yeah it would be the subset sum problem in which all inputs are positive. it's costly to implement optimally yes, that's why I suggest a near-optimal solution too.

The proposed solution doesn't seem like it would satisfy the example goal either as it would still fill the response with [sizes[0], sizes[1]]. Did I miss something in the explanation or has something similar been implemented in other ethereum clients to solve this issue?

which of the proposed solutions?

allnil commented 9 months ago

+1 to the post above I'm happy that this discussion started! I've also been thinking about this problem on the weekend and tried to make some code blueprints

It's a classic Knapsack DP task (we have a budget/space and we need to spend/fill it with maximum/minimum). I remember solving some DP problems but never have done it rust, so just found this example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021 still, the main issue lies in questions: how many hashes we need to process? It could be measured and benchmarked ofc.

on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)

P.s. What I've found in go-ethereum:

https://github.com/ethereum/go-ethereum/blob/2e947b7a0041f087ce4945303f3dd267b6296a14/eth/protocols/eth/broadcast.go#L76

emhane commented 9 months ago

+1 to the post above I'm happy that this discussion started! I've also been thinking about this problem on the weekend and tried to make some code blueprints

It's a classic Knapsack DP task (we have a budget/space and we need to spend/fill it with maximum/minimum). I remember solving some DP problems but never have done it rust, so just found this example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021 still, the main issue lies in questions: how many hashes we need to process? It could be measured and benchmarked ofc.

great question, that's exactly what we should be asking ourselves. yes, finding the average by adding some metrics to the code to help us model this would be appreciated. it could be that a peer is listed as fallback for some thousands of hashes at the same time during the time window starting from when a new hash is requested for the first time, until it resolves, possibly after a retry.

primarily the problem is constrained by how much space is left in the request. any solution should start with filling the request in sequential order until some tx doesn't fit, or the size limit is reached precisely, which ever happens first. any txns with size bigger than the remaining space don't need to be considered, naturally. a great question to answer is: what is the implicit minimum size for a transaction? I say implicit since it isn't spec'ed out, though from the properties of a tx we can infer a minimum size. for example, the signature has a fixed size right? and I believe that's included in that size figure, which should refer to encoded length. not sure though at the top of my head what that size figure represent precisely. if we know a minimum size, it can be compared to the remaining space in the request, to see if it even makes sense to search for more txns. hopefully a realistic minimum size, together with the remaining size in the request can narrow down our number of input hashes to the algorithm quite a lot. that way we can consider the remaining space in multiples of the minimum size. maybe we are better of using an observed minimum size, rather than the true minimum size, that is a "the smallest txns are usually this small" value.

seems like you missed to permalink the example in playground - has happened to me too before.

on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)

P.s. What I've found in go-ethereum:

https://github.com/ethereum/go-ethereum/blob/2e947b7a0041f087ce4945303f3dd267b6296a14/eth/protocols/eth/broadcast.go#L76

yeah this is what reth does now too. the goal is to optimise it.

vcshih commented 9 months ago

on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)

It does seem like DP is the way to go for the optimal solution, but I was playing around with a simple sort by desc (mainly because I was interested in getting my first rust env going and testing it) and came up with this. I'm not sure if it is appropriate for a PR since I don't even know if this is an appropriate greedy algorithm for the knapsack problem and whether it addresses the root issue.

@emhane I modified the test to make it more comparable to the example you gave in the issue description, but I feel like there are definitely edge cases for which this algorithm doesn't really end up being optimal anyways. Putting it up there for posterity though and further discussion if warranted.

emhane commented 9 months ago

nice @vcshih. you're very welcome to open a pr, and we can take the rest from there.

emhane commented 8 months ago

closing, obsolete due to rework of integration of transaction manager components

paradigmxyz / reth

search or combinations algorithm, packing eth68 `GetPooledTransactions` request #6130

Describe the feature

Additional context