Closed emhane closed 8 months ago
Can I take this one?
Can I take this one?
prs are reviewed on a fcfs basis :)
I was interested in looking into this issue but after some research, it looks like it's harder to implement efficiently than I initially thought. Correct me if I'm wrong, but it turns out to be a subset sum problem and therefore will be pretty inefficient as the list of hashes grows longer.
The proposed solution doesn't seem like it would satisfy the example goal either as it would still fill the response with [sizes[0], sizes[1]]
. Did I miss something in the explanation or has something similar been implemented in other ethereum clients to solve this issue?
I was interested in looking into this issue but after some research, it looks like it's harder to implement efficiently than I initially thought. Correct me if I'm wrong, but it turns out to be a subset sum problem and therefore will be pretty inefficient as the list of hashes grows longer.
nice! yeah it would be the subset sum problem in which all inputs are positive. it's costly to implement optimally yes, that's why I suggest a near-optimal solution too.
The proposed solution doesn't seem like it would satisfy the example goal either as it would still fill the response with
[sizes[0], sizes[1]]
. Did I miss something in the explanation or has something similar been implemented in other ethereum clients to solve this issue?
which of the proposed solutions?
+1 to the post above I'm happy that this discussion started! I've also been thinking about this problem on the weekend and tried to make some code blueprints
It's a classic Knapsack DP task (we have a budget/space and we need to spend/fill it with maximum/minimum). I remember solving some DP problems but never have done it rust, so just found this example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021 still, the main issue lies in questions: how many hashes we need to process? It could be measured and benchmarked ofc.
on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)
P.s. What I've found in go-ethereum:
+1 to the post above I'm happy that this discussion started! I've also been thinking about this problem on the weekend and tried to make some code blueprints
It's a classic Knapsack DP task (we have a budget/space and we need to spend/fill it with maximum/minimum). I remember solving some DP problems but never have done it rust, so just found this example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021 still, the main issue lies in questions: how many hashes we need to process? It could be measured and benchmarked ofc.
great question, that's exactly what we should be asking ourselves. yes, finding the average by adding some metrics to the code to help us model this would be appreciated. it could be that a peer is listed as fallback for some thousands of hashes at the same time during the time window starting from when a new hash is requested for the first time, until it resolves, possibly after a retry.
primarily the problem is constrained by how much space is left in the request. any solution should start with filling the request in sequential order until some tx doesn't fit, or the size limit is reached precisely, which ever happens first. any txns with size bigger than the remaining space don't need to be considered, naturally. a great question to answer is: what is the implicit minimum size for a transaction? I say implicit since it isn't spec'ed out, though from the properties of a tx we can infer a minimum size. for example, the signature has a fixed size right? and I believe that's included in that size figure, which should refer to encoded length. not sure though at the top of my head what that size figure represent precisely. if we know a minimum size, it can be compared to the remaining space in the request, to see if it even makes sense to search for more txns. hopefully a realistic minimum size, together with the remaining size in the request can narrow down our number of input hashes to the algorithm quite a lot. that way we can consider the remaining space in multiples of the minimum size. maybe we are better of using an observed minimum size, rather than the true minimum size, that is a "the smallest txns are usually this small" value.
seems like you missed to permalink the example in playground - has happened to me too before.
on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)
P.s. What I've found in go-ethereum:
yeah this is what reth does now too. the goal is to optimise it.
on the other hand, we may just do something greedy like sort DESC, grab what we can in one pass, and call it a deal :)
It does seem like DP is the way to go for the optimal solution, but I was playing around with a simple sort by desc (mainly because I was interested in getting my first rust env going and testing it) and came up with this. I'm not sure if it is appropriate for a PR since I don't even know if this is an appropriate greedy algorithm for the knapsack problem and whether it addresses the root issue.
@emhane I modified the test to make it more comparable to the example you gave in the issue description, but I feel like there are definitely edge cases for which this algorithm doesn't really end up being optimal anyways. Putting it up there for posterity though and further discussion if warranted.
nice @vcshih. you're very welcome to open a pr, and we can take the rest from there.
closing, obsolete due to rework of integration of transaction manager components
Describe the feature
rn
TransactionFetcher
, when packing aGetPooledTransactions
with eth68 hashes, does thisinclude_eth68_hash
for each hash.include_eth68_hash
checks the size of the corresponding tx and that way decides if the tx will fit in the response corresponding to the request being assembled - the reason for this is too make sure we don't request too large responses (devp2p spec feature for eth68). so if we have a list of hashes mapping to list of sizes [1, 2, 3] and the size limit for the response is 4,TransactionFecther
will fill the request with [sizes[0], sizes[1]] which has total size 3...but we could have better made use of the space if we would have got the list [sizes[0], sizes[2]] with total size 4.what
TransactionFetcher
better do is have a new function that takes an iterator of the available hashes as parameter (buffered hashes for which the peer is fallback are available hashes infill_request_for_peer
and argument hashes list inpack_hashes_eth68
) andinclude_eth68_hash
for each hash.another option would be calculating sizes of combinations of remaining sizes at step 3. so that if we have remaining sizes [4, 4, 5] and the remaining space is 8,
TransactionFetcher
fills the request with [remaining_sizes[0], remaining_sizes[1]] instead of [remaining_sizes[2]]. I don't think it will happen often that all announced hashes don't fit into aGetPooledTransactions
request, since RLPx is a gossip protocol so healthy peers will announce many of the same hashes, nonetheless, this isn't allowed to be too costly. in comparison with searching for the largest size, calculating and comparing combinations of sizes can be costly if the list of remaining sizes is long.Additional context
No response