Closed vogel closed 5 years ago
Why do we resolve pending merges in the publish queue in the first place?
Let me look it up in commit history.
100% lack of information there - only one commit adding whole HistoryWork stuff.
I think that the only reason this is done it to reduce time it requires nodes to catch up. Comment on BucketList.h
mentions that, for example, for level 7 merge could take up to 45 minutes. For level 8 it would be 180 minutes, for level 9 720 minutes and for level 10 2880 minutes (2 days).
Doing that merge during catchup is suboptimal.
But also doing that merge during publish means that we do not publish for a long time.
I see 2 part solution here:
Well, I think I misunderstood some of that. Now I don't think that we need to finish the merge of FutureBucket immediately when we are doing catchup. Probably the assumption is that when the bucket is not yet merged in history archive, it is not close to be promoted to curr
on that level, so it can take some time.
And when it is close to be promoted, it should already be merged (as we have about 15x required safety).
right, from what you are saying it seems that it's basically safer for the network to always publish (without waiting for the merge to be done), so that other nodes can catchup to the network. Right now if we have a slow merge it potentially opens the possibility of having arbitrary long gaps between checkpoints, which doesn't sound like a good property.
During that time the bucket will show up with inputs but that seems fine. The optimization to fixup the pending merges with the completed merge seems like something we could do later on.
@marta-lokhova is working on this
We do not cache results of bucket merges. It can cause to suboptimal usage of resources. Imagine not-yet merged
FutureBucket
that goes intopublishqueue
table. If it is at high enough level, it will be created multiple times as result of merge of the same two buckets with the same shadow set - it will lead to the same output bucket each time, which obviously could be cached.We should add that cache to reduce IO and CPU usage.