Open lutter opened 1 year ago
The most obvious implementation goes like this: currently, changing an entity from an old version to a new version happens by clamping the `block_range` of the old version and inserting the new version. Instead of that, we'd just do an `update` of the old version to turn it into the new version.
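To make the two options concrete, here is a toy model in Python; the `rows`/`lower`/`upper` representation is purely illustrative, not graph-node's actual Postgres schema (which stores an int4range `block_range` column):

```python
# Toy model of versioned entity storage: each row covers the half-open
# block interval [lower, upper); the live version has upper = None.

def write_clamp_insert(rows, entity_id, block, data):
    """Current scheme: clamp the old version's range, insert a new row."""
    for row in rows:
        if row["id"] == entity_id and row["upper"] is None:
            row["upper"] = block  # old version now ends at `block`
    rows.append({"id": entity_id, "lower": block, "upper": None, "data": data})

def write_update_in_place(rows, entity_id, block, data):
    """Proposed scheme: overwrite the live version, keeping no history."""
    for row in rows:
        if row["id"] == entity_id and row["upper"] is None:
            row["lower"], row["data"] = block, data
            return
    rows.append({"id": entity_id, "lower": block, "upper": None, "data": data})
```

With clamp + insert, a table accumulates one row per version; with update in place it stays at one row per live entity.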
This has a couple drawbacks: an `update` needs to be issued as a separate statement for each entity (no bulk operations, unless we do funky stuff with `update .. from`).

Another way to implement this would therefore be to do a `delete` of the old versions and an `insert` of the new versions, which should have almost the same effect on the database. There would be some differences if the update was heap-only, but because we have indexes on everything, we don't ever do HOT updates. Those two operations could happen on many entities at once.
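That bulk delete + insert path can be sketched with sqlite3 standing in for Postgres (table and column names are made up for illustration, with `lower`/`upper` integer columns in place of `block_range`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id TEXT, lower INT, upper INT, data TEXT)")
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?)",
    [("a", 1, None, "old-a"), ("b", 2, None, "old-b")],
)

# Replace the live versions of many entities at once: one bulk DELETE of
# the old versions, then one bulk INSERT of the new ones, instead of a
# separate UPDATE statement per entity.
changed = [("a", 10, None, "new-a"), ("b", 10, None, "new-b")]
ids = [row[0] for row in changed]
placeholders = ",".join("?" for _ in ids)
conn.execute(
    f"DELETE FROM entity WHERE upper IS NULL AND id IN ({placeholders})", ids
)
conn.executemany("INSERT INTO entity VALUES (?, ?, ?, ?)", changed)
```

Both statements touch many rows in one round trip, which is the property the per-entity `update` lacks.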
But what happens there is very similar to pruning: pruning does the `delete`, and overwriting when we transact does a clamp + insert.
A fairly simple way to implement this might therefore be changing the `history_blocks` when syncing: basically, if we are more than `history_blocks` from the chain head, don't prune according to `history_blocks`; instead, use a pretty small fixed value, say 1000 or 100. We'd want to strike a balance between keeping tables small and not running pruning too often.
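A sketch of picking the pruning horizon this way (the function name and the 1000-block constant are just the values floated above, not anything in graph-node):

```python
SYNC_HISTORY_BLOCKS = 1000  # small fixed horizon while far behind the head

def effective_history_blocks(current_block: int, chain_head: int,
                             history_blocks: int) -> int:
    """While we are more than history_blocks behind the chain head, prune
    aggressively; once we catch up, honor the configured history_blocks."""
    if chain_head - current_block > history_blocks:
        return min(history_blocks, SYNC_HISTORY_BLOCKS)
    return history_blocks
```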
The real win here will come from avoiding writing old versions in the first place, but that requires that we have batch write logic in place where we can remove old versions in memory as we assemble the batch.
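The in-memory version of that could be as simple as keeping only the last write per entity while assembling a batch (a sketch; graph-node's batch structures are of course richer than tuples):

```python
def prune_batch(writes):
    """Collapse a batch of (entity_id, block, data) writes, given in block
    order, down to the newest version of each entity."""
    latest = {}
    for entity_id, block, data in writes:
        latest[entity_id] = (block, data)  # later writes win
    return [(eid, blk, data) for eid, (blk, data) in latest.items()]
```

Old versions that are created and superseded within the same batch never reach the database at all.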
thanks @lutter - to be clear, there isn't going to be that much of an uplift from this change in the immediate term? Is it worth tracking this as a follow-on to #4538, so that as well as batch writing we only write latest versions during syncing? Or is that part of #4538 already?
> thanks @lutter - to be clear, there isn't going to be that much of an uplift from this change in the immediate term?
I am not sure how much of an uplift it would be - but I think drastically limiting history during syncing would be a good first step in gauging that. We should compare a subgraph that keeps, say, 300k blocks of history with one that limits it to 100 blocks in the way I described above to see what performance impact that has. That change should be relatively easy to implement (a few days).
> Is it worth tracking this as a follow-on to #4538, so that as well as batch writing we only write latest versions during syncing? Or is that part of #4538 already?
No, #4538 does not do anything about limiting history. In my very limited playing around with it, it's not clear that batches would become big enough that pruning them in memory would have much of an effect; so far, the batches I was able to produce were fairly small, less than 100 blocks. The batch size roughly depends on how slow the database is compared to processing - the slower the database, the bigger the batch. More aggressive pruning during syncing would be orthogonal to that, and I think would also be worthwhile. Batching should become pruning-aware, but we should first see how effective it would be - the actual in-memory pruning would be pretty easy to implement.
@mangas I think you investigated this and there wouldn't be a significant benefit to this, vs. simply relying on pruning?
Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.
Description
When syncing a subgraph that has limited history, i.e. has `history_blocks` set, we should not keep any history while we are syncing more than `history_blocks` away from the chain head. When storing an update to an entity, simply overwrite the old version of the entity instead of keeping the old and the new version.

Are you aware of any blockers that must be resolved before implementing this feature? If so, which? Link to any relevant GitHub issues.
No response