Open buffalu opened 2 years ago
I think the behavior that you're alluding to can be considered for snapshots in general (both full and incremental). For example, if your two full snapshots were at slots 146958936
and 146984605
, then the later full snapshot would be picked and you'd encounter same issue of starting_slot
being later than halt_at_slot
I think there's a bug in the loading from snapshot where it attempts to load incremental snapshots past the starting slot.
As you've found, current behavior just greedily picks the latest snapshots: https://github.com/solana-labs/solana/blob/12693e83df4aa2c08ad31c4f1d8036d60ca858b4/ledger/src/bank_forks_utils.rs#L197 That call pick the latest full snapshot and the latest corresponding incremental snapshot (if there are any).
Hypothetically, we could sweep through all available snapshots to find one that is less than halt_at_slot
. However, that would be a deviation from what validator does which I'm hesitant about since solana-ledger-tool
runs are typically used to mimic validator behavior. Maybe we could throw out a warn!
or error!
if snapshot creation slot is less than the snapshot load slot
this process is going to be very frequent for what we're building, so a little frustrating that it needs this much hand holding. but it seems like we're just working outside the bounds of what it's currently capable of, so it seems like it'd be best to just write our own version that works as expected :)
this process is going to be very frequent for what we're building, so a little frustrating that it needs this much hand holding. but it seems like we're just working outside the bounds of what it's currently capable of, so it seems like it'd be best to just write our own version that works as expected :)
Let me clarify, I'm hesitant to change default behavior away from what validator does. I could see adding a flag that a user opts into.
Some Discord discussion on the matter: https://discord.com/channels/428295358100013066/439194979856809985/1014284768281645106
I would recommend updating ledger-tool to use snapshot_utils::get_full_snapshot_archives()/get_incremental_snapshot_archives(), and then passing the correct pair (based on desired slot) to snapshot_utils::bank_from_snapshot_archives().
This way bank_from_latest_snapshot_archives() retains its intended purpose, and bank_from_snapshot_archives() is used when specific/non-latest snapshot archives are desired.
This sounds fine to me; however, a little more plumbing will be necessary as ledger-tool doesn't call bank_from_*()
directly; rather, ledger-tool calls bank_forks_utils::load_bank_forks()
which has the call to bank_from_latest_snapshot_archives()
under the hood. bank_forks_utils::load_bank_forks()
is also called by validator::load_blockstore()
so we will either another function or a flag for alternate snapshot selection strategy
I think it would make sense for ledger-tool
to have its own pub fn on bank_fork_utils
that is calls (i.e. bank_fork_utils::load_bank_forks_for_ledger_tool()
). It could have the halt-at-slot
param, then use bank_from_snapshot_archives()
. The rest of the impl can (hopefully) be the same as bank_fork_utils::load_bank_forks()
.
Also probably need to change/move the code that purges old snapshots. I'd argue that ledger-tool create-snapshot
should never delete any existing snapshots.
I'd argue that
ledger-tool create-snapshot
should never delete any existing snapshots.
Completely agree
Also probably need to change/move the code that purges old snapshots. I'd argue that
ledger-tool create-snapshot
should never delete any existing snapshots.
Burned myself on this one today (luckily only lost a snapshot that was downloaded from bucket instead of one that I had to replay several 1000 slots to get to). In any case, https://github.com/solana-labs/solana/pull/27788 makes ledger-tool no longer purge snapshots by default
Reopening this one for consideration per some feedback from planned testnet restart - I'm still not sure we want to change behavior to differ from solana-validator
, but could definitely see a flag that allows loading from non-newest snapshot
there's some code in jito-solana that handles this fwiw
Problem
I'm trying to create a snapshot of the last slot in the epoch:
i added printing around the load_bank_forks and see this:
It is trying to start at a slot past my snapshot because of incremental snapshots.
Proposed Solution
The starting_slot should never be larger than the process_options.halt_at_slot. I think there's a bug in the loading from snapshot where it attempts to load incremental snapshots past the starting slot.
I ended up deleting all of our incremental snapshots and waiting for it from 146958936 to 146972255.