optakt / flow-dps

Flow Data Provisioning Service
Apache License 2.0
29 stars 13 forks source link

Resume indexing over existing index by default #493

Closed m4ksio closed 2 years ago

m4ksio commented 2 years ago

Goal of this PR

This may or may not be the right solution, its just the easiest implementation. Problem at hand is - we want to keep the same command for Docker purposes and make DPS choses whether to start new index or reuse existing one.

m4ksio commented 2 years ago

If we want to do this, we need (IMO) to also output an error message and fail when a checkpoint is given but an index already exists, suggesting the user to specify the --force option if they want to overwrite the index, or not specify a checkpoint otherwise.

WDYT?

This way we need a different command to start with fresh index and resume on given index, which is exactly what I try to avoid. It not only makes dockerfiles easier, but I would also argue that that's what most people would like anyway - resuming index without modifying command, simply ignoring checkpoint. Or maybe we can have a separate mode to operate like this.

Ullaakut commented 2 years ago

@m4ksio No, this way if you want to have one single command to do both, you'll simply need to use the --force argument in both cases. When starting fresh, it will have no effect, and when restarting it will work as you expect.

Meanwhile, a user that runs the DPS as a CLI, not through a Dockerfile, and that expects a consistent UX, will have proper error handling to work with.

I think it's an unwritten but universal rule that any CLI that has a --force argument must fail when it's missing in an ambiguous case, like git does it, many built-in UNIX tools, and more. Since this is an Open Source project, having a good and consistent UX is primordial IMO.

EDIT: My bad, what @awfm9 says is right, it's not even needed in the end since resuming works with the checkpoint. Marking this comment as outdated.

awfm9 commented 2 years ago

I'm not sure I follow at all. As it is now:

For case 1 and 2, the command is exactly the same, which I gather is the behaviour you want?

The result between case 2 and 4 is the same as well. It just takes longer to resume without checkpoint.

awfm9 commented 2 years ago

After reviewing and confirming, the behaviour as you want is already the case, so closing.

awfm9 commented 2 years ago

@m4ksio let us know if we are missing something, or if something is not working as designed. From what I can see, we do exactly as you want.

When the index database is empty, we use loader.FromCheckpoint to load the initial trie from the root checkpoint.

if empty {
    file, err := os.Open(flagCheckpoint)
    if err != nil {
        log.Error().Err(err).Msg("could not open checkpoint file")
        return failure
    }
    defer file.Close()
    load = loader.FromCheckpoint(file)

When the index isn't empty, and we have a checkpoint, we use the index loader with the checkpoint loader as initializer, so we initialize the trie using the root checkpoint, and then restore the remaining heights from the index:

} else if flagCheckpoint != "" {
    file, err := os.Open(flagCheckpoint)
    if err != nil {
        log.Error().Err(err).Msg("could not open checkpoint file")
        return failure
    }
    defer file.Close()
    load = loader.FromCheckpoint(file)
    load = loader.FromIndex(log, storage, indexDB,
        loader.WithInitializer(load),
        loader.WithExclude(loader.ExcludeAtOrBelow(first)),
    )

Finally, when the index isn't empty, and we don't have a checkpoint, we use the index loader with the empty loader as initializer, so we start with an empty trie and restore all heights from the index:

} else {
    load = loader.FromIndex(log, storage, indexDB)
}