ipfs / go-datastore

key-value datastore interfaces
MIT License
228 stars 64 forks source link

Define Delete to be idempotent #104

Closed Stebalien closed 5 years ago

Stebalien commented 5 years ago

Moving from: https://github.com/ipfs/go-ds-flatfs/issues/30

  1. We usually only care if the operation succeeded.
  2. We definitely don't want to abort a batch operation because a delete failed.
  3. User code often just aborts/returns errors without thinking about them.
  4. Not all datastores can tell us if there was nothing to delete (e.g., badger).
  5. Flatfs no longer reliably tells us this.
  6. HTTP delete is defined to be idempotent so many web-backed datastores won't be able to fulfill this contract either. (also, there's a reason HTTP delete is idempotent...).

This boils down to:

  1. We can't reliably return ErrNotFound on delete making it useless except, maybe, as a source of metrics.
  2. Having to handle ErrNotFound on delete in user code is annoying.
  3. Having to try to return ErrNotFound on delete is annoying at best.

Proposal: Define it to be idempotent (i.e., return nil when the key isn't found).

Stebalien commented 5 years ago

For example, see: https://github.com/ipfs/go-datastore/pull/103/files#diff-42ea5667b327bb207485077410d5f499R42. I could fix that, but it would require a bunch of extra state I don't really want to track (and it would cause a failed delete to fail the entire batch.

kevina commented 5 years ago

I don't have a strong objection. I would prefer a way to communicate back that a key is not found if that information is known, but as this may complicate things, this is not a hard requirement.

Stebalien commented 5 years ago

The user can still call Has first. I agree that exposing all information we know is a good idea, it just makes life hard in practice (in this case).

raulk commented 5 years ago

Generally, I agree. DELETEs are idempotent in SQL, leveldb, badger, redis, but some databases will report the number of keys deleted. Maybe we should add capability for returning an (int, error), where the former can take value -1 if the DB does not support this feature?

For the record, go-ds-badger does not return ErrNotFound on deleting an inexistent key; it simply forwards the error from badger, which does not complain in this case.

  1. We definitely don't want to abort a batch operation because a delete failed.

In most of cases, yes. But ultimately this depends on the application logic. For example, in the Datastore TTL shim I've half-baked, we have two keys per TTL entry (for good reasons), and we might want deletes pertaining to a single TTL entry to be as atomic as possible.

Stebalien commented 5 years ago

In most of cases, yes. But ultimately this depends on the application logic.

Ah, sorry, I meant we don't want to abort if the delete failed because the item wasn't there. We probably do want to abort on other errors.

Maybe we should add capability for returning an (int, error), where the former can take value -1 if the DB does not support this feature?

So we'd return:

(where we'd only return an error if something went wrong)

Personally, I'm not sure if implementing that is worth the work/complexity. Making DELETE idempotent is quite easy:

  1. Slowly remove all cases where we return ErrNotFound on delete. We're already failing to do this in some cases so this won't break anything.
  2. Eventually, define DELETE to be idempotent.
  3. Finally, remove all checks for ErrNotFound on delete.

Basically, we can do this slowly without having to fix everything up-front. Changing the method signature, on the other hand, will take a bit more work.

We could still do it, I just want to make sure the change is well motivated (i.e., we have a use-case where we actually care about this). If we do run into a case where we'd like to know, we can change the signature later (well, it'll be a breaking change but it'll be a breaking change either way).

kevina commented 5 years ago

@Stebalien I agree this is over-complicating things. I think the best to not worry about for now. A better way to handle this is to query the datastore afterwards for the status of the last operataion, but this will require the use of handles, which is something we may want to do anyway down the road.

bigs commented 5 years ago

i like @raulk's suggestion about returning the number of keys deleted... DBs that support versioning (badger) could potentially delete more than one key. so i'd think of it as <0, 0, >0

Stebalien commented 5 years ago

@raulk @kevina @bigs so, are you fine making delete idempotent (for now) where, sometime in the future, we could consider adding a delete count? Really, I'm just trying to (a) simplify some code that tries to hard to track whether or not the operation succeeded and (b) trying to nail down semantics (I hate undefined behavior).

kevina commented 5 years ago

@Stebalien that fine with me.

bigs commented 5 years ago

absolutely. fine with me! On Oct 24, 2018, 3:57 PM -0400, Kevin Atkinson notifications@github.com, wrote:

@Stebalien that fine with me. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

raulk commented 5 years ago

Idem. I have no immediate use case right now for the delete count.

schomatis commented 5 years ago

SGTM, in my book Delete just means that following calls to Get are the ones that will actually return the ErrNotFound error, no guarantees of what was there before issuing the call.

MichaelMure commented 5 years ago

I was bitten by this one. In particular, the Delete function of MapDatastore in this repository is not idempotent. As it is often used to implement a test suite, it also force in some situation to deal with this case.

MichaelMure commented 5 years ago

Another one not idempotent is flatfs.

Stebalien commented 5 years ago

All done!