samvera / valkyrie

A Data Mapper library to enable multiple backends for storage of files and metadata in Samvera

Persisters should error if asked to save a persisted resource which doesn't already exist. #837

Closed: tpendragon closed this issue 3 years ago

tpendragon commented 4 years ago

Right now the following works:

resource = metadata_adapter.persister.save(resource: Resource.new)
metadata_adapter.persister.delete(resource: resource)
saved_again = metadata_adapter.persister.save(resource: resource)
saved_again.id == resource.id # => true

However, this causes race conditions with background jobs. It should do the following:

resource = metadata_adapter.persister.save(resource: Resource.new)
metadata_adapter.persister.delete(resource: resource)
saved_again = metadata_adapter.persister.save(resource: resource)
# => Valkyrie::Persistence::ObjectNotFoundError

I'm open to other ideas instead of ObjectNotFoundError. A quick experiment with ActiveRecord shows that save just returns true in this case but doesn't actually persist anything.
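To make the proposed contract concrete, here is a minimal in-memory sketch (not Valkyrie's real persister; `Resource` and `InMemoryPersister` are stand-ins) of a save that raises when handed a resource whose id is no longer in the backend:

```ruby
require "securerandom"

# Stand-in for the real error class proposed above.
module Valkyrie
  module Persistence
    class ObjectNotFoundError < StandardError; end
  end
end

Resource = Struct.new(:id, keyword_init: true)

# Toy persister backed by a hash, illustrating the proposed behavior:
# saving a resource that carries an id requires that id to still exist.
class InMemoryPersister
  def initialize
    @store = {}
  end

  def save(resource:)
    if resource.id
      unless @store.key?(resource.id)
        raise Valkyrie::Persistence::ObjectNotFoundError,
              "Attempted to save a persisted resource that was not found"
      end
    else
      resource.id = SecureRandom.uuid
    end
    @store[resource.id] = resource
    resource
  end

  def delete(resource:)
    @store.delete(resource.id)
  end
end
```

With this sketch, the save-delete-save sequence from the issue description raises instead of silently re-creating the resource.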

dgcliff commented 4 years ago

ObjectNotFoundError seems to be the best fit, but perhaps we can highlight for the user the cause of the issue to help them untangle it. If they arrived at the error as a result of a race condition, they may not even know when/where the resource was deleted.

Setting the error message could potentially fulfill both goals.

Valkyrie::Persistence::ObjectNotFoundError.new("Attempted save on persisted resource which was not found") (stand-in language, I'm sure a better explanation is possible)

no-reply commented 4 years ago

could this be adapter specific behavior?

my intuition is that making the #persisted? check an API level contract is introducing unneeded state dependence, and likely to end up being more complicated than we bargained for. is there a proposed generic implementation that doesn't involve an extra round-trip to the backend when saving an object for which #persisted? is true?

it seems reasonable (already) that an adapter or backend might want to avoid recreating items with the same ids as any that have already been deleted, and that they might use a variety of strategies to ensure that.

i think i'd also like to hear more about the nature of the race conditions. pulibrary/figgy#4174 provides a hint, but i'm curious to know more.

tpendragon commented 4 years ago

I don't think it should be adapter specific, but I'm sure we can find efficient implementations. Fedora, for instance, will have a tombstone and just throw an error. A database save could use UPDATE, and Solr is cheap to check.
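The "a database save could UPDATE" idea avoids the extra round-trip no-reply worried about: issue a single UPDATE and treat zero affected rows as "not found". A hedged sketch, where `FakeTable` is a stand-in for an ORM relation whose `update` returns the affected row count:

```ruby
# Stand-in for a database table; #update returns the number of rows
# affected, as ORMs typically do for an UPDATE statement.
class FakeTable
  def initialize
    @rows = {}
  end

  def insert(id, attrs)
    @rows[id] = attrs
  end

  def update(id, attrs)
    return 0 unless @rows.key?(id)
    @rows[id] = attrs
    1
  end
end

class ObjectNotFoundError < StandardError; end

# Single round-trip save for an already-persisted resource: no separate
# existence check, just inspect the UPDATE's affected-row count.
def save_existing(table, id, attrs)
  affected = table.update(id, attrs)
  raise ObjectNotFoundError, "resource #{id} not found" if affected.zero?
  attrs
end
```

The same shape applies to a real SQL backend: `UPDATE ... WHERE id = ?` reports how many rows it touched, so the existence check comes for free with the write.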

The race condition for us was the following:

A. User uploads an item with 600 pages; 25 workers promptly start working on them.
B. Worker A pulls the object it generates derivatives for and creates a derivative. This takes a while to process.
C. User deletes the object because they've found some mistake and want to reingest. This deletes all the pages.
D. Worker A saves the page it was generating a derivative for. Now the page is suddenly persisted again, and separated from its parent (which is gone).

hackartisan commented 4 years ago

Work started on branch https://github.com/samvera/valkyrie/compare/master...837-persisters-should-error

Note one iCLA will be needed before this can be merged. -- Update: CLA is received / on file

tpendragon commented 4 years ago

I added some commits to https://github.com/samvera/valkyrie/compare/master...837-persisters-should-error and now only Solr is left, but I don't have a good plan for implementing this efficiently in Solr.

tpendragon commented 4 years ago

I also don't know what to do about #save_all. With some of these implementations, raising an error would stop the batch in the middle, leaving some resources saved and others not.

hackartisan commented 4 years ago

Hm. For save_all, perhaps the strategy AR uses would be worth considering - don't error but also don't do anything.
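One possible shape for that AR-style #save_all (a sketch against an in-memory hash, not Valkyrie's API; the id-generation scheme here is purely illustrative): silently skip resources whose ids are no longer present rather than raising partway through the batch.

```ruby
# AR-style batch save: resources whose ids have been deleted are skipped
# (no error, no write), mirroring ActiveRecord's save-after-destroy no-op.
# Returns only the resources that were actually saved.
def save_all(store, resources)
  resources.filter_map do |resource|
    # Skip a previously-persisted resource whose id no longer exists.
    next if resource[:id] && !store.key?(resource[:id])

    # Illustrative id generation for brand-new resources.
    id = resource[:id] || (store.size + 1)
    store[id] = resource.merge(id: id)
  end
end
```

The trade-off against the single-save behavior is silence: callers get no signal that part of the batch was dropped, so the return value (only the saved resources) is the place to surface what happened.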