thoth-station / storages

Storage and database adapters for project Thoth
https://thoth-station.github.io/
GNU General Public License v3.0

Feature: support for async deletion of a package index #2657

Open VannTen opened 2 years ago

VannTen commented 2 years ago

Problem statement

For https://github.com/thoth-station/management-api/issues/790, the management API needs to trigger the deletion of an index. But deleting all the related storage items could take a long time, and the management API needs to return in a reasonable timeframe (it is HTTP, so within seconds).

Proposal description

Split the deletion in two parts:

  1. mark the package index as 'deleted' (or 'to_delete'), similarly to the 'enabled/disabled' state -> this is called by the management-api
  2. delete all storage related to indexes marked 'deleted' (graph + ceph) -> this is called by an async workflow created by the management-api

The purpose of the split is to tolerate failures, timeouts, etc. of the "delete workflow": the mark persists, so an interrupted deletion can be picked up again later.
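
To illustrate why the mark helps with failure tolerance, here is a minimal in-memory sketch. Everything in it is hypothetical and not taken from thoth-storages: `IndexState`, `PackageIndex`, `mark_for_deletion` and `purge_marked` are illustrative names, and the `budget` parameter only simulates a workflow that gets interrupted partway through.

```python
from dataclasses import dataclass, field
from enum import Enum


class IndexState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    TO_DELETE = "to_delete"  # hypothetical third state proposed in this issue


@dataclass
class PackageIndex:
    url: str
    state: IndexState = IndexState.ENABLED
    # Stand-in for everything stored about this index (graph rows, Ceph documents).
    items: list = field(default_factory=list)


def mark_for_deletion(indexes, url):
    """Step 1: cheap and synchronous, so the HTTP call can return quickly."""
    indexes[url].state = IndexState.TO_DELETE


def purge_marked(indexes, budget):
    """Step 2: delete at most `budget` items, then stop (simulates a timeout).

    Returns True when every marked index was fully removed, False otherwise.
    """
    marked = [u for u, i in indexes.items() if i.state is IndexState.TO_DELETE]
    for url in marked:
        index = indexes[url]
        while index.items and budget > 0:
            index.items.pop()
            budget -= 1
        if index.items:
            return False  # interrupted; the mark survives for the next run
        del indexes[url]
    return True
```

Because the worker selects its work from the mark rather than from its own arguments, a second run after a failure needs no extra bookkeeping: it finds the same index still marked and finishes the job.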

Alternatives

Skip the first step and directly create the "delete workflow".

However, this seems fragile in certain cases:

It does avoid changing the DB schema though.

Additional context

Acceptance Criteria

TODO

goern commented 2 years ago

/sig stack-guidance /priority important-longterm

VannTen commented 2 years ago

So, my last thoughts on this:

First I would need a query that takes a package index and returns the ids of all documents currently stored for it in object storage.

Once that is done, the workflow is basically:

  1. Mark package index as deleted.
  2. Query PackageIndex -> all ids -> delete all ids
  3. Delete Package index -> Postgres cleans up all related SQL items by cascading.
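
The three steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for Postgres, a plain dict as a stand-in for object storage, and a made-up schema (`package_index`, `document`, `to_delete`) that does not reflect the real Thoth storage model.

```python
import sqlite3


def delete_package_index(conn, object_store, index_url):
    # 1. Mark the index as deleted so an interrupted run can be retried.
    conn.execute(
        "UPDATE package_index SET to_delete = 1 WHERE url = ?", (index_url,)
    )
    conn.commit()

    # 2. Collect every document id stored for this index, then remove them.
    rows = conn.execute(
        "SELECT d.document_id FROM document AS d "
        "JOIN package_index AS p ON p.id = d.index_id WHERE p.url = ?",
        (index_url,),
    ).fetchall()
    for (document_id,) in rows:
        object_store.pop(document_id, None)  # tolerate already-deleted documents

    # 3. Delete the index row; ON DELETE CASCADE removes the dependent rows.
    conn.execute("DELETE FROM package_index WHERE url = ?", (index_url,))
    conn.commit()
    return [document_id for (document_id,) in rows]


def make_demo_db():
    """Build the hypothetical schema used by this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(
        """
        CREATE TABLE package_index (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL,
            to_delete INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE document (
            document_id TEXT PRIMARY KEY,
            index_id INTEGER NOT NULL
                REFERENCES package_index(id) ON DELETE CASCADE
        );
        """
    )
    return conn
```

The ordering is deliberate: object storage is cleaned before the index row is deleted, because once the cascade has run there is no longer a query that can recover the document ids.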

Does that seem realistic? I'm still not completely at ease with the storage model, so opinions on that strategy would be welcome.

@mayaCostantini