opentdf / platform

OpenTDF Platform monorepo enabling the development and integration of _forever control_ of data into new and existing applications. The concept of forever control stems from an increasingly common concept known as zero trust.
BSD 3-Clause Clear License
18 stars 7 forks source link

Policy API: Add soft delete feature #108

Closed jrschumacher closed 7 months ago

jrschumacher commented 7 months ago

ADR: Soft deletes should cascade from namespaces -> attribute definitions -> attribute values

Taken from comment below https://github.com/opentdf/platform/issues/108#issuecomment-1941862182

Background

In our Policy Config table schema, we have a Foreign Key (FK) relationship from namespaces to attribute definitions, and another FK relationship from attribute definitions to attribute values. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.

If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:

  1. cascade a delete from an attribute definition to its values, OR
  2. prevent deleting an attribute unless its associated values had been deleted first

These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, now that we are restricting hard deletes to dangerous/special rpc's and specific "superadmin-esque" functionalities for known dangerous mutations by adding active/inactive state to these three tables, we need to decide the cascading nature of soft deletes with inactive state.

Chosen Option:

Considered Options: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

  1. Rely on PostgreSQL triggers on UPDATEs to state to cascade down
  2. Rely on the server's db layer to make DB queries that cascade the soft deletion down
  3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

Option 1: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.

Option 2: Rely on the server's db layer to make DB queries that cascade the soft deletion down

The same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.

Option 3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values


As a platform maintainer, I want to make sure that data which is deleted is soft-deleted so that I can prevent dangerous side effects and restore accidental deletes.

There are situations where the side effect of a delete could result in data leak if two admins are maintaining the platform. Example:

The soft-delete feature will prevent the recreation of the attribute with the same name on the same namespace.

Acceptance Criteria

-

strantalis commented 7 months ago

relates to #96

jakedoublev commented 7 months ago

ADR: Soft deletes should cascade from namespaces -> attribute definitions -> attribute values

Background

In our Policy Config table schema, we have a Foreign Key (FK) relationship from namespaces to attribute definitions, and another FK relationship from attribute definitions to attribute values. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.

If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:

  1. cascade a delete from an attribute definition to its values, OR
  2. prevent deleting an attribute unless its associated values had been deleted first

These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, now that we are restricting hard deletes to dangerous/special rpc's and specific "superadmin-esque" functionalities for known dangerous mutations by adding active/inactive state to these three tables, we need to decide the cascading nature of soft deletes with inactive state.

Chosen Option:

Considered Options: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

  1. Rely on PostgreSQL triggers on UPDATEs to state to cascade down
  2. Rely on the server's db layer to make DB queries that cascade the soft deletion down
  3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

Option 1: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.

Option 2: Rely on the server's db layer to make DB queries that cascade the soft deletion down

The same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.

Option 3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

jrschumacher commented 7 months ago

@jakedoublev would you do some research whether this would be supported in other DBs? If not, how would we go about supporting it?

Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? Seems like we could say "Postgres is the most performant DB we support, but we also support X, Y, and Z with some performance impact during these operations.

Lastly, consider the estimated frequency of usage:

biscoe916 commented 7 months ago

Thanks for putting this together @jakedoublev.

To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic?

With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem.

jakedoublev commented 7 months ago

would you do some research whether this would be supported in other DBs? @jrschumacher

It turns out support for sql triggers was wider than I anticipated. There are some differences in syntax and may be a little variation in Postgres cloud to cloud, but some semblance of SQL triggers exist across all of these. DB Support for Triggers Reference Link
MySQL βœ… docs
Oracle βœ… docs
IBM Db2 βœ… docs
SQLite βœ… docs

Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? @jrschumacher

I think this is now the second time we've considered doubling down on Postgres's capabilities (see the metadata discussion here). I personally feel these are both small things to refactor if/when a need arises to support multiple DBs. To @biscoe916's point, avoiding 2 solutions to the same problem at the time we support multiple DBs will likely mean moving anything beyond basic SQL into the server anyway for the clearest path to broadest relational DB support.

To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic? @biscoe916

I think you're right and performance is indeed not a concern because of the infrequency of these deletions. It's something I felt/feel is always worth calling out, but realistically you are correct that there should be no felt impact by an end user.

With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem. @biscoe916

Thanks for the feedback! This makes sense and I will consider it the path forward.

dmihalcik-virtru commented 6 months ago

Two more disadvantages to this approach