Policy API: Add soft delete feature

jrschumacher commented 9 months ago

ADR: Soft deletes should cascade from namespaces -> attribute definitions -> attribute values

Taken from comment below https://github.com/opentdf/platform/issues/108#issuecomment-1941862182

Background

In our Policy Config table schema, we have a Foreign Key (FK) relationship from namespaces to attribute definitions, and another FK relationship from attribute definitions to attribute values. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.

If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:

cascade a delete from an attribute definition to its values, OR
prevent deleting an attribute unless its associated values had been deleted first

These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, we are now restricting hard deletes to dangerous/special RPCs and specific "superadmin-esque" functionality for known dangerous mutations, and adding active/inactive state to these three tables. We therefore need to decide how soft deletes (the inactive state) should cascade.

Chosen Option: `Rely on PostgreSQL triggers on UPDATEs to state to cascade down`

Considered Options:

1. Rely on PostgreSQL triggers on UPDATEs to state to cascade down
2. Rely on the server's db layer to make DB queries that cascade the soft deletion down
3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

Option 1: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.

🟩 Good, because cascading behavior of inactive state makes the most sense when the user intention is to delete (which is still going to be a relatively dangerous mutation)
🟩 Good, because keeping the cascade in the DB is always going to be more optimal than multiple queries
🟩 Good, because we are indexing on the state column in the three tables for speed of lookup/update
🟩 Good, because it has already been proven out with an integration test for repeatability in this branch
🟩 Good, because this does not block any superadmin/dangerous/special deletion capability and will be fully distinct from any cascade/constraint handling there
🟨 Neutral, because triggers are a Postgres feature, but we haven't made any firm decisions yet about what other SQL databases/versions we'll support or if we'll require customers to use the latest PostgreSQL
🟥 Bad, because it's a less well-known feature of Postgres
🟥 Bad, because we will only be able to ALWAYS cascade the INACTIVE UPDATE down the tree and will not get the foreign key constraint of a one-off deletion if that's what the user really intended. We'll need to make it clear to them what their change will do.
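
To make the trigger-based cascade concrete, here is a minimal sketch. It runs against an in-memory SQLite database through Python's standard `sqlite3` module purely so that it is self-contained; the PostgreSQL version would use a PL/pgSQL trigger function with different syntax. The table names, column names, and the integer `active` flag are assumptions for illustration, not the platform's actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE namespaces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE attribute_definitions (
    id INTEGER PRIMARY KEY,
    namespace_id INTEGER NOT NULL REFERENCES namespaces(id),
    name TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE attribute_values (
    id INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES attribute_definitions(id),
    value TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1
);

-- When a namespace becomes inactive, deactivate its definitions.
CREATE TRIGGER cascade_namespace_inactive
AFTER UPDATE OF active ON namespaces
FOR EACH ROW WHEN NEW.active = 0 AND OLD.active = 1
BEGIN
    UPDATE attribute_definitions SET active = 0
    WHERE namespace_id = NEW.id AND active = 1;
END;

-- When a definition becomes inactive, deactivate its values.
-- This also fires for definitions deactivated by the trigger above.
CREATE TRIGGER cascade_definition_inactive
AFTER UPDATE OF active ON attribute_definitions
FOR EACH ROW WHEN NEW.active = 0 AND OLD.active = 1
BEGIN
    UPDATE attribute_values SET active = 0
    WHERE definition_id = NEW.id AND active = 1;
END;
"""

TABLES = ("namespaces", "attribute_definitions", "attribute_values")


def build_db():
    """Create an in-memory database with one namespace, one definition, two values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO namespaces (id, name) VALUES (1, 'demo.com')")
    conn.execute(
        "INSERT INTO attribute_definitions (id, namespace_id, name) "
        "VALUES (1, 1, 'Classification')"
    )
    conn.executemany(
        "INSERT INTO attribute_values (id, definition_id, value) VALUES (?, 1, ?)",
        [(1, "TopTopSecret"), (2, "Secret")],
    )
    conn.commit()
    return conn


def active_counts(conn):
    """Return the number of active rows in (namespaces, definitions, values)."""
    return tuple(
        conn.execute(f"SELECT COUNT(*) FROM {table} WHERE active = 1").fetchone()[0]
        for table in TABLES
    )


if __name__ == "__main__":
    conn = build_db()
    # A single UPDATE on the namespace is all the caller issues.
    conn.execute("UPDATE namespaces SET active = 0 WHERE id = 1")
    print(active_counts(conn))
```

The caller issues one `UPDATE` and the database propagates the inactive state down both levels, which is the behavior the pros and cons above are weighing.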

Option 2: Rely on the server's db layer to make DB queries that cascade the soft deletion down

The same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.

🟩 Good, because it does not tie us to any Postgres-specific feature and can be reused across SQL DBs
🟩 Good, because of all the other good benefits of option 1
🟥 Bad, because performance: anything being soft deleted will mean multiple round trips
🟥 Bad, because more room for bugs: anything being soft deleted will mean multiple queries
🟥 Bad, because we can more easily end up in a bad state where the server fails or a secondary/tertiary query fails but the first succeeded
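
For comparison, a server-driven cascade might look like the sketch below. It again uses Python's `sqlite3` with a hypothetical schema (names are assumptions). The three queries run inside one transaction, which is one common way to reduce the partial-failure risk listed above, provided all queries share a connection. The `fail_midway` flag is a hypothetical hook that exists only to simulate a server failure after the first query.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE namespaces (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1);
CREATE TABLE attribute_definitions (
    id INTEGER PRIMARY KEY,
    namespace_id INTEGER NOT NULL REFERENCES namespaces(id),
    name TEXT NOT NULL, active INTEGER NOT NULL DEFAULT 1);
CREATE TABLE attribute_values (
    id INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES attribute_definitions(id),
    value TEXT NOT NULL, active INTEGER NOT NULL DEFAULT 1);
"""

TABLES = ("namespaces", "attribute_definitions", "attribute_values")


def build_db():
    """Create an in-memory database with one namespace, one definition, two values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO namespaces (id, name) VALUES (1, 'demo.com')")
    conn.execute(
        "INSERT INTO attribute_definitions (id, namespace_id, name) "
        "VALUES (1, 1, 'Classification')"
    )
    conn.executemany(
        "INSERT INTO attribute_values (id, definition_id, value) VALUES (?, 1, ?)",
        [(1, "TopTopSecret"), (2, "Secret")],
    )
    conn.commit()
    return conn


def active_counts(conn):
    """Return the number of active rows in (namespaces, definitions, values)."""
    return tuple(
        conn.execute(f"SELECT COUNT(*) FROM {table} WHERE active = 1").fetchone()[0]
        for table in TABLES
    )


def soft_delete_namespace(conn, namespace_id, fail_midway=False):
    """Cascade the inactive state with three explicit queries in one transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute(
            "UPDATE namespaces SET active = 0 WHERE id = ?", (namespace_id,)
        )
        if fail_midway:
            # Simulated failure between the first and second query.
            raise RuntimeError("simulated server failure")
        conn.execute(
            "UPDATE attribute_definitions SET active = 0 WHERE namespace_id = ?",
            (namespace_id,),
        )
        conn.execute(
            "UPDATE attribute_values SET active = 0 WHERE definition_id IN "
            "(SELECT id FROM attribute_definitions WHERE namespace_id = ?)",
            (namespace_id,),
        )


if __name__ == "__main__":
    conn = build_db()
    try:
        soft_delete_namespace(conn, 1, fail_midway=True)
    except RuntimeError:
        pass
    print(active_counts(conn))  # the failed attempt was rolled back
    soft_delete_namespace(conn, 1)
    print(active_counts(conn))
```

Without the surrounding transaction, the simulated failure would leave the namespace inactive while its definitions and values stayed active, which is the bad state described in the last bullet.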

Option 3: Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

🟩 Good, because it gives maximum control to the user
🟥 Bad, because that maximized control is actually more confusing
🟥 Bad, because it is most likely to cause a bad state where access is not allowed for an unknown reason
🟥 Bad, because it is unintuitive from an Engineering/maintenance perspective
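
The "unknown reason" problem can be illustrated with a small sketch. It assumes, hypothetically, that access evaluation would require every ancestor of a value to be active. The `PolicyObject` class and both helper functions are invented for illustration and are not part of the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyObject:
    """Hypothetical stand-in for a namespace, definition, or value row."""
    name: str
    active: bool = True
    parent: Optional["PolicyObject"] = None


def effectively_active(obj):
    """An object is usable only if it and all of its ancestors are active."""
    node = obj
    while node is not None:
        if not node.active:
            return False
        node = node.parent
    return True


def inactive_in_chain(obj):
    """List the names of the inactive objects between obj and the root."""
    names = []
    node = obj
    while node is not None:
        if not node.active:
            names.append(node.name)
        node = node.parent
    return names


if __name__ == "__main__":
    namespace = PolicyObject("demo.com", active=False)
    definition = PolicyObject("Classification", parent=namespace)
    value = PolicyObject("TopTopSecret", parent=definition)
    # The value's own row says ACTIVE, yet it is not usable.
    print(value.active, effectively_active(value), inactive_in_chain(value))
```

Under independent states, an admin looking only at the value's row sees it as active and has to walk the whole chain to find out why access fails.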

As a platform maintainer, I want to make sure that data which is deleted is soft-deleted so that I can prevent dangerous side effects and restore accidental deletes.

There are situations where the side effect of a delete could result in a data leak if two admins are maintaining the platform. Example:

Admin A adds attribute demo.com/attr/Classification/value/TopTopSecret
- Creates subject mapping with Deep Secret Spy
User A creates TDF SecretSpy-SecretSantas-MailingList.csv.tdf with demo.com/attr/Classification/value/TopTopSecret
Admin A deletes attribute demo.com/attr/Classification/value/TopTopSecret
Admin B adds attribute demo.com/attr/Classification/value/TopTopSecret
- and creates subject mapping with Top Secret Toy Inventor of Tops
User B with Top Secret Toy Inventor of Tops subject attribute accesses SecretSpy-SecretSantas-MailingList.csv.tdf

The soft-delete feature will prevent the recreation of the attribute with the same name on the same namespace.
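
One way to picture why soft deletion blocks the scenario above: the deactivated row stays in the table, so a uniqueness constraint that covers both active and inactive rows rejects the re-creation. The sketch below uses Python's `sqlite3` with a hypothetical single-table schema; the constraint and all names are assumptions, not the platform's actual design.

```python
import sqlite3

# Hypothetical table: uniqueness spans active and inactive rows alike.
SCHEMA = """
CREATE TABLE attribute_values (
    id INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL,
    value TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1,
    UNIQUE (definition_id, value)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_value(conn, definition_id, value):
    """Try to create a value; return False if the name is already taken."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO attribute_values (definition_id, value) VALUES (?, ?)",
                (definition_id, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def soft_delete_value(conn, definition_id, value):
    """Mark the row inactive but keep it, so its name stays reserved."""
    with conn:
        conn.execute(
            "UPDATE attribute_values SET active = 0 "
            "WHERE definition_id = ? AND value = ?",
            (definition_id, value),
        )


def hard_delete_value(conn, definition_id, value):
    """Remove the row entirely, which frees the name for reuse."""
    with conn:
        conn.execute(
            "DELETE FROM attribute_values WHERE definition_id = ? AND value = ?",
            (definition_id, value),
        )


if __name__ == "__main__":
    conn = build_db()
    create_value(conn, 1, "TopTopSecret")       # Admin A adds the value
    soft_delete_value(conn, 1, "TopTopSecret")  # Admin A deletes it
    print(create_value(conn, 1, "TopTopSecret"))  # Admin B's attempt
```

With a hard delete in place of the soft delete, Admin B's insert would succeed, and existing TDFs bound to the old value would match the new one.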

Acceptance Criteria

-

strantalis commented 9 months ago

relates to #96

jakedoublev commented 9 months ago

ADR: Soft deletes should cascade from namespaces -> attribute definitions -> attribute values

Background

In our Policy Config table schema, we have a Foreign Key (FK) relationship from namespaces to attribute definitions, and another FK relationship from attribute definitions to attribute values. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.

If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:

cascade a delete from an attribute definition to its values, OR
prevent deleting an attribute unless its associated values had been deleted first

These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, we are now restricting hard deletes to dangerous/special RPCs and specific "superadmin-esque" functionality for known dangerous mutations, and adding active/inactive state to these three tables. We therefore need to decide how soft deletes (the inactive state) should cascade.

Chosen Option: `Rely on PostgreSQL triggers on UPDATEs to state to cascade down`

Considered Options:

1. Rely on PostgreSQL triggers on UPDATEs to state to cascade down
2. Rely on the server's db layer to make DB queries that cascade the soft deletion down
3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

Option 1: Rely on PostgreSQL triggers on UPDATEs to state to cascade down

Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.

🟩 Good, because cascading behavior of inactive state makes the most sense when the user intention is to delete (which is still going to be a relatively dangerous mutation)
🟩 Good, because keeping the cascade in the DB is always going to be more optimal than multiple queries
🟩 Good, because we are indexing on the state column in the three tables for speed of lookup/update
🟩 Good, because it has already been proven out with an integration test for repeatability in this branch
🟩 Good, because this does not block any superadmin/dangerous/special deletion capability and will be fully distinct from any cascade/constraint handling there
🟨 Neutral, because triggers are a Postgres feature, but we haven't made any firm decisions yet about what other SQL databases/versions we'll support or if we'll require customers to use the latest PostgreSQL
🟥 Bad, because it's a less well-known feature of Postgres
🟥 Bad, because we will only be able to ALWAYS cascade the INACTIVE UPDATE down the tree and will not get the foreign key constraint of a one-off deletion if that's what the user really intended. We'll need to make it clear to them what their change will do.

Option 2: Rely on the server's db layer to make DB queries that cascade the soft deletion down

The same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.

🟩 Good, because it does not tie us to any Postgres-specific feature and can be reused across SQL DBs
🟩 Good, because of all the other good benefits of option 1
🟥 Bad, because performance: anything being soft deleted will mean multiple round trips
🟥 Bad, because more room for bugs: anything being soft deleted will mean multiple queries
🟥 Bad, because we can more easily end up in a bad state where the server fails or a secondary/tertiary query fails but the first succeeded

Option 3: Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values

🟩 Good, because it gives maximum control to the user
🟥 Bad, because that maximized control is actually more confusing
🟥 Bad, because it is most likely to cause a bad state where access is not allowed for an unknown reason
🟥 Bad, because it is unintuitive from an Engineering/maintenance perspective

jrschumacher commented 9 months ago

@jakedoublev would you do some research whether this would be supported in other DBs? If not, how would we go about supporting it?

Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? Seems like we could say "Postgres is the most performant DB we support, but we also support X, Y, and Z with some performance impact during these operations."

Lastly, consider the estimated frequency of usage:

Read - VERY HIGH
Write - HIGH
Update - LOW - MEDIUM
Delete - VERY LOW - LOW

biscoe916 commented 9 months ago

Thanks for putting this together @jakedoublev.

To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic?

With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem.

jakedoublev commented 9 months ago

would you do some research whether this would be supported in other DBs? @jrschumacher

It turns out support for SQL triggers is wider than I anticipated. There are some differences in syntax, and there may be a little variation in Postgres from cloud to cloud, but some semblance of SQL triggers exists across all of these:

| DB | Support for Triggers | Docs |
| --- | --- | --- |
| MySQL | ✅ | docs |
| Oracle | ✅ | docs |
| IBM Db2 | ✅ | docs |
| SQLite | ✅ | docs |

Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? @jrschumacher

I think this is now the second time we've considered doubling down on Postgres's capabilities (see the metadata discussion here). I personally feel these are both small things to refactor if/when a need arises to support multiple DBs. To @biscoe916's point, avoiding 2 solutions to the same problem at the time we support multiple DBs will likely mean moving anything beyond basic SQL into the server anyway for the clearest path to broadest relational DB support.

To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic? @biscoe916

I think you're right and performance is indeed not a concern because of the infrequency of these deletions. It's something I felt/feel is always worth calling out, but realistically you are correct that there should be no felt impact by an end user.

With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem. @biscoe916

Thanks for the feedback! This makes sense and I will consider it the path forward.

dmihalcik-virtru commented 7 months ago

Two more disadvantages to this approach:

Google Cloud Spanner does not yet support Triggers.
Deleting a parent or grandparent object will cause changes to the rows for the children and grandchildren. This means that if I toggle visibility of a namespace object, all corresponding attributes and instance values will be left in a 'deleted' state. If I had already marked some as 'deleted', it will be difficult to sort through and undelete only the recently deleted items.
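
The second point can be shown with a toy model. It assumes each child carries only a single boolean state, as in the ADR above, and the value names and helper functions are hypothetical.

```python
def cascade_deactivate(values):
    """Simulate the cascade: every child value ends up inactive."""
    return {name: False for name in values}


def naive_reactivate(values):
    """Simulate undoing the cascade when only a boolean state is stored."""
    return {name: True for name in values}


if __name__ == "__main__":
    # "Retired" was deliberately soft-deleted before the namespace was touched.
    before = {"TopTopSecret": True, "Retired": False}
    after_cascade = cascade_deactivate(before)
    restored = naive_reactivate(after_cascade)
    print(after_cascade)
    print(restored)
```

After the cascade, both values look identical, so nothing in the stored state says which ones were inactive beforehand. Restoring the namespace by reactivating all of its children brings "Retired" back as well.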
