Closed jrschumacher closed 9 months ago
relates to #96
In our Policy Config table schema, we have a Foreign Key (FK) relationship from namespaces
to attribute definitions
, and another FK relationship from attribute definitions
to attribute values
. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.
If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:
These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, now that we are restricting hard deletes to dangerous/special rpc's and specific "superadmin-esque" functionalities for known dangerous mutations by adding active/inactive state to these three tables, we need to decide the cascading nature of soft deletes with inactive state.
Rely on PostgreSQL triggers on UPDATEs to state to cascade down
Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.
state
column in the three tables for speed of lookup/updateThe same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.
@jakedoublev would you do some research whether this would be supported in other DBs? If not, how would we go about supporting it?
Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? Seems like we could say "Postgres is the most performant DB we support, but we also support X, Y, and Z with some performance impact during these operations.
Lastly, consider the estimated frequency of usage:
Thanks for putting this together @jakedoublev.
To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic?
With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem.
would you do some research whether this would be supported in other DBs? @jrschumacher
It turns out support for sql triggers was wider than I anticipated. There are some differences in syntax and may be a little variation in Postgres cloud to cloud, but some semblance of SQL triggers exist across all of these. | DB | Support for Triggers | Reference Link |
---|---|---|---|
MySQL | β | docs | |
Oracle | β | docs | |
IBM Db2 | β | docs | |
SQLite | β | docs |
Could we utilize this approach for Postgres and then in future DBs we fall back to Option 3 or implement Option 2 in a driver approach? @jrschumacher
I think this is now the second time we've considered doubling down on Postgres's capabilities (see the metadata discussion here). I personally feel these are both small things to refactor if/when a need arises to support multiple DBs. To @biscoe916's point, avoiding 2 solutions to the same problem at the time we support multiple DBs will likely mean moving anything beyond basic SQL into the server anyway for the clearest path to broadest relational DB support.
To be honest, I'm not sure if performance is a realistic concern here. It seems most of the time this action will be run as a one off. Are there use-cases I'm not considering where the multiple queries to complete a soft delete will be problematic? @biscoe916
I think you're right and performance is indeed not a concern because of the infrequency of these deletions. It's something I felt/feel is always worth calling out, but realistically you are correct that there should be no felt impact by an end user.
With that said, I'm in favor of option 1, with the caveat that if in the future we decide to support databases other than Postgres, we switch to option 2 for all configurations so that we don't have 2 solutions to the same problem. @biscoe916
Thanks for the feedback! This makes sense and I will consider it the path forward.
Two more disadvantages to this approach
ADR: Soft deletes should cascade from namespaces -> attribute definitions -> attribute values
Taken from comment below https://github.com/opentdf/platform/issues/108#issuecomment-1941862182
Background
In our Policy Config table schema, we have a Foreign Key (FK) relationship from
namespaces
toattribute definitions
, and another FK relationship fromattribute definitions
toattribute values
. We have decided that due to the scenario above in the description of this issue, we want to rely on soft-deletes to avoid accidental or malicious creations of attributes/values in the place of their deleted counterparts.If we were relying on hard deletes, we would be given certain benefits by the relational FK constraint when deleting so that we could either:
These benefits of our schema and chosen DB would prevent unintended side effects and require thoughtful behavior on the part of platform admins. However, now that we are restricting hard deletes to dangerous/special rpc's and specific "superadmin-esque" functionalities for known dangerous mutations by adding active/inactive state to these three tables, we need to decide the cascading nature of soft deletes with inactive state.
Chosen Option:
Considered Options:
Rely on PostgreSQL triggers on UPDATEs to state to cascade down
Option 1: Rely on PostgreSQL triggers on UPDATEs to state to cascade down
Postgres triggers allow us to define the cascade behavior as the platform maintainers. Keeping the functionality within Postgres and not the server has additional benefits.
state
column in the three tables for speed of lookup/updateOption 2: Rely on the server's db layer to make DB queries that cascade the soft deletion down
The same as option 1, but with the cascading logic put into server-driven queries and not Postgres triggers.
Option 3. Allow INACTIVE namespaces with active attribute definitions/values, and INACTIVE definitions with ACTIVE namespaces and values
As a platform maintainer, I want to make sure that data which is deleted is soft-deleted so that I can prevent dangerous side effects and restore accidental deletes.
There are situations where the side effect of a delete could result in data leak if two admins are maintaining the platform. Example:
demo.com/attr/Classification/value/TopTopSecret
SecretSpy-SecretSantas-MailingList.csv.tdf
withdemo.com/attr/Classification/value/TopTopSecret
demo.com/attr/Classification/value/TopTopSecret
demo.com/attr/Classification/value/TopTopSecret
SecretSpy-SecretSantas-MailingList.csv.tdf
The soft-delete feature will prevent the recreation of the attribute with the same name on the same namespace.
Acceptance Criteria
-