opentdf / platform

OpenTDF Platform monorepo enabling the development and integration of _forever control_ of data into new and existing applications. The concept of forever control stems from an increasingly common concept known as zero trust.
BSD 3-Clause Clear License

ADR: Resource Mapping enhancements to accommodate PEP needs #1209

Closed. jrschumacher closed this 1 week ago

jrschumacher commented 1 month ago

Resource Mapping enhancements to accommodate PEP needs

Context and Problem Statement

As resource mappings have been explored by PEPs, it has been determined they lack some functionality. This ADR will cover a variety of changes to the resource mappings. The key changes include:

Resource Mapping FQN

The resource mapping FQN is focused on the ability to reference a unique mapping by a unique string that is portable. With the latest addition of policy import/export and the desire of our users to subscribe to external policy bundles we need to ensure there is uniqueness that is meaningful and can be transported.

To support this the suggested FQN structure is https://<namespace>/<prefix>/<unique-name>. This will align with our existing attribute structure of https://demo.com/attr/my_attr/val/my_val

Assign to namespace

Assigning the resource mapping to a namespace will require adding a new foreign key on the resource mapping table that references the namespace table. This relationship will be a 1:* between namespace and resource mapping.

Migration: each resource mapping will need to be tied to a namespace. If there is exactly one namespace in the existing policy, then we will automatically assign all resource mappings to that namespace. If there is more than one, we will need to recommend that users migrate to a single namespace and manually rectify the situation later.

Going forward, every resource mapping MUST have a namespace.
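The migration rule above can be sketched roughly as follows. This is only an illustration: the `ResourceMapping` struct, the `assignNamespaces` function, and the ID values are hypothetical and not taken from the platform code. The sketch also assumes that a policy with zero namespaces is treated the same as one with several, which the ADR does not specify.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAmbiguousNamespace signals that automatic assignment is not possible
// and an operator has to assign namespaces by hand.
var ErrAmbiguousNamespace = errors.New("cannot auto-assign: policy does not contain exactly one namespace")

// ResourceMapping is a hypothetical, trimmed-down record used only for this sketch.
type ResourceMapping struct {
	ID          string
	NamespaceID string // empty until the migration assigns one
}

// assignNamespaces sets the namespace on every mapping when the policy has
// exactly one namespace; otherwise it leaves the mappings untouched.
func assignNamespaces(namespaceIDs []string, mappings []ResourceMapping) error {
	if len(namespaceIDs) != 1 {
		return ErrAmbiguousNamespace
	}
	for i := range mappings {
		mappings[i].NamespaceID = namespaceIDs[0]
	}
	return nil
}

func main() {
	mappings := []ResourceMapping{{ID: "rm-1"}, {ID: "rm-2"}}
	if err := assignNamespaces([]string{"ns-demo"}, mappings); err != nil {
		fmt.Println("manual migration needed:", err)
		return
	}
	fmt.Println(mappings[0].NamespaceID, mappings[1].NamespaceID)
}
```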

Assign a unique name

Each resource mapping must be given a unique name.

Migration: automatically name the resource mappings based on the known data between the terms and the attributes. Users can change these names using unsafe services or by directly manipulating the database.

Proto & RPC

Protos will need to be updated to support the new properties of namespace and name.

Get Resource Mapping By FQNs

This RPC will primarily be used by PEPs.

We need the ability to get one or more resource mappings by FQN. This means we should be able to fetch a collection of resource mappings based on an array of FQNs provided. These FQNs do not need to be associated with one another.

If the FQN does not exist, we should indicate that the single FQN was not found and return the rest of the values. The request should be considered successful in general.

If no FQNs are found then the request should be considered a failure.
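A minimal sketch of these partial-success semantics, using an in-memory map in place of the database. The `getByFQNs` function, the `ResourceMapping` fields, and the `ErrNoneFound` error are assumptions made for illustration; the real RPC request and response shapes would come from the proto definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceMapping is a hypothetical, trimmed-down shape for this sketch.
type ResourceMapping struct {
	FQN   string
	Terms []string
}

// ErrNoneFound is returned when not a single requested FQN resolves.
var ErrNoneFound = errors.New("none of the requested FQNs were found")

// getByFQNs returns the mappings that exist, plus the FQNs that were not
// found. The call only fails when nothing at all was found.
func getByFQNs(store map[string]ResourceMapping, fqns []string) (map[string]ResourceMapping, []string, error) {
	found := make(map[string]ResourceMapping)
	var notFound []string
	for _, fqn := range fqns {
		if rm, ok := store[fqn]; ok {
			found[fqn] = rm
		} else {
			notFound = append(notFound, fqn)
		}
	}
	if len(found) == 0 {
		return nil, notFound, ErrNoneFound
	}
	return found, notFound, nil
}

func main() {
	store := map[string]ResourceMapping{
		"https://demo.com/resm/known": {FQN: "https://demo.com/resm/known", Terms: []string{"TS"}},
	}
	found, notFound, err := getByFQNs(store, []string{
		"https://demo.com/resm/known",
		"https://demo.com/resm/missing",
	})
	fmt.Println(len(found), notFound, err)
}
```

Returning the not-found FQNs alongside the results lets a PEP decide for itself whether a missing mapping is fatal.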

Get Resource Mappings by Attribute FQN

This RPC will primarily be used by admins.

Admins will likely want to know which resource mappings are associated with an attribute FQN. This will only support a single Attribute FQN.

SDK

SDK will need to be updated, but only with the generated SDKs based on proto changes.

CLI

Reversal

There is a need for reversing the resource mappings:

Given an attribute, what are the terms that should be applied to a piece of data

Since the resource mappings are a 1:* resource mapping to attributes, it's not so easy to reverse this. Reversal will require, either:

Decisions needed

Questions

Decision

FQN Structure

https://<namespace>/resm/<unique_name>

This keeps the format consistent with attribute FQNs in an attempt to avoid any unnecessary confusion.
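As an illustration of the chosen format, the sketch below assembles an FQN from its two parts. The function name `buildResourceMappingFQN` and the rule that neither part may contain a slash are assumptions for this example, not behavior confirmed by the ADR.

```go
package main

import (
	"fmt"
	"strings"
)

// buildResourceMappingFQN joins a namespace and a unique name into the
// https://<namespace>/resm/<unique_name> form chosen in this ADR.
func buildResourceMappingFQN(namespace, uniqueName string) (string, error) {
	if namespace == "" || uniqueName == "" {
		return "", fmt.Errorf("namespace %q and unique name %q are both required", namespace, uniqueName)
	}
	// Assumption: a slash in either part would make the FQN ambiguous to split again.
	if strings.Contains(namespace, "/") || strings.Contains(uniqueName, "/") {
		return "", fmt.Errorf("namespace %q and unique name %q must not contain '/'", namespace, uniqueName)
	}
	return "https://" + namespace + "/resm/" + uniqueName, nil
}

func main() {
	fqn, err := buildResourceMappingFQN("demo.com", "my_mapping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fqn) // prints https://demo.com/resm/my_mapping
}
```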

DB Changes

RPCs

Update existing ResourceMappingService to include:

damorris25 commented 1 month ago

I'd suggest an FQN structure of resm so it's similar to attr - 4 characters so it's not too long but still descriptive enough to be meaningful.

jakedoublev commented 1 month ago

Given the current schema, each Resource Mapping (RM) is already under a Namespace, as 1 RM -> 1 attr value -> 1 attr def -> 1 namespace. It seems the need is therefore not to namespace RMs themselves, but to group resource mappings and ensure each group is 1) of a single namespace, and 2) CRUD-able as a portable, interoperable unit.

This is my understanding of the use case: "I need to import vendor_a's published tags for known external policy into resource mappings for namespaced attributes, theoretically alongside resource mappings I've defined and those from vendor_b all for the same attribute values".

If all the above is accurate, I think we could create groups of RMs under fully qualified names (FQNs).

FQN Options for RM Groups

https://<RM publisher>/resm/<namespace-association>

This is not the pure definition of an FQN from an attribute standpoint where the first value is a namespace authority, but this would avoid a potentially equally confusing scenario where tag_vendor_a and tag_vendor_b are both publishing groups of RMs under acme.co and neither owns that namespace (i.e. https://acme.co/resm/tag_vendor_a). In this case, the true authority on the Resource Mappings being published is actually the tagging vendor and not the attribute policy owner.

https://<namespace>/resm/<RM publisher>

This is perhaps equally clunky. While it more closely aligns with the attribute FQN format, the authority on the tags and RMs being published is the group name. The expected product use case is that the RM publisher is sharing tags that map to externally published policy, not that a policy publisher supports tags/terms compatible with every tagging software vendor.

Flow Example

Tagging software vendor tag_vendor.com wants to support interoperability with the published policy for nato.intl and publishes a bundle of Resource Mappings for their proprietary tags/terms under FQN: https://tag_vendor.com/resm/nato.intl.

A NATO platform or a platform that has imported NATO nato.intl namespace policy can then import the bundle of Resource Mappings published by tag_vendor.com for nato.intl.

This FQN can be provided by PEPs or an admin UX to a new policy RPC GetResourceMappingsByGroupFqns to resolve to a ResourceMapping[] where each RM in the list points to the FQN'd attribute value of the specified namespace (i.e. ResourceMapping[]{{AttrValueFQN: "https://nato.intl/attr/classification/value/topsecret", Terms: ["specific_tag_vendor_format::TS"]...},{...}} ). If the attribute value FQN is returned in this RPC response, an individual name/FQN for a single RM should not be necessary, only for the group.

In the importing platform's policy, the namespace for the imported group would be nato.intl, and the RM group name would be tag_vendor.com as the publisher of the resource mappings. Compatibility is ensured by defining attribute values within each RM by FQN.
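To make the flow concrete, here is a rough sketch of a group model and a lookup by group FQN. It follows the publisher-first FQN from the flow example above, not any finalized format. The `ResourceMappingGroup` type and the `resolveGroups` function are hypothetical names; only `GetResourceMappingsByGroupFqns`, `AttrValueFQN`, and `Terms` appear in the discussion itself.

```go
package main

import "fmt"

// ResourceMapping points a set of vendor terms at one attribute value FQN.
type ResourceMapping struct {
	AttrValueFQN string
	Terms        []string
}

// ResourceMappingGroup is a hypothetical portable unit of mappings:
// one publisher, one namespace, many mappings.
type ResourceMappingGroup struct {
	Publisher string // e.g. tag_vendor.com
	Namespace string // e.g. nato.intl
	Mappings  []ResourceMapping
}

// FQN renders the group FQN in the form used by the flow example.
func (g ResourceMappingGroup) FQN() string {
	return "https://" + g.Publisher + "/resm/" + g.Namespace
}

// resolveGroups flattens the mappings of every group whose FQN was requested.
func resolveGroups(groups []ResourceMappingGroup, groupFQNs []string) []ResourceMapping {
	wanted := make(map[string]bool, len(groupFQNs))
	for _, fqn := range groupFQNs {
		wanted[fqn] = true
	}
	var out []ResourceMapping
	for _, g := range groups {
		if wanted[g.FQN()] {
			out = append(out, g.Mappings...)
		}
	}
	return out
}

func main() {
	groups := []ResourceMappingGroup{{
		Publisher: "tag_vendor.com",
		Namespace: "nato.intl",
		Mappings: []ResourceMapping{{
			AttrValueFQN: "https://nato.intl/attr/classification/value/topsecret",
			Terms:        []string{"specific_tag_vendor_format::TS"},
		}},
	}}
	for _, rm := range resolveGroups(groups, []string{"https://tag_vendor.com/resm/nato.intl"}) {
		fmt.Println(rm.AttrValueFQN, rm.Terms)
	}
}
```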

Edge cases to handle

  1. I'm importing published RMs and encounter an RM for an attribute value that is not present within my platform to map to
  2. The published RMs of a given namespace are derived from a policy version that differs from the published policy version I've previously imported (should I always bail, check for breaking changes and try to proceed, or ignore and proceed)?
  3. Can I import RMs for a published namespace but point them to my own namespace (a maintained policy fork, if you will)? (my vote is no to reduce complexity)
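The first two edge cases could be surfaced by a pre-import check along these lines. This is a sketch under stated assumptions: the `Bundle` and `ImportReport` types and the `checkImport` function are hypothetical, and version compatibility is reduced to exact string equality because the thread leaves the real compatibility rule open. The check only reports problems and leaves the decision to bail or proceed to the caller.

```go
package main

import "fmt"

// ResourceMapping points a set of vendor terms at one attribute value FQN.
type ResourceMapping struct {
	AttrValueFQN string
	Terms        []string
}

// Bundle is a hypothetical exported group of mappings plus the policy
// version it was derived from.
type Bundle struct {
	PolicyVersion string
	Mappings      []ResourceMapping
}

// ImportReport lists what a pre-import check found.
type ImportReport struct {
	MissingAttrValues []string // edge case 1: attribute values unknown to the importer
	VersionMismatch   bool     // edge case 2: bundle and installed policy versions differ
}

// Clean reports whether the bundle can be imported without any open question.
func (r ImportReport) Clean() bool {
	return len(r.MissingAttrValues) == 0 && !r.VersionMismatch
}

// checkImport compares a bundle against the importer's known attribute
// value FQNs and installed policy version.
func checkImport(b Bundle, knownAttrValues map[string]bool, installedVersion string) ImportReport {
	// Assumption: versions are compatible only when they are identical.
	report := ImportReport{VersionMismatch: b.PolicyVersion != installedVersion}
	for _, rm := range b.Mappings {
		if !knownAttrValues[rm.AttrValueFQN] {
			report.MissingAttrValues = append(report.MissingAttrValues, rm.AttrValueFQN)
		}
	}
	return report
}

func main() {
	bundle := Bundle{
		PolicyVersion: "1.0.0",
		Mappings: []ResourceMapping{
			{AttrValueFQN: "https://nato.intl/attr/classification/value/topsecret"},
			{AttrValueFQN: "https://nato.intl/attr/classification/value/unknown"},
		},
	}
	known := map[string]bool{"https://nato.intl/attr/classification/value/topsecret": true}
	report := checkImport(bundle, known, "1.0.0")
	fmt.Println(report.Clean(), report.MissingAttrValues)
}
```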

Admin answers

The above implementation would give admins these answers:

  1. which RMs were imported from tag_vendor.com
  2. which RMs were not imported from tag_vendor.com
  3. can I bulk delete the RMs for a tagging vendor I no longer want to utilize
  4. can I build and export my own vendor-specific RMs for a namespace

jakedoublev commented 1 month ago

Is it necessary to be able to import a namespaced group of Resource Mappings for policy that is not also imported? In other words, I'd ingest https://vendor_a/resm/yournamespace.io that contains RMs pointing to attribute values under definitions under yournamespace.io, but I'd point them to my namespace on import instead of yours? For example: https://vendor_a/resm/yournamespace.io includes an RM for https://yournamespace.io/attr/hierarchical/value/highest but I want to point it at https://mine.co/attr/hierarchical/value/highest instead.

jakedoublev commented 1 month ago

Update from @jrschumacher regarding Architecture discussion alignment: "We should sanity check/validate that referenced policy bundles are installed, but not dive in deep."

So if I or Titus/Varonis/Boldon James/whoever publish an RM bundle for an external, 3rd party policy bundle, the exported RM bundle metadata will drive a compatibility check on import, verifying that:

  1. the attribute value FQNs exist in my platform as importer
  2. the policy version (known as the service module package version) is compatible

damorris25 commented 1 month ago

@jakedoublev from what I understand of the requirements / need I believe the 'grouping' concept would suffice. I think @ttschampel should comment before we complete the review / comment cycle.

One tactical suggestion is to adopt an FQN that is more of a URN than a URL. We've been discussing (in another ADR / issue) the challenges that using a fully qualified URL that includes a hostname and protocol creates for the infrastructure.

ttschampel commented 1 month ago

The grouping concept is sufficient from my point of view. I'd like an FQN (URN or otherwise) as a reference to a group of resource mappings I can pull in and use.

ryanulit commented 1 month ago

After further discussion with @jakedoublev, we've settled on the approach below for implementation. @jakedoublev or @jrschumacher please feel free to comment on anything I've missed or misinterpreted. I'll be creating an epic and splitting up the work into tickets on that epic.

Final FQN Format

https://<namespace>/resm/<unique_name>

This keeps the format consistent with attribute FQNs in an attempt to avoid any unnecessary confusion.

One tactical suggestion is to adopt an FQN that is more of a URN than a URL. We've been discussing (in another ADR / issue) the challenges that using a fully qualified URL that includes a hostname and protocol creates for the infrastructure.

Regarding @damorris25's comment above, is this still a requirement for this work?

DB Changes

API/Proto/CLI/etc. changes

jrschumacher commented 1 month ago

@ryanulit great feedback.

I think URN makes sense. There shouldn't need to be any changes to namespace since we don't specify a schema for the name.

We can update the FQN index to support URNs as well as URLs or make a new index that is more robust and keeps index of the various segments rather than the full string. This can be done later as part of optimization.

Regarding the table name, keep it consistent with the existing tables. If a consistent convention isn't yet defined, please define it so we can maintain it in the future.

ryanulit commented 1 month ago

Decided to go with resource_mapping_groups and keep the plural syntax just at the end.

Regarding the queries for pulling the RM groups using FQNs, I'm assuming we would receive the FQNs as a single string and then break them down into the proper column values needed to search the table with? Meaning we will never store the full FQN in our table(s), for the immediate future at least, but provide it as a convenience to the consumer in the query?
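A sketch of what that breakdown might look like, assuming the https://<namespace>/resm/<unique_name> format and a table that stores the namespace and name in separate columns. The `ParsedFQN` type and the `parseResourceMappingFQN` function are hypothetical, and the column mapping is an assumption rather than the agreed schema.

```go
package main

import (
	"fmt"
	"strings"
)

// ParsedFQN holds the segments that would be matched against separate
// columns instead of a stored full FQN string.
type ParsedFQN struct {
	Namespace string
	Name      string
}

// parseResourceMappingFQN splits https://<namespace>/resm/<unique_name>
// into its namespace and name segments.
func parseResourceMappingFQN(fqn string) (ParsedFQN, error) {
	const scheme = "https://"
	if !strings.HasPrefix(fqn, scheme) {
		return ParsedFQN{}, fmt.Errorf("fqn %q must start with %s", fqn, scheme)
	}
	parts := strings.SplitN(strings.TrimPrefix(fqn, scheme), "/resm/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return ParsedFQN{}, fmt.Errorf("fqn %q does not match https://<namespace>/resm/<unique_name>", fqn)
	}
	// Assumption: neither segment may contain further path separators.
	if strings.Contains(parts[0], "/") || strings.Contains(parts[1], "/") {
		return ParsedFQN{}, fmt.Errorf("fqn %q has unexpected extra path segments", fqn)
	}
	return ParsedFQN{Namespace: parts[0], Name: parts[1]}, nil
}

func main() {
	parsed, err := parseResourceMappingFQN("https://demo.com/resm/my_group")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed.Namespace, parsed.Name) // prints demo.com my_group
}
```

The parsed segments could then be passed as query parameters, so the full FQN string never has to be stored.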

ryanulit commented 3 weeks ago

For tracking purposes...after discussions with @jakedoublev, we've decided these unsafe RPCs are no longer needed at the moment. Update operations will be considered safe, and we will revisit later if deemed necessary.

Unsafe RPCs

New unsafe RPCs will be needed:

  • move resource mapping to new namespace
  • change resource mapping name

damorris25 commented 1 week ago

@ryanulit @jrschumacher is this ADR still open or is the ADR now 'closed' since we've actually started implementation?

jrschumacher commented 1 week ago

ADR is closed. Thanks.