opentdf / platform

Persistent data centric security that extends owner control wherever data travels
BSD 3-Clause Clear License

ADR: NanoTDF Attribute Storage Optimization in 255-bytes #917

Closed jrschumacher closed 4 months ago

jrschumacher commented 5 months ago

ADR: NanoTDF Attribute Storage Optimization in 255-bytes

Context and Problem Statement

We need to store attributes in the nanotdf policy header, but it has a maximum length of 255 bytes. The current attribute value in ztdf is in the form of a Fully Qualified Name (FQN) as JSON. Even removing the JSON overhead, the FQN is still too verbose to store multiple attributes within the 255-byte limit.

Example:

https://namespace.com/attr/classification/val/topsecret

The goal is to define a syntax that will compress the data to allow for efficient storage of multiple attributes within the 255-byte limit.

Considered Options

  1. Schema-Based Syntax with Full URLs
  2. Index-Based Syntax
  3. Protobuf Compression

Decision Outcome

We have decided to use the Schema-Based Syntax with Full URLs. This decision was made based on the need for a federatable and customer-friendly approach that retains full attribute names and avoids using indexes.

We also considered Protobuf Compression for further optimization. However, it makes debugging harder, since the data cannot be read without a protobuf decoder.

Options

Option 1: Schema-Based Syntax with Full URLs

Format:

{schema}|{base_url}|{attribute}:{value,{...value}};{attribute}:{value,{...value}};...\n{schema}|{base_url}|...

Components:

Example:

1|namespace.com|classification:topsecret;relto:usa,gba,cda
1|ns.namespace.com|group:a
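
To make the syntax concrete, here is a minimal parsing sketch based only on the example above: one attribute set per line, `|` between schema, base URL, and attributes, `;` between attributes, and `,` between values. The `AttributeSet` type and `ParsePolicy` function are hypothetical names for illustration, and the meaning of the schema field is left uninterpreted.

```go
package main

import (
	"fmt"
	"strings"
)

// AttributeSet is a hypothetical in-memory form of one line of the compact syntax.
type AttributeSet struct {
	Schema  string              // e.g. "1"; kept as text, not interpreted here
	BaseURL string              // e.g. "namespace.com"
	Attrs   map[string][]string // attribute name -> values
}

// ParsePolicy splits the compact text into one AttributeSet per non-empty line.
func ParsePolicy(s string) ([]AttributeSet, error) {
	var sets []AttributeSet
	for _, line := range strings.Split(s, "\n") {
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, "|", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed set %q", line)
		}
		set := AttributeSet{Schema: parts[0], BaseURL: parts[1], Attrs: map[string][]string{}}
		for _, attr := range strings.Split(parts[2], ";") {
			name, values, ok := strings.Cut(attr, ":")
			if !ok {
				return nil, fmt.Errorf("malformed attribute %q", attr)
			}
			set.Attrs[name] = strings.Split(values, ",")
		}
		sets = append(sets, set)
	}
	return sets, nil
}

func main() {
	sets, err := ParsePolicy("1|namespace.com|classification:topsecret;relto:usa,gba,cda\n1|ns.namespace.com|group:a")
	if err != nil {
		panic(err)
	}
	for _, set := range sets {
		fmt.Println(set.Schema, set.BaseURL, set.Attrs)
	}
}
```

Because the format is plain text, a malformed policy can be diagnosed by reading it, which is the debugging property the decision above relies on.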

Advantages:

Disadvantages:

Approximate Range of Attributes

Given the 255-byte limit, the number of attributes that can be stored depends on the length of the base URLs and attribute names. For estimation:

Example calculation for a single attribute set:

1|namespace.com|classification:topsecret

This example is about 40 bytes.

For multiple attributes:

1|namespace.com|classification:topsecret;relto:usa,gba,cda

This example is about 60 bytes.

For multiple attributes across multiple namespaces:

1|namespace.com|classification:topsecret;relto:usa,gba,cda
1|namespace2.com|classification:topsecret;relto:usa,gba,cda
1|namespace3.com|classification:topsecret;relto:usa,gba,cda
1|namespace4.com|classification:topsecret;relto:usa,gba,cda

This example is about 240 bytes.

Therefore, approximately 15-20 attribute values of similar length can be stored within the 255-byte limit (the four-namespace example above holds 16 values).
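
The estimates above can be checked mechanically. The sketch below measures the three examples from this section, assuming sets are joined with a single newline byte; `encodedSize` and `maxPolicyBytes` are illustrative names, not part of any existing API.

```go
package main

import (
	"fmt"
	"strings"
)

// maxPolicyBytes is the policy limit this ADR assumes.
const maxPolicyBytes = 255

// encodedSize returns the byte length of the sets joined by newlines.
func encodedSize(sets []string) int {
	return len(strings.Join(sets, "\n"))
}

func main() {
	single := []string{"1|namespace.com|classification:topsecret"}
	multi := []string{"1|namespace.com|classification:topsecret;relto:usa,gba,cda"}
	four := []string{
		"1|namespace.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace2.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace3.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace4.com|classification:topsecret;relto:usa,gba,cda",
	}
	for _, example := range [][]string{single, multi, four} {
		n := encodedSize(example)
		fmt.Printf("%d bytes, fits: %t\n", n, n <= maxPolicyBytes)
	}
}
```

The exact sizes are 40, 58, and 238 bytes, in line with the rounded figures given above.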

Example

See playground https://go.dev/play/p/M9s8QOtTn4Y
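
Independently of the playground, the encode direction might look like the sketch below. It groups FQNs of the form shown in the problem statement into the Option 1 syntax. `Compact` is a hypothetical helper, and writing the schema field as `1` for every set is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Compact groups FQNs of the form https://{host}/attr/{name}/val/{value}
// into the Option 1 syntax, keeping hosts and attributes in first-seen order.
func Compact(fqns []string) (string, error) {
	type set struct {
		names  []string
		values map[string][]string
	}
	var hosts []string
	sets := map[string]*set{}
	for _, fqn := range fqns {
		u, err := url.Parse(fqn)
		if err != nil {
			return "", err
		}
		p := strings.Split(strings.Trim(u.Path, "/"), "/")
		if len(p) != 4 || p[0] != "attr" || p[2] != "val" {
			return "", fmt.Errorf("unexpected FQN %q", fqn)
		}
		s, ok := sets[u.Host]
		if !ok {
			s = &set{values: map[string][]string{}}
			sets[u.Host] = s
			hosts = append(hosts, u.Host)
		}
		if _, seen := s.values[p[1]]; !seen {
			s.names = append(s.names, p[1])
		}
		s.values[p[1]] = append(s.values[p[1]], p[3])
	}
	var lines []string
	for _, h := range hosts {
		var attrs []string
		for _, n := range sets[h].names {
			attrs = append(attrs, n+":"+strings.Join(sets[h].values[n], ","))
		}
		// Assumption: every set is written with schema "1".
		lines = append(lines, "1|"+h+"|"+strings.Join(attrs, ";"))
	}
	return strings.Join(lines, "\n"), nil
}

func main() {
	out, err := Compact([]string{
		"https://namespace.com/attr/classification/val/topsecret",
		"https://namespace.com/attr/relto/val/usa",
		"https://namespace.com/attr/relto/val/gba",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```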

Option 2: Index-Based Syntax

Format:

{schema}|{index}|{attribute_index}:{value_index};{attribute_index}:{value_index};...

Components:

Example:

1|1|1:1;2:2,3,4
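
The index form only has meaning alongside lookup tables that every party shares. The sketch below expands the example back into FQNs. The three tables, the choice of a single global value table, and the `https://` prefix are all assumptions for illustration; none of them is defined in this ADR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical lookup tables; every decoder would need identical copies.
var (
	namespaces = map[int]string{1: "namespace.com"}
	attributes = map[int]string{1: "classification", 2: "relto"}
	values     = map[int]string{1: "topsecret", 2: "usa", 3: "gba", 4: "cda"}
)

// lookup parses a decimal index and resolves it in table.
func lookup(table map[int]string, idx string) (string, error) {
	i, err := strconv.Atoi(idx)
	if err != nil {
		return "", err
	}
	name, ok := table[i]
	if !ok {
		return "", fmt.Errorf("unknown index %d", i)
	}
	return name, nil
}

// Expand turns an index-based policy into FQNs using the tables above.
func Expand(s string) ([]string, error) {
	parts := strings.SplitN(s, "|", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("malformed policy %q", s)
	}
	ns, err := lookup(namespaces, parts[1])
	if err != nil {
		return nil, err
	}
	var fqns []string
	for _, attr := range strings.Split(parts[2], ";") {
		attrIdx, valueIdxs, ok := strings.Cut(attr, ":")
		if !ok {
			return nil, fmt.Errorf("malformed attribute %q", attr)
		}
		name, err := lookup(attributes, attrIdx)
		if err != nil {
			return nil, err
		}
		for _, vi := range strings.Split(valueIdxs, ",") {
			val, err := lookup(values, vi)
			if err != nil {
				return nil, err
			}
			// Assumption: the FQN prefix is always https://.
			fqns = append(fqns, fmt.Sprintf("https://%s/attr/%s/val/%s", ns, name, val))
		}
	}
	return fqns, nil
}

func main() {
	fqns, err := Expand("1|1|1:1;2:2,3,4")
	if err != nil {
		panic(err)
	}
	for _, f := range fqns {
		fmt.Println(f)
	}
}
```

A decoder without the matching tables cannot recover any names, which is why this option federates poorly compared with Option 1.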

Advantages:

Disadvantages:

Option 3: Protobuf Compression

Protobuf can serialize the data into a compact binary format, potentially reducing the size further than ASCII or other text-based formats.

Advantages:

Disadvantages:

Protobuf Example

syntax = "proto3";

enum Schema {
  HTTP = 0;
  HTTPS = 1;
}

message Attribute {
  string name = 1;
  repeated string values = 2;
}

message AttributeSet {
  Schema schema = 1;
  string base_url = 2;
  repeated Attribute attributes = 3;
}
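
To gauge what this schema would cost on the wire, the sketch below estimates the encoded size by hand from the standard proto3 wire format: a one-byte tag per field for field numbers below 16, a varint length prefix for strings and embedded messages, and omission of zero-valued enums. It does not use a protobuf library, all helper names are hypothetical, and it assumes non-empty names and base URLs.

```go
package main

import "fmt"

// varintLen returns how many bytes protobuf needs to varint-encode v.
func varintLen(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

// lenDelimited is the size of a string or embedded-message field whose
// field number is below 16, so its tag occupies a single byte.
func lenDelimited(payload int) int {
	return 1 + varintLen(uint64(payload)) + payload
}

// Attribute mirrors the Attribute message above.
type Attribute struct {
	Name   string
	Values []string
}

// attributeSize is the encoded size of one Attribute message body.
func attributeSize(a Attribute) int {
	n := lenDelimited(len(a.Name))
	for _, v := range a.Values {
		n += lenDelimited(len(v))
	}
	return n
}

// attributeSetSize estimates the encoded size of an AttributeSet message.
func attributeSetSize(schema int, baseURL string, attrs []Attribute) int {
	n := 0
	if schema != 0 { // proto3 omits fields that hold the default value
		n += 1 + varintLen(uint64(schema))
	}
	n += lenDelimited(len(baseURL))
	for _, a := range attrs {
		n += lenDelimited(attributeSize(a))
	}
	return n
}

func main() {
	attrs := []Attribute{
		{Name: "classification", Values: []string{"topsecret"}},
		{Name: "relto", Values: []string{"usa", "gba", "cda"}},
	}
	fmt.Println(attributeSetSize(1, "namespace.com", attrs), "bytes")
}
```

For this sample the estimate is 70 bytes, against 58 bytes for the equivalent Option 1 text, because each string carries its own tag and length prefix. Under these assumptions, any saving from Protobuf would depend on the field contents or on further compression rather than being automatic.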

damorris25 commented 5 months ago

No concerns from my POV

sujankota commented 5 months ago

Attributes are stored in the Policy. Isn't the max size 2^16 - 1 in NanoTDF? https://github.com/virtru/nanotdf/blob/master/spec/index.md

jrschumacher commented 5 months ago

@sujankota according to https://github.com/virtru/nanotdf/blob/master/spec/index.md#342-policy the policy has a Maximum Length (B) of 255. Am I misreading this?

[Screenshot: CleanShot 2024-06-03 at 15 51 12]

sujankota commented 5 months ago

[Image] We use Embedded Policy for nanoTDF.

sujankota commented 5 months ago

The encrypted policy can be up to 64 KB.

jrschumacher commented 4 months ago

This work is not needed (see comments above).