`CreateTDF` to have shared options to yield NanoTDF and ZTDF

strantalis commented 5 months ago

Originally, when we designed CreateTDF, we decided to use the functional options pattern. For a consistent developer experience (DX), we want the NanoTDF SDK to offer a similar experience.

As we work on this, it's important to consider the DX fully. A fair statement would be that developers are not specialists in OpenTDF, TDF, or the platform as a whole. Additionally, they are probably more interested in their business cases than in becoming platform specialists.

To facilitate this user persona, we want to redefine the creation of TDFs. Instead of letting CreateTDF be solely for ZTDF, we'd like to enhance it so it can support creation of ZTDF and NanoTDF. To better understand how we'd approach this, let's consider some stories:

As a developer, I want to use only one function to create TDFs. I don't know what ZTDF and NanoTDF are; I just want a TDF, and I want the intelligence of the SDK to give me the best TDF.
As a developer, I know that I am building for optimization and I want to use NanoTDF, but I want to make sure that I get an error if I try to add an option that is invalid for NanoTDF.
As a developer, I know that I am building for richness of data and I want to use ZTDF, but I want to make sure that I get an error if I try to add an option that is invalid for ZTDF.

With these stories, we can see that the developer wants to rely on the intelligence of the platform to choose the TDF format that best suits their needs based on their inputs, rather than their knowledge of the platform and technology. With ZTDF and NanoTDF we can reliably choose the technology for them while offering them an escape hatch to choose the format if they so desire.

Proposed options

WithFormat(format TDFFormat) - specifies the format
- const TDFFormatZTDF = TDFFormat("ztdf")
- const TDFFormatNanoTDF = TDFFormat("nanotdf")
WithDataAttributes(attributes ...string) - specifies the attributes which should be added
- if split is detected then format should yield ZTDF
- if split is detected and format is NanoTDF, return error
WithDataAttributesValues(attributes ...policy.Value) - specifies the attribute values types
- see WithDataAttributes
WithKASInformation(kasInfoList ...KASInfo) - adds all the kas urls and their corresponding public keys
- if multiple KAS specified then format should yield ZTDF
- if multiple KAS specified and format is NanoTDF, return error
- if KAS publickey is RSA then format should yield ZTDF
- if KAS publickey is RSA and format is NanoTDF, return error
WithKeyPair(keypair ...) - specifies the keypair to use
- if keypair is RSA then format should yield ZTDF
- if keypair is RSA and format is NanoTDF, return error
WithECDSAPolicyBinding() - enables ECDSA policy binding
WithAutoconfigure(enable bool)

NanoTDF only -- if format ZTDF then return error

ZTDF only -- if format NanoTDF then return error

WithAssertions(assertionList ...AssertionConfig)
WithAssertionVerificationKeys(keys AssertionVerificationKeys)
WithMetaData(metadata string) - specifies the metadata which should be added
- format should yield ZTDF
- if format is NanoTDF, return error
WithMimeType(mimeType string) - specifies the mimetype to use
- format should yield ZTDF
- if format is NanoTDF, return error (assuming this is stored in metadata)
WithSegmentSize(size int64) -
- TBD

Acceptance Criteria

define the logic that determines the format which will be chosen
- this is important to ensure our SDKs (in the future) will have the same behavior
implement errors that indicate conflicting state, so developers can check errors.Is() and gracefully handle them
- e.g. var ErrCreateTDFNanoTDFNoMetadata = errors.New("nanotdf cannot be created with metadata, use ztdf")
update CreateTDF options to support nanoTDF
add logic to check options and return the correct streamable tdf format
add logic to check options and return appropriate errors if conflict
ensure we maintain backward compatibility with CreateTDF; otherwise propose a new function for this behavior (e.g. NewTDF)

cassandrabailey293 commented 2 months ago

need to further discuss https://github.com/opentdf/platform/pull/1534#issuecomment-2341364537 . product will work to define acceptance criteria for these tickets. we will pull these out of the sprint and reload next sprint once additional context and a/c is added.

sujankota commented 2 months ago

@jrschumacher NanoTDF and ZTDF are fundamentally different formats, each designed for specific purposes. While they may share some commonality in what they accept, having separate APIs for developers provides greater clarity. This approach is similar to the distinction between TCP and UDP in networking protocols.

dmihalcik-virtru commented 2 months ago

IMO it depends on the use case, and we could also see newer formats being added. Like adding a tagging PDP to provide attributes, or the autoconfiguration of the key splits with those attributes, these features can move the logic away from the application developer and into the ABAC or system approach. This way we could deploy a novel format and transition to it with no changes to the application code, or an administrator could switch between existing formats by updating ABAC rules, both of which seem valuable to me.

That said, the two formats are quite different and put different constraints on both the service (requiring different crypto primitives on both the server and client), and what is possible to contain (ZTDF for large content with random access, nano for small content and speed and low overhead, data set for splitting the difference)

jrschumacher commented 2 months ago

@sujankota This feedback comes from our experience working with a real-world product. I agree they are fundamentally different, but I'm not sure developers will see this.

With the referenced POC product we are offering a visualization of data that is encrypted with either ZTDF or NanoTDF. The upstream system is automated and uses NanoTDF for performance and storage optimization reasons. The data within is JSON.

If the payload of the JSON exceeds the maximum payload for NanoTDF then the system will switch to ZTDF. We've had to build this logic into our application for both encryption and decryption for Java, Go, and Web.

In this case I don't care what the format is I just want to protect the data and want the platform to help me with my task. If I want to have more control then I can use a dedicated interface.

I guess it begs the question: how much commitment to understanding the platform fundamentals do we require of developers in order to use it? Is that for evaluation or production?

Aligning with the TCP and UDP analogy, this is why some systems use TCP fallback for UDP requests: when the reply is too large, the request is made again using TCP.

sujankota commented 2 months ago

@jrschumacher Should we consider providing a wrapper interface for this specific use case, rather than merging both formats into a single interface? Combining the formats forces developers to understand and manage more complexity. In the future, we might introduce datasets for nanotdf, which won't be compatible with ZTDF. That said, I don't have deep insight into how customers are currently using the SDK, so my perspective is somewhat limited. Given that the SDK is quite basic, it should offer a simple API with a clear focus on solving one specific problem.

That said, I'm happy to proceed with whatever direction the architecture team decides.

jrschumacher commented 2 months ago

@sujankota maybe we take a step back and talk about use cases and let that drive the solution.

I really like your idea where the friendly interface describes the problem / solution. (Tradeoff being less control and slightly less performant)

I'd imagine we'd always have the core functions as well: CreateZTDF() and CreateNanoTDF().

damorris25 commented 1 month ago

I would second Sujan's idea for a more 'layered' approach. In my experience, good SDKs have a lowest layer that provides the most flexibility and power, and then over time offer 'higher order' layers that combine various options or multiple calls into a single operation that has less power and flexibility but is much easier to work with.

I'd prefer we create an 'easy button' type API that is higher level and does indeed abstract away some of the underlying complexity. If we do so, I might argue that you also look at removing some options (I don't have a proposed list of what to remove off the top of my head) as well - make it as simple as possible. If the simple option is too simple for a given use-case, then developers can use the lower level option.
