opentdf / platform

Persistent data centric security that extends owner control wherever data travels
BSD 3-Clause Clear License

`CreateTDF` to have shared options to yield NanoTDF and ZTDF #1042

Open strantalis opened 5 months ago

strantalis commented 5 months ago

When we originally designed CreateTDF, we chose the functional options pattern. For a consistent developer experience (DX), we want the NanoTDF SDK to offer a similar experience.
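For readers unfamiliar with the pattern, here is a minimal Go sketch of functional options applied to shared TDF settings. The names tdfConfig, TDFOption, WithDataAttributes, WithMimeType and newConfig are hypothetical and chosen for illustration; they are not the opentdf SDK's actual API. Defaults are set once, then each option mutates the config in order and may reject bad input.

```go
package main

import (
	"fmt"
)

// tdfConfig collects settings shared by every TDF format.
// All names here are hypothetical, not the actual opentdf SDK API.
type tdfConfig struct {
	attributes []string
	mimeType   string
}

// TDFOption is a functional option that mutates tdfConfig.
type TDFOption func(*tdfConfig) error

// WithDataAttributes appends attribute FQNs to the config.
func WithDataAttributes(attrs ...string) TDFOption {
	return func(c *tdfConfig) error {
		c.attributes = append(c.attributes, attrs...)
		return nil
	}
}

// WithMimeType sets the payload MIME type and rejects empty values.
func WithMimeType(mt string) TDFOption {
	return func(c *tdfConfig) error {
		if mt == "" {
			return fmt.Errorf("mime type must not be empty")
		}
		c.mimeType = mt
		return nil
	}
}

// newConfig applies defaults first, then each option in the order given.
func newConfig(opts ...TDFOption) (*tdfConfig, error) {
	c := &tdfConfig{mimeType: "application/octet-stream"}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newConfig(
		WithDataAttributes("https://example.com/attr/classification/value/secret"),
		WithMimeType("application/json"),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.mimeType, len(c.attributes)) // application/json 1
}
```

Because options are plain values, a NanoTDF builder and a ZTDF builder could accept the same option set, which is the shared DX this issue asks for.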

As we work on this, it's important to consider the DX fully. It's fair to say that developers are not specialists in OpenTDF, TDF, or the platform as a whole. They are probably more interested in their business cases than in becoming platform specialists.

To serve this persona, we want to redefine how TDFs are created. Instead of reserving CreateTDF solely for ZTDF, we'd like to extend it so it can create both ZTDF and NanoTDF. To better understand how we'd approach this, let's consider some stories:

These stories show that developers want to rely on the platform's intelligence to choose the TDF format that best suits their needs based on their inputs, rather than on their own knowledge of the platform and technology. With ZTDF and NanoTDF, we can reliably choose the format for them while offering an escape hatch to choose it explicitly if they so desire.
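A minimal sketch of how automatic selection and an explicit escape hatch could coexist, assuming a hypothetical WithFormat option and a size-based default. Format, WithFormat, resolveFormat and nanoMaxPayload are illustrative names, not the opentdf SDK API. The threshold is an assumption derived from NanoTDF's 3-byte payload-length field and ignores IV and authentication-tag overhead, so a real limit would be somewhat lower.

```go
package main

import "fmt"

// Format identifies a TDF container format (hypothetical type).
type Format int

const (
	FormatAuto Format = iota // let the SDK decide
	FormatNanoTDF
	FormatZTDF
)

func (f Format) String() string {
	switch f {
	case FormatNanoTDF:
		return "NanoTDF"
	case FormatZTDF:
		return "ZTDF"
	default:
		return "auto"
	}
}

// nanoMaxPayload is an assumed upper bound from a 3-byte length field;
// it is illustrative, not the spec's exact usable plaintext limit.
const nanoMaxPayload = 1<<24 - 1

type config struct{ format Format }

// Option is a functional option for format selection.
type Option func(*config)

// WithFormat is the escape hatch: an explicit format always wins.
func WithFormat(f Format) Option { return func(c *config) { c.format = f } }

// resolveFormat honors an explicit choice, otherwise infers from size.
// Checking that an explicit NanoTDF choice actually fits is left to the writer.
func resolveFormat(payloadSize int64, opts ...Option) Format {
	c := config{format: FormatAuto}
	for _, o := range opts {
		o(&c)
	}
	if c.format != FormatAuto {
		return c.format
	}
	if payloadSize <= nanoMaxPayload {
		return FormatNanoTDF
	}
	return FormatZTDF
}

func main() {
	fmt.Println(resolveFormat(512))                         // NanoTDF
	fmt.Println(resolveFormat(64 << 20))                    // ZTDF
	fmt.Println(resolveFormat(512, WithFormat(FormatZTDF))) // ZTDF
}
```

Keeping the override as just another option means the simple call stays simple, while developers who know which format they need can still pin it.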

Proposed options

NanoTDF-only options: if the selected format is ZTDF, return an error.

ZTDF-only options: if the selected format is NanoTDF, return an error.

Acceptance Criteria

cassandrabailey293 commented 2 months ago

We need to discuss https://github.com/opentdf/platform/pull/1534#issuecomment-2341364537 further. Product will work to define acceptance criteria for these tickets. We will pull them out of this sprint and reload them next sprint once additional context and acceptance criteria (A/C) are added.

sujankota commented 2 months ago

@jrschumacher NanoTDF and ZTDF are fundamentally different formats, each designed for specific purposes. While they may share some commonality in what they accept, having separate APIs for developers provides greater clarity. This approach is similar to the distinction between TCP and UDP in networking protocols.

dmihalcik-virtru commented 2 months ago

IMO it depends on the use case, and we could also see newer formats being added. Features like a tagging PDP that provides attributes, or autoconfiguration of key splits from those attributes, move the logic away from the application developer and into ABAC or the system. That way we could deploy a novel format and transition to it with no changes to application code, or an administrator could switch between existing formats by updating ABAC rules. Both seem valuable to me.

That said, the two formats are quite different and impose different constraints, both on the service (they require different crypto primitives on the server and client) and on what they can contain: ZTDF for large content with random access, NanoTDF for small content with speed and low overhead, and datasets for splitting the difference.

jrschumacher commented 2 months ago

@sujankota This feedback comes from our experience working on a real-world product. I agree they are fundamentally different, but I'm not sure developers will see it that way.

The referenced POC product offers a visualization of data encrypted with either ZTDF or NanoTDF. The upstream system is automated and uses NanoTDF for performance and storage optimization. The data itself is JSON.

If the JSON payload exceeds NanoTDF's maximum payload size, the system switches to ZTDF. We've had to build this logic into our application, for both encryption and decryption, in Java, Go, and Web.

In this case I don't care what the format is; I just want to protect the data and have the platform help me with that task. If I want more control, I can use a dedicated interface.

That raises the question: how much commitment to understanding the platform fundamentals do we require of developers before they can use it? And is that for evaluation or for production?


To extend the TCP/UDP analogy, this is why some systems fall back to TCP for UDP requests: when the reply is too large, the request is made again over TCP.

sujankota commented 2 months ago

@jrschumacher Should we consider providing a wrapper interface for this specific use case, rather than merging both formats into a single interface? Combining the formats forces developers to understand and manage more complexity. In the future, we might introduce datasets for NanoTDF, which won't be compatible with ZTDF. That said, I don't have deep insight into how customers currently use the SDK, so my perspective is somewhat limited. Given that the SDK is quite basic, it should offer a simple API focused on solving one specific problem.

That said, I'm happy to proceed with whatever direction the architecture team decides.

jrschumacher commented 2 months ago

@sujankota Maybe we should take a step back, talk about use cases, and let those drive the solution.

I really like your idea of a friendly interface that describes the problem and solution (the tradeoff being less control and slightly lower performance).

I'd imagine we'd always keep the core functions as well: CreateZTDF() and CreateNanoTDF().
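Here is a sketch of the layered shape discussed in this thread, written in Go. CreateNanoTDF and CreateZTDF (names taken from the comment above) form the core layer with full, format-specific options. A hypothetical Protect function is the friendly layer: it accepts only attributes and chooses the format itself. The option structs, the 4096-byte cutoff, and the placeholder bodies are illustrative assumptions. Real core functions would call the SDK and return encrypted bytes rather than a description string.

```go
package main

import "fmt"

// Core-layer options: full control, format-specific (hypothetical fields).
type NanoOptions struct {
	Attributes   []string
	ECDSABinding bool
}

type ZTDFOptions struct {
	Attributes  []string
	SegmentSize int // 0 means "use the core layer's default" (assumption)
}

// Core layer: one function per format. Bodies are placeholders that
// describe what they would do instead of encrypting.
func CreateNanoTDF(data []byte, o NanoOptions) (string, error) {
	return fmt.Sprintf("nano(%d bytes, %d attrs)", len(data), len(o.Attributes)), nil
}

func CreateZTDF(data []byte, o ZTDFOptions) (string, error) {
	return fmt.Sprintf("ztdf(%d bytes, %d attrs)", len(data), len(o.Attributes)), nil
}

// smallPayload is an illustrative cutoff, not a spec value.
const smallPayload = 4096

// Protect is the "easy button": it exposes only attributes, picks the
// format and defaults itself, and delegates to the core layer.
func Protect(data []byte, attributes ...string) (string, error) {
	if len(data) <= smallPayload {
		return CreateNanoTDF(data, NanoOptions{Attributes: attributes})
	}
	return CreateZTDF(data, ZTDFOptions{Attributes: attributes})
}

func main() {
	out, err := Protect([]byte(`{"k":"v"}`), "https://example.com/attr/a/value/b")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // nano(9 bytes, 1 attrs)
}
```

The friendly layer deliberately hides options such as ECDSABinding and SegmentSize. Developers who need them call the core functions directly, which matches the "drop down a layer" escape route suggested below.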

damorris25 commented 1 month ago

I second Sujan's idea of a more 'layered' approach. In my experience, good SDKs have a lowest layer that provides the most flexibility and power, and over time they add 'higher-order' layers that combine various options or multiple calls into a single operation. Those layers have less power and flexibility but are much easier to work with.

I'd prefer we create an 'easy button' API that is higher level and does abstract away some of the underlying complexity. If we do, I'd argue we should also look at removing some options (I don't have a list of what to remove off the top of my head) to make it as simple as possible. If the simple option is too simple for a given use case, developers can use the lower-level option.