json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Data masking proposition #1182

Closed Kingnaoufal closed 2 years ago

Kingnaoufal commented 2 years ago

Context:

In some sensitive contexts like finance, we need functionality to mask some fields in a JSON schema before exposing them to the consumer.

This feature does not fit JSON Schema as it stands: omitting some fields, or replacing their values with XXXX, breaks the contract, because required fields go missing and not every value is of type string.

Alternatively, we could wrap every field in an object that allows it to be masked without breaking the contract, but that makes the schema unreadable and pushes the responsibility to the wrong side. Let me explain why.

Imagine we have a financial product:

Proposition:

A schema and the meta information that can morph it are disjoint concerns.

My proposition is therefore to create a second schema alongside the one that defines the object we want to represent, with a reference to the latter.

When validating, we would pass:

Upon validation, the tool takes the object data, which does not contain the masked fields (they are missing), together with both schemas (the product schema and the data masking schema), and validates accordingly, so that fields which are required but missing, and which are declared as masked, do not break validation.

The consumer side can then represent each missing field in a way that suits its type.
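A minimal Python sketch of the validation rule described above. Everything in it is an assumption made for illustration: the product fields, the `$masks` and `maskedFields` keywords, and the `validate` helper are hypothetical and not part of JSON Schema or of any existing validator. The helper checks only `required` and `type`.

```python
# Hypothetical product schema (field names are illustrative only).
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "balance": {"type": "number"},
    },
    "required": ["id", "name", "balance"],
}

# Hypothetical masking schema. "$masks" and "maskedFields" are made-up
# keywords; they only illustrate "a second schema that references the first".
MASKING_SCHEMA = {
    "$masks": "product.schema.json",
    "maskedFields": ["balance"],
}

# Only the JSON types this sketch needs.
JSON_TYPES = {"string": str, "number": (int, float)}


def validate(instance, schema, masking=None):
    """Return a list of error messages; an empty list means valid."""
    masked = set(masking["maskedFields"]) if masking else set()
    errors = []

    # A required field may be absent only if it is declared as masked.
    for field in schema.get("required", []):
        if field not in instance and field not in masked:
            errors.append(f"missing required field: {field}")

    # Fields that are present must still match their declared type.
    for field, value in instance.items():
        expected = schema.get("properties", {}).get(field, {}).get("type")
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, JSON_TYPES[expected]):
            errors.append(f"wrong type for {field}: expected {expected}")

    return errors


masked_product = {"id": "p-1", "name": "Savings"}  # "balance" was omitted
print(validate(masked_product, PRODUCT_SCHEMA))                  # one error
print(validate(masked_product, PRODUCT_SCHEMA, MASKING_SCHEMA))  # no errors
```

Keeping the masked field list outside the product schema is the point of the sketch: the product schema stays unchanged, and only the validation call decides whether masking applies.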

Kingnaoufal commented 2 years ago

Hi @Relequestual ,

Is this the right place to request this feature, or should I move it?

Thanks, Naoufal

gregsdennis commented 2 years ago

@Kingnaoufal do you think you could show an example of what you intend? That would help a lot.

Relequestual commented 2 years ago

@Kingnaoufal I think I mostly follow what you're asking, however I believe it stands very much outside the scope of JSON Schema.

JSON Schema is used to validate data. It doesn't really care about the context of your data. "Is property X a string?"

Functionally the product shouldn't have any notions about masking. A.K.A: the product should have a pure and simple JSON schema that defines its own functional constraints.

I both agree and disagree here. As the developer of the product, I need to know the exact structure and format of the data I'm going to get from an API call. If that's a "number or a string which is XXX", then I need to know that, and it needs to be part of the contract.
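As a rough illustration of a contract that states the masked form explicitly, here is a Python sketch. The `anyOf`, `type`, and `const` keywords are standard JSON Schema, but the `BALANCE_SCHEMA` name, the "XXXX" placeholder value, and the tiny `matches` helper are assumptions made for this example; the helper understands only those three keywords and is not a real validator.

```python
# Hypothetical contract for one field: a number, or the literal mask "XXXX".
BALANCE_SCHEMA = {"anyOf": [{"type": "number"}, {"const": "XXXX"}]}


def matches(value, schema):
    """Check a value against the three keywords used in this sketch."""
    if "anyOf" in schema:
        return any(matches(value, sub) for sub in schema["anyOf"])
    if "const" in schema:
        return value == schema["const"]
    if schema.get("type") == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError("keyword not supported by this sketch")


print(matches(1250.75, BALANCE_SCHEMA))   # a real value
print(matches("XXXX", BALANCE_SCHEMA))    # the masked form
print(matches("hidden", BALANCE_SCHEMA))  # any other string is rejected
```

With a contract like this, the API consumer can tell from the schema alone that a masked value is possible and what it looks like.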

In some sensitive contexts like finance, we need functionality to mask some fields in a JSON schema before exposing them to the consumer.

Who exactly is "the consumer" here? A customer of the product? If so, why would they care about the JSON Schema? An API user of your system, building a financial service for one of their own customers? In that case, as the developer, I need to know the specifics, as per above.

I think @gregsdennis is right here, we need some examples.

Whatever the outcome, I can't see this being part of the specification. It's sounding a little like it's to do with the display of data or UI generation, which is out of scope, but ideal for a custom vocabulary.

awwright commented 2 years ago

I'm wondering if you can tell me a little bit more about how your API and your data model work. APIs that deal with partially redacted data are not well studied.

If I'm consuming your API, how would I know that a value is a "mask" and not the real data? How do I know if I can change a value, even if I cannot read it? In which cases would I be able to read the true value that's usually masked?

If you can't show a value, why not just omit it entirely?

handrews commented 2 years ago

It's been 7 months without a response to any of the questions, so I'm closing this. Feel free to re-open or re-file if you can provide the requested information.