fix: `stripUnknown` should honor local explicit `.unknown(false)`

afharo commented 6 months ago

When explicitly specifying .unknown(false), running validations with stripUnknown: true should respect the local override.

Marsup commented 5 months ago

Thanks, sorry for the delay.

papandreou commented 5 months ago

This change seems to break a construct that I'm using in a lot of places. When I want to allow an object to contain additional properties, but have Joi strip them out, I have:

const Joi = require('joi');
const schema = Joi.object({ foo: Joi.string()} )
  .unknown(false)
  .options({ stripUnknown:true });

console.log(schema.validate({foo: 'abc', bar: 'def'}));

With this change the validation fails:

{
  value: { foo: 'abc', bar: 'def' },
  error: [Error [ValidationError]: "bar" is not allowed] {
    _original: { foo: 'abc', bar: 'def' },
    details: [ [Object] ]
  }
}

... whereas before it succeeded and stripped out bar from the value:

{ value: { foo: 'abc' } }

I guess the reason why I'm using that construct is that I've liberally added .unknown(false).options({ stripUnknown: true }) to the top-level Joi.object schemas, because I've run into the problem that this is fixing 😅

I'll adapt my code, but this change probably shouldn't have gone out in a patch.

Marsup commented 5 months ago

I feared this would happen, but your schema is a bit self-contradictory, explicitly denying unknowns with the expectation that it would be stripped is in my opinion a bug, as local overrides should always have priority over options. I need to think about it a bit more before reverting (or not), but it really looks like you just got lucky that this bug was in your favor.

papandreou commented 5 months ago

Yeah, I agree. I think it ended up like that because (a few years back) I found that it was the only way to get Joi.object to work the way I needed it to :sweat_smile:

I've adopted the new version in all our projects now, so all good from me.

Alexander-Muylaert commented 2 months ago

Hi there

I think the stripUnknown is broken since this fix. I have gone through the commit and I think the tests are wrong.

It would be good to add a root level property "a2" on the object.
StripUnknown is set to true, so I would expect to strip unknown properties from obj.
b2 isn't existing in the schema, so it should be stripped
same for c2, this should also be stripped

I have added a few additional tests (see zip attached), they all fail now, but this is how I interprete the manual and parameters. It is also closer to behavior <= 17.13.1

joibug.test.zip

kind regards

Alexander

afharo commented 2 months ago

Just sharing my 2 cents, but I'll leave it to the owners to keep me honest.

Agreed! The tests are missing an extra root property to prove that it's removed due to the global stripUnknown: true option.

However, I tend to disagree on 3 and 4. Those objects explicitly allow extra properties to go through. The global option shouldn't affect those explicitly set flags.

A use case I can think of: A logger requires message and a meta object containing any additional context to the logs. Inside the meta, meta.trace_id is required to be an alphanum of 16 chars.

You want your schema to define message as string, and meta as an object with the property trace_id and an explicit local .unknown() to allow extra fields to go through the validation.

I hope it makes sense.

Marsup commented 2 months ago

I agree with this, as I previously mentioned, local overrides should always have priority over options, and this is what happens here, but we can indeed add a root property to demonstrate that fully.

Alexander-Muylaert commented 2 months ago

Hi all

I disagree, I do not understand the use of stripUnknown if it never strips unknown properties. Neither with unknown(true) or unknown(false). Am I missing something obvious here?

kind regards

Alexander

Marsup commented 2 months ago

It does when you don't give specific instructions for that schema, basically the default state. There is currently no way to go back to the default state but I think a unknown(null) is not super hard to implement if that's your need.

jandppw commented 1 month ago

Hi, all.

I believe there is a misunderstanding going on here. And I am probably most confused of all. Would somebody try to set me right?

Validation vs transformation

@afharo says:

However, I tend to disagree on 3 and 4. Those objects explicitly allow extra properties to go through. The global option shouldn't affect those explicitly set flags.

A use case I can think of: A logger requires message and a meta object containing any additional context to the logs. Inside the meta, meta.trace_id is required to be an alphanum of 16 chars.

You want your schema to define message as string, and meta as an object with the property trace_id and an explicit local .unknown() to allow extra fields to go through the validation.

I think we all agree that this is a good use case.

@Marsup ’s point that overrides should always have priority over options that have the same semantics is well taken too.

However, I believe .unknown() and stripUnknown have separate semantics, so that this does not apply.

The point is that

const schema2 = schema.unknown(false).options({ stripUnknown: true })

would be weird,

but

const schema3 = schema.unknown(true).options({ stripUnknown: true })

makes sense.

Transformation

When doing the following, the schema is not changed. @Alexander-Muylaert merely asks the transformation Joi does to strip the unknown properties.

 const { error, value } = schema.validate(obj, { stripUnknown: true });

A use case for this is as follows: suppose we are handling incoming JSON bodies in a service. Joi is used for validation and to sanitize the input (e.g., for security reasons). Joi's transformation functionality is used for the latter. schemaV1 has .unknown(true) set, among other options, to allow for future schema evolution. A future version might use schemaV2, which could include an additional property. A client might already provide that extra property. We can deploy the new client version before we start using the updated service version. While unknown properties are allowed during validation, we want to strip them during transformation and sanitization, recursively.

There are ample other examples where we might want to do this. E.g., suppose we want to write the JSON body in a DynamoDB table we want to keep clean (no unknown properties), but we do not want to flag unknown properties as errors.

Validation before transformation, or transformation before validation?

From a security standpoint, we should validate first, and possibly transform after that.

That is not what Joi does in all cases. It wouldn't make sense for Joi.string().trim().options({ convert: true }) to not trim first. The same applies to any other use of convert.

when true, attempts to cast values to the required types (e.g. a string to a number). Defaults to true.

any.cast(to) seems to does its job after validation?
any.default([value]) before?
any.failover(value) obviously after.
any.strip([enabled]) makes it explicit (after).
array.single([enabled]) before
…

I would expect unknown properties to be stripped after validation, for example, using object.pattern(pattern, schema, [options]). I believe that, in any case, for this (admittedly unusual) combination, that is the only meaningful option.

The same applies to .unknown(true). First validating, and letting pass unknown properties, and then stripping them, makes sense. A request to strip unknown properties after a validation failed because of .unknown(false) makes no sense. Stripping unknown properties first makes it irrelevant to validate there existence or not. After stripping, .unknown(false) will be true, and you can say .unknown(true), but it doesn’t matter.

Documentation

.unknown([allow]) does not mention that the value is supposed to be a 3-state. It says

if false, unknown keys are not allowed, otherwise unknown keys are ignored.

Note that object.keys([schema]) says

If schema is {} no keys allowed. If schema is null or undefined, any key allowed. If schema is an object with keys, the keys are added to any previously defined keys (but narrows the selection if all keys previously allowed). Defaults to 'undefined' which allows any child key.

So whether unknown properties are allowed in schema to start with, depends on how it was defined, but assuming it was defined with an object, unknown keys are not allowed if .unknown was not invoked.

I read this as, for the validation part:

schema                // unknown keys are not allowed
schema.unknown(false) // unknown keys are not allowed
schema.unknown(true)  // unknown keys are ignored
schema.unknown()      // unknown keys are ignored

(This is a bit weird to start with, because schema.unknown() is, in JavaScript, equivalent to schema.unknown(undefined), which also is a falsy parameter value, but that is water under the bridge. I understand @Alexander-Muylaert ’s confusion).

What is completely opaque to me now is what the effect of the following is supposed to be:

schema.options({ stripUnknown: true })                // meaningless
schema.unknown(false).options({ stripUnknown: true }) // meaningless
schema.unknown(true).options({ stripUnknown: true })  // unknown keys are ignored, but are not stripped?
schema.unknown().options({ stripUnknown: true })      // unknown keys are ignored, but are not stripped?

Is this correct? Probably not, because then .options({ stripUnknown: true }) has no effect in any case.

Some comments above seem to mean that instead unknown keys are stripped in 1 case only, before validation?

schema.options({ stripUnknown: true }) // strip first, then unknown keys are not allowed, because that is the default without `.unknown()`

In that light, I understand @Alexander-Muylaert ’s confusion.

What I would expect, is indeed stripping after validation:

schema.options({ stripUnknown: true })                // meaningless
schema.unknown(false).options({ stripUnknown: true }) // meaningless
schema.unknown(true).options({ stripUnknown: true })  // unknown keys are ignored and stripped
schema.unknown().options({ stripUnknown: true })      // unknown keys are ignored and stripped

Alternative for @afharo ’s use case

@afharo, in your use case, why would we ever activate stripUnknown? Isn’t the better solution here to explicitly not do that, but just set meta.unknown(true)?

afharo commented 1 month ago

IMO, there's a bit of confusion between allowUnknown (and its local short form .unknown()) and stripUnknown. The way I understand them based on how it works:

allowUnknown: allows the object to contain extra fields and returns them after sanitation/transformation.
stripUnknown: allows the object to contain extra fields, but removes them during sanitation.

The point is that

const schema2 = schema.unknown(false).options({ stripUnknown: true })

would be weird,

but

const schema3 = schema.unknown(true).options({ stripUnknown: true })

makes sense.

Based on my understanding of both settings as explained above, IMO, both of these are weird because they are contradictory (it doesn't matter if they are in the form of schema.unknown(true|false).options({ stripUnknown: true|false }) or schema.options({ allowUnknown: true|false, stripUnknown: true|false }), since they are local options vs. validation options.

In that sense,

Code	Description
`schema.validate(input, { allowUnknown: true, stripUnknown: true })`	It's contradictory and should never be used. Both allow extra fields to be provided, but one wants to keep them in the output while the other indicates removing them 🤷
`schema.validate(input, { allowUnknown: false, stripUnknown: true })`	Again, contradictory, and its usage is not encouraged: at the validation step, one claims that extra properties are not allowed while the other is fine with them.
`schema.validate(input, { allowUnknown: true, stripUnknown: false })`	Same as above.
`schema.unknown(true).validate(input, { allowUnknown: true })`	Setting the same option as local and global. Local override always wins. In this case: unknown properties are allowed and available in the output.
`schema.unknown(true).validate(input, { allowUnknown: false })`	Same as above.
`schema.unknown(false).validate(input, { allowUnknown: true })`	Setting the same option as local and global. Local override always wins. In this case: unknown properties will fail validation.
`schema.unknown(false).validate(input, { allowUnknown: false })`	Same as above.
`schema.unknown(true).validate(input, { stripUnknown: true })`	Setting different options as local and global. Local option wins, which is: unknown options are allowed and should be kept in the output.
`schema.unknown(true).validate(input, { stripUnknown: false })`	Same as above.
`schema.unknown(false).validate(input, { stripUnknown: true })`	Setting different options as local and global. Local option wins, which is: validation should fail if unknown fields are found.
`schema.unknown(false).validate(input, { stripUnknown: false })`	Same as above.
`schema.options({ stripUnknown: true }).validate(input, { stripUnknown: true })`	Setting the same option as local and global. Local override always wins. In this case: unknown properties are allowed and should be removed from the output.
`schema.options({ stripUnknown: true }).validate(input, { stripUnknown: false })`	Same as above.
`schema.options({ stripUnknown: false }).validate(input, {stripUnknown: true })`	Setting the same option as local and global. Local override always wins. In this case: unknown properties will fail validation.
`schema.options({ stripUnknown: false }).validate(input, {stripUnknown: false })`	Same as above.
`schema.options({ stripUnknown: true }).validate(input, { allowUnknown: true })`	Setting different options as local and global. Local option wins, which is: unknown properties are allowed and should be removed from the output.
`schema.options({ stripUnknown: true }).validate(input, { allowUnknown: false })`	Same as above.
`schema.options({ stripUnknown: false }).validate(input, { allowUnknown: true })`	Setting different options as local and global. Local option wins, which is: validation should fail if unknown fields are found.
`schema.options({ stripUnknown: false }).validate(input, { allowUnknown: false })`	Same as above.

Alternative for @afharo ’s use case

@afharo, in your use case, why would we ever activate stripUnknown? Isn’t the better solution here to explicitly not do that, but just set meta.unknown(true)?

Yes... sorry that I didn't provide an actual code. The way to implement the use case is:

const logEntry = {
  message: "Error: Something went terribly wrong",
  meta: {
    trace_id: "12345",
    extra_information: "https://www.youtube.com/watch?v=t3otBjVZzT0",
  },
  more_details: "...",
};

const loggerSchema = schema.object({
  message: schema.string(),
  meta: schema.object({
    trace_id: schema.string(), 
  }).unknown(true),
});

const validLogEntry = loggerSchema.validate(logEntry, { stripUnknown: true });
// validLogEntry = {
//   message: "Error: Something went terribly wrong",
//   meta: {
//     trace_id: "12345",
//     extra_information: "https://www.youtube.com/watch?v=t3otBjVZzT0",
//   },
// }

You could also implement it as loggerSchema.strip(true) or loggerSchema.options({ stripUnknown: true }) instead of applying the validate option. The difference is that the validate options apply recursively, while the local options don't, so if your use case includes an error property with selected properties only (to avoid leaking sensitive information), you'd have to implement it as:

const logEntry = {
  message: "Error: Something went terribly wrong",
  meta: {
    trace_id: "12345",
    extra_information: "https://www.youtube.com/watch?v=t3otBjVZzT0",
  },
  more_details: "...",
  error: {
    message: "Something went terribly wrong",
    code: "ETERRIBLYWRONG",
    request: {
      authentication: "Basic .....",
    },
  },
};

const loggerSchema = schema.object({
  message: schema.string(),
  meta: schema.object({
    trace_id: schema.string(), 
  }).unknown(true),
  error: schema.object({
    message: schema.string(),
    code: schema.string(),
  }),
});

const validLogEntry = loggerSchema.validate(logEntry, { stripUnknown: true });

// OR

const loggerSchema = schema.object({
  message: schema.string(),
  meta: schema.object({
    trace_id: schema.string(), 
  }).unknown(true),
  error: schema.object({
    message: schema.string(),
    code: schema.string(),
  }).strip(true),
}).strip(true);

const validLogEntry = loggerSchema.validate(logEntry);

// validLogEntry = {
//   message: "Error: Something went terribly wrong",
//   meta: {
//     trace_id: "12345",
//     extra_information: "https://www.youtube.com/watch?v=t3otBjVZzT0",
//   },
//   error: {
//     message: "Something went terribly wrong",
//     code: "ETERRIBLYWRONG",
//   },
// }

Again... it depends on your use case and code preferences/requirements to use it as a local option or global validate option. The global option avoids code repetition while the local ones are more explicit. In either case, you don't want stripUnknown to remove the properties from the fields that explicitly allow them in their local options.

hapijs / joi