pb33f / libopenapi-validator

OpenAPI validation extension for libopenapi, validate http requests and responses as well as schemas
https://pb33f.io/libopenapi/validation/
Other
53 stars 19 forks source link

Problem with passing string type parameters using numbers in path #51

Open lance0805 opened 8 months ago

lance0805 commented 8 months ago

For example, in this schema, the username is defined as a string type. If called in this way, it will lead to the inability to find the route:

https://github.com/pb33f/libopenapi-validator/blob/657229e4dc4f54ea6d298dec56fec8b500baa3e7/test_specs/petstorev3.json#L818-L836

request := &http.Request{
    Method: request.Method,
    URL:    'http://localhost:8080/user/123456',
}
_, _, pathKey := paths.FindPath(request, xxxSchema)
// pathKey is nil

is it possible to remove the string validation here since numbers can be safely used as strings?

https://github.com/pb33f/libopenapi-validator/blob/657229e4dc4f54ea6d298dec56fec8b500baa3e7/paths/paths.go#L284-L288

daveshanley commented 8 months ago

The schema states string the value passed is clearly an integer. I don't think the validator is wrong here, it's the design of the API that is incorrect. The schema needs to be changed.

lance0805 commented 8 months ago

I agree with your point, especially from a strict interpretation. The challenge I'm facing is providing metrics for a gateway, which includes logging request paths. To prevent metrics from becoming overwhelming due to numerous path variables, we convert actual paths to OpenAPI routes. This means even incorrect request parameters, like in the example /user/123456, are recorded in a standardized route format /user/{username}. The purpose is to ensure even erroneous requests are accurately logged, represented as request_xxx{path="/user/{username}", statusCode="4xx"} in our metrics.

jxsl13 commented 8 months ago

validating a string is pretty pointless when, from a library perspective, you do not know what it is supposed to contain, as a string can more or less contain anything. For example, if you want to be strict, you'd have to validate against everything that someone can imagine not to be a string. A boolean, a timestamp, someone inserting a json into that string, etc. Usages one cannot validate because the library does not know. This check gives the impression of strictness but what it does is just a basic validation against one if many types which is imo pointless because a string can contain anything that is serializable as a utf8 string. The interesting part is the other way around, an integer cannot be a string, as it has a well defined format.

daveshanley commented 8 months ago

I agree with your point, especially from a strict interpretation. The challenge I'm facing is providing metrics for a gateway, which includes logging request paths. To prevent metrics from becoming overwhelming due to numerous path variables, we convert actual paths to OpenAPI routes. This means even incorrect request parameters, like in the example /user/123456, are recorded in a standardized route format /user/{username}. The purpose is to ensure even erroneous requests are accurately logged, represented as request_xxx{path="/user/{username}", statusCode="4xx"} in our metrics.

I don't want to take away this strictness from the validator, in my opinion - it's doing the right thing. What you need is more flexibility to relax the rules right?

daveshanley commented 8 months ago

validating a string is pretty pointless when, from a library perspective, you do not know what it is supposed to contain, as a string can more or less contain anything. For example, if you want to be strict, you'd have to validate against everything that someone can imagine not to be a string. A boolean, a timestamp, someone inserting a json into that string, etc. Usages one cannot validate because the library does not know. This check gives the impression of strictness but what it does is just a basic validation against one if many types which is imo pointless because a string can contain anything that is serializable as a utf8 string. The interesting part is the other way around, an integer cannot be a string, as it has a well defined format.

I don't agree. The job of the validator is to check if what you are doing is what you said you would do. It does not care about context.

If the schema states 'string' and you pass in anything other than a number, bool, array or an object.. you're golden. If the string is of a specific type, then use format which will give another level of validation.

1234 and 123.4 are not strings, they are numbers. They will always be numbers.

There is always the option to tell the validator to 'pretend they are not numbers for a second', but saying this is 'pointless' isn't accurate at all. The whole purpose of this code is to validate as best it can.

daveshanley commented 8 months ago

And, remember folks - if you disagree with my opinions :)

Screenshot 2024-02-10 at 9 09 21 AM
jxsl13 commented 8 months ago

my point is: validading that it is not a number does not imply that it is a string, as it could be a boolean or anything else that you do not know.

daveshanley commented 8 months ago

To me, this makes perfect logical sense.

// check if the param is a number or not
        schema := params[p].Schema.Schema()
        for t := range schema.Type {
            switch schema.Type[t] {
            case helpers.String, helpers.Object, helpers.Array:
                // should not be a number.
                if _, err := strconv.ParseFloat(s, 64); err == nil {
                    s = helpers.FailSegment
                }
            case helpers.Number, helpers.Integer:
                // should not be a string.
                if _, err := strconv.ParseFloat(s, 64); err != nil {
                    s = helpers.FailSegment
                }
                // TODO: check for encoded objects and arrays (yikes)
            }
        }
    }

If the schema is defined as a string, an object or an array.. check if it's numeric, if it is.. its no good.

If the schema is defined as numeric, check the input is numeric, if it's not, it is no good.

Why is this pointless?

jxsl13 commented 8 months ago

if you only look at those two cases and if there are and always will be only these two cases: case 1: string, object, array case 2: number, integer

To some degree it makes sense that the (positive) validation logic of the numbers-case is inversed (negative) in the other case.

Imo, it would be better if the first case was also a "positive" validation, meaning it is a string, it is an object, it is an array contrary to it not a number and not whatever else may be added in the future.

That would lead to the positive check: "It is a valid utf8 string" which may contain anything that is valid utf8. I'm only speaking about the string case (ignoring object and array). Inherently a string can contain anything like uuids, boolean values or mathematically precise numbers (interesting in the financial sector).

Maybe my assumption is incorrect what this code is exactly targeting, is it the format of a string type or is it the actual string type itself?

lance0805 commented 8 months ago

I agree with your point, especially from a strict interpretation. The challenge I'm facing is providing metrics for a gateway, which includes logging request paths. To prevent metrics from becoming overwhelming due to numerous path variables, we convert actual paths to OpenAPI routes. This means even incorrect request parameters, like in the example /user/123456, are recorded in a standardized route format /user/{username}. The purpose is to ensure even erroneous requests are accurately logged, represented as request_xxx{path="/user/{username}", statusCode="4xx"} in our metrics.

I don't want to take away this strictness from the validator, in my opinion - it's doing the right thing. What you need is more flexibility to relax the rules right?

Yes, having a function that does not require validation, or one that allows for custom validation logic, would better suit my use case.