mattpolzin / OpenAPIKit

Codable Swift OpenAPI implementation.

Response content type `gzip` #238

Closed kean closed 2 years ago

kean commented 2 years ago

I'm using OpenAPIKit release/3_0, and I've been testing it against many open specs (I'm working on a new code generator for Swift, releasing soon). One of the specs I tested was App Store Connect API spec.

They use the gzip content type for some responses:

  "200" : {
    "description" : "List of SalesReports",
    "content" : {
      "gzip" : {
        "schema" : {
          "type" : "string",
          "format" : "binary"
        }
      }
    }
  }

Parsing fails with the following error:

"ERROR! The spec is missing or invalid. dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "The given data was not valid YAML.", underlyingError: Optional(Expected `gzip` value in .content for the status code '200' response of the **GET** endpoint under `/v1/financeReports` to be parsable as ContentType but it was not.)))"

I think it's an incorrect content type, but tools like https://editor.swagger.io are able to parse and display it:

[Screenshot: Swagger Editor parsing and displaying the gzip response without complaint]

I'm wondering what your thoughts are on this. Yes, a MIME type usually consists of a type and a subtype, and gzip is most certainly not a correct content type (I think). But should the parser be validating it, or should it just put it into the .other category and let the tools that use OpenAPIKit handle it?

kean commented 2 years ago

Yeah, this is a weird spec:

included:
  items:
    oneOf:
      - $ref: "#/components/schemas/AppCategory"
      - $ref: "#/components/schemas/AppCategory"

And then

[screenshot]

mattpolzin commented 2 years ago

The more of these situations I see (where a spec author has made a small mistake, or even a deliberate decision to do something that makes sense to them even though it falls slightly outside the spec), the more I think being quite this strict hurts folks using OpenAPIKit more than it helps. On the other hand, I want to retain as much as possible of the ability to guarantee that specs written in OpenAPIKit directly are totally valid.

Ultimately, I like your idea of parsing this but putting it into other; I might even want to go so far as to parse it as a new enum option, "invalid", that holds the string value and the error we would otherwise have reported as a parse failure. Paired with an on-by-default Validation that fails when any "invalid" content types are found, that would, I think, be a nice place to end up.
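
To make that concrete, something like this is the shape I have in mind (a rough sketch with hypothetical names, not the final API):

    // Well-formed but unrecognized values land in `other`; malformed values
    // land in `invalid` along with the reason, for a validation pass to find.
    enum ContentTypeSketch {
        case json
        case other(String)                    // e.g. "application/zip"
        case invalid(String, reason: String)  // e.g. "gzip"

        init(rawValue: String) {
            if rawValue == "application/json" {
                self = .json
            } else if rawValue.split(separator: "/").count == 2 {
                self = .other(rawValue)
            } else {
                self = .invalid(rawValue, reason: "expected a type/subtype pair")
            }
        }
    }

    // The on-by-default Validation would then surface every `invalid` value.
    func invalidContentTypeWarnings(in contentTypes: [ContentTypeSketch]) -> [String] {
        contentTypes.compactMap { (contentType) -> String? in
            if case let .invalid(raw, reason) = contentType {
                return "'\(raw)' is not a valid content type: \(reason)"
            }
            return nil
        }
    }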

mattpolzin commented 2 years ago

By the way, I would love to add this and other examples you find appropriate to OpenAPIKit's compatibility suite if you don't mind mentioning to me where you found them, which ones you recommend for the purpose, what they exemplify, and if you know whether I would have permission to use them. A lot of factors, but please share if you have time!

[EDIT] I see you've already linked to the App Store Connect documentation where the spec is linked (thanks!)

kean commented 2 years ago

> makes sense to them even though it falls slightly outside the spec

Yeah, that's a massive problem. Many specs I found are not even valid YAML – not computer-readable.

> to guarantee that specs written in OpenAPIKit directly are totally valid.

For what it's worth, I have a --strict mode in my tool. By default, the tool will ignore certain errors and just skip the properties/paths/entities that it fails to process. But with --strict mode on, any issues are treated as critical errors.

> By the way, I would love to add this and other examples you find appropriate to OpenAPIKit's compatibility suite

I'm adding specs from https://apis.guru to my suite. I'm doing snapshot tests: a spec as input and generated Swift code as output. I've already added about 15 specs, but each one brings new challenges.
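
For reference, each test in the suite looks roughly like this (names and paths are hypothetical, and the generator entry point is a stand-in):

    import Foundation
    import XCTest

    // Stand-in for the real generator entry point.
    func generateSwiftCode(fromSpecAt url: URL) throws -> String {
        fatalError("parse the spec with OpenAPIKit and emit Swift source here")
    }

    final class GeneratorSnapshotTests: XCTestCase {
        let fixtures = URL(fileURLWithPath: "Tests/Fixtures")

        func testPetstoreSpec() throws {
            let generated = try generateSwiftCode(fromSpecAt: fixtures.appendingPathComponent("petstore.yaml"))
            let expected = try String(contentsOf: fixtures.appendingPathComponent("petstore.expected.swift"))
            // The whole generated file is the snapshot; any drift fails the test.
            XCTAssertEqual(generated, expected)
        }
    }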

kean commented 2 years ago

By the way, thanks for writing this framework. I'm using a pre-release version and, so far, it's been super reliable and easy to use! The test suite for it is also impressive.

kean commented 2 years ago

Oh, and also, https://apis.guru doesn't always have the latest versions of the specs. So I usually check with the official documentation site first.

mattpolzin commented 2 years ago

> The test suite for it is also impressive.

A nice way of saying "overkill" :)

Then again, I have definitely cut down on errors in releases by making sure that almost all code, even just different combinations of coded properties, is exercised.

kean commented 2 years ago

Another good source of specs: https://app.swaggerhub.com/search

mattpolzin commented 2 years ago

@kean Do you have any thoughts on https://github.com/mattpolzin/OpenAPIKit/pull/242? I think it's the approach I want to go with, and I'll then make other types conform to HasWarnings one by one to support other non-critical errors without making the parser fall over in the process.
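
Roughly, the shape of it (simplified here; the PR has the real definitions):

    // A decoded type records non-critical problems instead of throwing.
    protocol HasWarnings {
        var warnings: [Warning] { get }
    }

    struct Warning {
        let context: String   // where in the document the problem was found
        let message: String   // what we would otherwise have thrown
    }

So, for example, a content map that hits "gzip" would still decode successfully but would carry a warning describing the malformed content type.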

kean commented 2 years ago

From my user perspective, I'm primarily looking for a parser that is fast and has loose validation. I think it would be great if it could optionally skip not just warnings but errors too. For example, if one of the schemas is incorrect, I'm OK with the parser ignoring it. Many users of code generators, including me, want them to generate at least something; they then manually finalize the code to their liking. That's why I'm adding a --strict mode, which is off by default.

I think parsing and validation are two separate concerns and are often implemented by different tools. OpenAPI has a few validators available: https://openapi.tools/#description-validators. One of the major differences is that for validation, you typically don't want to fail on the first error or warning. You want the validator to go through the entire document and collect all warnings and errors.

I think it's doable to combine parsing and validation with Decodable, though I haven't tried building anything like this yet. For something that complicated, I think having entities decode themselves will no longer cut it (it'll also pollute the API, which is not ideal). I suggest adding a concept of "parsers." Talk is cheap, so I made a quick prototype: https://gist.github.com/kean/118ae053c5d610069a8e60d37e2d999b
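
The core of the idea, in toy form (this is not the gist verbatim): a standalone parser walks the raw document, records every error it hits, and still returns whatever it managed to parse.

    // Collect errors and keep parsing instead of throwing on the first failure.
    struct ParsedSchemas {
        var schemas: [String: String] = [:]  // name -> "type" (stand-in for a real model)
        var errors: [String] = []
    }

    func parseSchemas(_ raw: [String: Any]) -> ParsedSchemas {
        var result = ParsedSchemas()
        for (name, value) in raw {
            guard let object = value as? [String: Any],
                  let type = object["type"] as? String else {
                // Record the problem and move on rather than aborting.
                result.errors.append("components.schemas.\(name): not a valid schema object")
                continue
            }
            result.schemas[name] = type
        }
        return result
    }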

mattpolzin commented 2 years ago

I think if I'm being realistic about the time I'll have to put into this in the near future, rewriting the mechanism OpenAPIKit uses to parse documents is going to be out of scope. OpenAPI is a large specification, and it took quite a while just to write all of the existing decoding code, so separating it out and in effect rewriting it sounds like a large time commitment to me.

That's why I think the short term solution will be along the lines of loosening the requirements instead of a totally new parsing strategy.

Languages that aren't Swift (specifically really dynamic ones) are perhaps naturally suited to being really loose about interpreting schemas. For example, in JavaScript, if you read in a value that should be a number but is actually a quoted value, you can just treat it like a number anyway and it'll all work out (unless it doesn't). In Swift, being that flexible means explicitly trying to turn a string into a number to see if it'll work.
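
To make that concrete, here's roughly what the explicit version looks like (a sketch, not OpenAPIKit code):

    // A wrapper that accepts either 42 or "42" where a number is expected.
    struct LenientDouble: Decodable {
        let value: Double

        init(from decoder: Decoder) throws {
            let container = try decoder.singleValueContainer()
            if let number = try? container.decode(Double.self) {
                value = number
            } else if let string = try? container.decode(String.self),
                      let number = Double(string) {
                // Explicitly try to turn the quoted value into a number.
                value = number
            } else {
                throw DecodingError.dataCorruptedError(
                    in: container,
                    debugDescription: "Expected a number or a numeric string."
                )
            }
        }
    }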

I do think there are reasonable places to be much less strict than OpenAPIKit currently is, but I'm not immediately sure if a new parsing strategy gets us there faster or not.

kean commented 2 years ago

By the way, I already use custom parsers, not unlike the one I'm suggesting here, to parallelize decoding. I added them without modifying any of the existing OpenAPIKit code – composition works nicely. I can easily add "skip bad schemas" on my end with composition and with minimum changes. I'll explore this approach a bit more in the upcoming days.
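
Roughly, the wrapper I have in mind (hypothetical and simplified):

    // A stand-in for a real schema model.
    struct SchemaStandIn: Decodable {
        let type: String
    }

    // Try each schema individually; on failure, record the key and keep going.
    struct LossySchemaMap: Decodable {
        var schemas: [String: SchemaStandIn] = [:]
        var skipped: [String] = []

        private struct DynamicKey: CodingKey {
            var stringValue: String
            init?(stringValue: String) { self.stringValue = stringValue }
            var intValue: Int? { nil }
            init?(intValue: Int) { return nil }
        }

        init(from decoder: Decoder) throws {
            let container = try decoder.container(keyedBy: DynamicKey.self)
            for key in container.allKeys {
                do {
                    schemas[key.stringValue] = try container.decode(SchemaStandIn.self, forKey: key)
                } catch {
                    skipped.append(key.stringValue)  // skip the bad schema, don't fail
                }
            }
        }
    }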

kean commented 2 years ago

I should probably open another issue, but I think it's related.

I've encountered the following content type in one of the specs: "application/json; charset=utf-8", which is a valid content type. The problem is that the current ContentType doesn't check for the charset or other parameters and classifies it as ContentType.other("application/json; charset=utf-8"), which isn't ideal. I would've expected ContentType.json (and maybe a separate list of parameters).
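
Something along these lines would handle it (a sketch, not a proposed patch):

    import Foundation

    // Split parameters off before classifying the content type.
    func classify(_ rawContentType: String) -> (type: String, parameters: [String: String]) {
        let parts = rawContentType
            .split(separator: ";")
            .map { $0.trimmingCharacters(in: .whitespaces) }
        var parameters: [String: String] = [:]
        for parameter in parts.dropFirst() {
            let pair = parameter.split(separator: "=", maxSplits: 1)
            if pair.count == 2 {
                parameters[String(pair[0])] = String(pair[1])
            }
        }
        return (parts.first ?? "", parameters)
    }

    // classify("application/json; charset=utf-8")
    // -> (type: "application/json", parameters: ["charset": "utf-8"])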

mattpolzin commented 2 years ago

Hmm, yeah, I knew about the charset identifier so I am surprised I never actually tested OpenAPIKit's ability to parse that kind of content type properly -- thanks for the additional note!

mattpolzin commented 2 years ago

> I already use custom parsers, not unlike the one I'm suggesting here, to parallelize decoding. I added them without modifying any of the existing OpenAPIKit code.

That's pretty awesome. I had envisioned your parallelization workaround for parsing being a fork of OpenAPIKit when you first mentioned it in your performance GitHub ticket.

I wonder if that doesn't make a good permanent standalone project (certainly it is something others would be interested in using). If OpenAPIKit provided "basic" decoding capabilities that "worked" but weren't ideal for every use case, folks could grab a library that adds a different parsing layer; there could even be multiple different parsers out there, while we'd still retain a common core representation of the OpenAPI document.

If you ever want to flesh out your new parser as a standalone library, I would be more than happy to add it to the list of curated integrations in OpenAPIKit's readme for others to more easily find!

mattpolzin commented 2 years ago

I fixed content type parameter support as part of the same, already-open PR; no need for another ticket.