spec-first / connexion

Connexion is a modern Python web framework that makes spec-first and api-first development easy.
https://connexion.readthedocs.io/en/latest/
Apache License 2.0

Breaking up large swagger.yaml files #254

Closed patrickw276 closed 1 year ago

patrickw276 commented 8 years ago

Is there any way to break up large swagger.yaml files into smaller pieces? I've tried using external references, but it doesn't seem like Connexion supports these. I've also tried using Jinja features to break the Swagger files up, but this doesn't work (and it's ugly).

Am I missing a feature that would let me do this? If not, is relative referencing a feature Connexion would be interested in? I know that the jsonschema library used internally supports relative referencing so it might not be too difficult to implement. If you guys are interested, I can take a crack at adding the feature.

hjacobs commented 8 years ago

@patrickw276 I think it's not supported right now in Connexion, but "people" might be interested in this feature. I personally don't have the need for it as IMHO "microservices" should not have large Swagger files :smirk:.

rafaelcaricio commented 8 years ago

As @hjacobs said, people using Connexion might be interested in this kind of feature. It would be an interesting thing to support. Please send a PR and we will review it. 😃

patrickw276 commented 8 years ago

@hjacobs My use case doesn't involve very much processing but the model's schema is somewhat complex. It doesn't make much sense to break the API into smaller microservices (IMO).

I've started working on a PR where the swagger document's external references are resolved into a single dictionary before being passed to the swagger validator. For example:

# doc1.yaml
key1: value1
key2:
  $ref: 'doc2.yaml#/key3'

# doc2.yaml
key3: value3

will resolve to a dictionary that looks like

{'key1': 'value1', 'key2': 'value3'}

Recursive references will be handled by pointing the recursive $ref to its new location in the combined document. Do you guys see any issues with this approach? Also, I don't plan on implementing non-local reference resolution (e.g. using http).
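A minimal sketch of the resolution step described above, assuming the YAML files have already been parsed into dicts. The `DOCS` mapping is a hypothetical stand-in for a real file loader, and file names are matched as-is rather than joined with the referring file's directory. Unlike the proposal, it does not re-point recursive references to their new location; it simply raises on a cycle.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical stand-ins for the two parsed YAML files above; a real
# loader would open and parse each file instead.
DOCS: Dict[str, Any] = {
    "doc1.yaml": {"key1": "value1", "key2": {"$ref": "doc2.yaml#/key3"}},
    "doc2.yaml": {"key3": "value3"},
}


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Follow an RFC 6901 JSON Pointer (e.g. '/key3') inside doc."""
    if not pointer:
        return doc  # the empty pointer means the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_refs(node: Any, current_file: str, load: Callable[[str], Any],
                 seen: Tuple[Tuple[str, str], ...] = ()) -> Any:
    """Return a copy of node with every $ref replaced by its target."""
    if isinstance(node, dict):
        if "$ref" in node:
            file_part, _, pointer = node["$ref"].partition("#")
            target_file = file_part or current_file  # '#/...' stays in this file
            key = (target_file, pointer)
            if key in seen:  # this ref is already being expanded further up
                raise ValueError(f"circular $ref: {node['$ref']}")
            target = resolve_pointer(load(target_file), pointer)
            return resolve_refs(target, target_file, load, seen + (key,))
        return {k: resolve_refs(v, current_file, load, seen) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, current_file, load, seen) for v in node]
    return node


combined = resolve_refs(DOCS["doc1.yaml"], "doc1.yaml", DOCS.__getitem__)
print(combined)  # {'key1': 'value1', 'key2': 'value3'}
```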

rafaelcaricio commented 8 years ago

A simpler way, since Connexion already uses Jinja2 for template rendering, would be to use the {% include "something.yaml" %} directive. All that is needed to make this work is to configure the Template#environment attribute in the Connexion code to use the jinja2.FileSystemLoader loader. Then you would be able to break your Swagger/OpenAPI definition into several files.

patrickw276 commented 8 years ago

@rafaelcaricio That would definitely be simpler to implement, BUT you can run into some weird issues with local references. These references would have to be declared relative to the complete concatenated document, not the document they actually exist in. I'm not sure how often this would be an issue in a real API, but it's something to consider.

I favor the reference resolution approach I outlined above because it implements a feature as outlined in the Swagger/OpenAPI Specification.

rafaelcaricio commented 8 years ago

I do not dislike your solution. I am just trying to show more possibilities. ✌️

Anyway, where is this outlined in the Swagger/OpenAPI Specification?

patrickw276 commented 8 years ago

@rafaelcaricio It's good to be open to other approaches, so I definitely appreciate the input (and all your guys work on Connexion in general).

The spec mentions them here and a few other places.

EDIT: Actually, here is a better place in the document to read, along with the link to canonical dereferencing.

rafaelcaricio commented 8 years ago

@patrickw276 Sounds good. 👍

ibigpapa commented 7 years ago

If you are looking for an alternative while this gets baked in: I used a Node app, because I have a Swagger API that is split across 15 to 20 fairly large files.

The tool I used makes it really simple to create a single Swagger file that Connexion will work with, as it fully dereferences pointers for you.

https://www.npmjs.com/package/swagger-cli

Here's a quick start for you:

npm install -g swagger-cli
swagger validate <path_to_root_spec>

Once it's validated, it can produce a single file with the following:

swagger bundle -r -o <output_path> <path_to_root_spec>

The -r is to fully dereference.

do3cc commented 7 years ago

Out of curiosity, how is that handled at Zalando? Is this file https://api.zalando.com/schema/swagger.json also bundled?

ibigpapa commented 7 years ago

Looks like you can find more on the shop api here https://github.com/zalando/shop-api-documentation

hjacobs commented 7 years ago

@do3cc I personally don't know anybody in Zalando who splits up Swagger files (mostly it's "microservices", right?). Our RESTful API Guidelines also don't cover this.

patrickw276 commented 7 years ago

I came to the conclusion that to do this cleanly (loading the individual files instead of bundling them), Yelp's swagger_spec_validator would need to support YAML directly. I opened a PR on that repo but never got any feedback.

I do think there is a valid use case for this feature that can't be solved with microservices. I work with earth science metadata where the schemas can be quite complex. It wouldn't really make sense to break an api storing this data into microservices but breaking the swagger spec up could help with managing the complexity.

advance512 commented 7 years ago

Any update on this? It is quite uncomfortable using a single file, why not allow multiple files as in the OpenAPI spec?

rafaelcaricio commented 7 years ago

@advance512 it is not a matter of allowing it or not. We would be happy to accept a PR that solves this issue, but so far there is none.

dwlocks commented 6 years ago

I've done quite a bit of research on this problem. There are 2 compounding issues:

  1. yelp/swagger-spec-validator converts yaml to json before validation so that they can use Julian/jsonschema.
  2. Julian/jsonschema currently cannot handle file-relative paths such as

    { $ref : somefile.json#/some_id }

    Before swagger-spec-validator started using jsonschema, it did handle file-relative paths; the switch caused a regression.

My team worked around the first problem by converting our yaml to json during our build process (including twiddling filenames within the data).

We're also working on a patch to Julian/jsonschema to fix the 2nd problem. It may require also submitting a patch to swagger-spec-validator.

Personally, I'd put all work on this set of problems off until the bug in Julian/jsonschema is fixed. After that, it's possible that connexion could use the yaml loading script we've written. However, I really think that yelp/swagger-spec-validator should be handling the yaml-json conversion.

(seeing all the referencing bugs, it's clear there's more than one way to fix this...)

hjacobs commented 6 years ago

@dwlocks thanks for the insights!

advance512 commented 6 years ago

So, I guess that for now, you'd combine the files in a build process - right? Or are there any better alternatives?

dwlocks commented 6 years ago

I think combining is probably the best of several not-so-great options. If you're somewhat careful with the YAML, simple concatenation should work. I personally would write the $refs so that they work in the concatenated state, i.e. so that they are all document-relative, "#/" style.
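If you go the concatenation route, a small check can flag any `$ref` that is not in the document-relative `#/` form before the files are merged. The sketch below is a hypothetical helper (not part of Connexion or any validator) and assumes the spec is already loaded into a Python dict; locations are reported as JSON Pointers, so a `/` inside a path key shows up as `~1`.

```python
from typing import Any, List, Tuple


def find_non_local_refs(node: Any, location: str = "#") -> List[Tuple[str, str]]:
    """Return (location, ref) pairs for every $ref that does not start with '#/'."""
    found: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#/"):
            found.append((location, ref))
        for key, value in node.items():
            # Escape the key as a JSON Pointer token ('~' -> '~0', '/' -> '~1').
            token = str(key).replace("~", "~0").replace("/", "~1")
            found.extend(find_non_local_refs(value, f"{location}/{token}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_non_local_refs(item, f"{location}/{index}"))
    return found


# Hypothetical concatenated spec: one local ref, one that still points at a file.
spec = {
    "paths": {
        "/pets": {"get": {"responses": {"200": {
            "description": "OK", "schema": {"$ref": "#/definitions/Pet"}}}}},
        "/owners": {"get": {"responses": {"200": {
            "description": "OK", "schema": {"$ref": "owners.yaml#/Owner"}}}}},
    },
    "definitions": {"Pet": {"type": "object"}},
}
print(find_non_local_refs(spec))
# [('#/paths/~1owners/get/responses/200/schema', 'owners.yaml#/Owner')]
```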

I think one of the bugs here suggested a Jinja2 templating approach.

(I had a few days to go at this during work, but no longer. I'm working on the jsonschema bits in my spare time at home now.)

dradetsky commented 6 years ago

@dwlocks I came here after foolishly replicating some of your research. I haven't yet looked into swagger-spec-validator, but I was considering the following idea:

If I modify resolve_remote to treat scheme == '' as "this is a local relative file lookup" (and thus I don't have to give it 'file://' + abspath), then I can add a branch which does a regular open and tries to read yaml first if the extension is correct, then converts it to json.

I think this is all I would need. Let me know if you think it would not be sufficient.

I don't see this getting merged into the primary jsonschema branch anytime soon (would no doubt require handling all branches of that method correctly), but it would at least be something connexion users could use as well (by way of installing a specific branch).

dradetsky commented 6 years ago

Actually, looking here, it looks like the real solution is to override the RefResolver used by Connexion, or possibly by swagger-spec-validator (I don't yet know which of those is where it's actually invoked), so that it sets base_uri correctly and reads YAML.

holdenweb commented 6 years ago

Would it be possible to continue to use a single swagger spec but integrate a tool that allows each endpoint or group of endpoints to be described separately and integrated to create the necessary connexion swagger file?

I've looked at swagger-aggregator and spec-synthase, the latter of which looks more promising. I may report back if I make any progress. At present the Swagger spec for our API server is over 2,700 lines long, and I want to be able to refactor it so the Swagger specification can live in a directory tree with a parallel structure to the code.

advance512 commented 6 years ago

@holdenweb Please do report back, interesting.

allan-silva commented 6 years ago

@holdenweb, I noticed today that you mentioned my package Spec-Synthase; sorry for not having more documentation. If you want to build the swagger.yml from the command line, take a look at: http://spec-synthase.readthedocs.io/en/latest/usage.html

If you want to create spec in runtime and use it with connexion, you can see: https://github.com/MicroarrayTecnologia/spec-synthase/blob/master/tests/test_specsynthase.py#L20

We are using Spec-Synthase in production in the Mozilla Application Update Service (mozilla/balrog); our scenario is sharing pieces of the spec between a public and an admin API: https://github.com/mozilla/balrog/blob/df92ab8ae523f7aedea7ed00b61b49a56dc56f0d/auslib/web/public/base.py#L30-L37

https://github.com/mozilla/balrog/blob/df92ab8ae523f7aedea7ed00b61b49a56dc56f0d/auslib/web/admin/base.py#L16-L29

Spec-Synthase seems to fit your scenario well, since you can have n files describing paths, n files describing responses, etc.

dtkav commented 5 years ago

Another tool for resolving local references : https://github.com/wework/speccy

dtkav commented 5 years ago

I am starting to think that this behavior is best left to a tool outside of connexion (like wework/speccy). Otherwise we have to sort out the problems in #798 related to serving up all of the references, and making sure they are accessible to swagger-ui. Thoughts?

mattspring commented 5 years ago

@dtkav I just stumbled across this issue after running into it when trying to load an OpenAPI spec with relative references. Use cases: multiple services that accept/return objects with the same model; authorization format; big regexes, etc.

Solutions involving the file:/ scheme or Jinja2 templating are a nonstarter (for me) because the definition files are no longer valid OpenAPI and I sacrifice portability.

Solutions involving speccy or other combination tools are kind of a bummer because it adds complexity to the build process.

TL;DR as a user, I would really like to see connexion support relative paths natively.

roman-telepathy-ai commented 5 years ago

Just adding my 2 cents...

I do split APIs into different YAML files, so that I have various small APIs. If I want to reuse them, I simply merge them on the fly. Each micro-API should have its own namespace.

holdenweb commented 5 years ago

specsynthase is good at merging specifications, detecting duplicated keys and the like and capable of validating the specifications it works with.

I've personally gone for a rather lightweight approach, where a package that implements a set of endpoints is defined by a paths.json file in the package's top-level directory. The definitions section is currently shared by all endpoints, as some definitions are common to many endpoints, but this isn't an absolute requirement of the design.

The endpoint set is defined as an extended flask.Blueprint and can be mounted at any point in the server's address space (which is what determines the path it gets associated with in the specification).
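A rough sketch of the merge step in this kind of layout: each package contributes its `paths`, prefixed with the point where it is mounted, and a path defined twice is reported rather than silently overwritten. The function name and the placeholder `info` block are hypothetical; this is not holdenweb's code or Spec-Synthase's API, and it assumes Swagger 2.0 with one shared `definitions` section.

```python
from typing import Any, Dict, Iterable, Tuple


def merge_package_paths(packages: Iterable[Tuple[str, Dict[str, Any]]],
                        definitions: Dict[str, Any]) -> Dict[str, Any]:
    """Combine (mount_prefix, paths) pairs into one Swagger 2.0 document.

    Each package's paths are prefixed with the point where it is mounted;
    a path defined by two packages raises instead of being overwritten.
    """
    merged: Dict[str, Any] = {}
    for prefix, paths in packages:
        for path, operations in paths.items():
            full_path = prefix.rstrip("/") + path
            if full_path in merged:
                raise ValueError(f"duplicate path: {full_path}")
            merged[full_path] = operations
    return {
        "swagger": "2.0",
        "info": {"title": "Combined API", "version": "1.0"},  # placeholder metadata
        "paths": merged,
        "definitions": definitions,  # one section shared by every package
    }


# In practice each dict would come from json.load() on a package's paths.json.
users = {"/": {"get": {"responses": {"200": {"description": "OK"}}}}}
orders = {"/{id}": {"get": {"responses": {"200": {"description": "OK"}}}}}
spec = merge_package_paths([("/users", users), ("/orders", orders)], definitions={})
print(sorted(spec["paths"]))  # ['/orders/{id}', '/users/']
```

Mounting the same package under a different prefix only changes the first element of its tuple, which mirrors how the mount point, not the package, determines the path in the final specification.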

roman-telepathy-ai commented 5 years ago

@holdenweb yep, more or less I followed same approach.

reece commented 5 years ago

It appears that openapi-spec-validator now makes it possible to pass the spec_url, from which the base URI can be populated and relative refs generated.

See https://github.com/p1c2u/openapi-spec-validator/issues/3. The magic is to pass spec_url, like so: validate_spec(spec_dict, spec_url='file:///path/to/spec/openapi.yaml') (from README).

Although that change is old (Oct 2017), it was made after this thread started.

Like others, I'd love to see support for relative refs so that I can modularize schemas. The speccy and specsynthase routes seem like unfortunate workarounds.

Also, it's worth pointing out that the upcoming jsonschema draft (draft 8, I think) has language that says that rel refs should be resolved per RFC3986 (see section 4.2).
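The base-URI mechanism above is ordinary RFC 3986 reference resolution, which Python's `urllib.parse.urljoin` implements. A stdlib-only illustration, using the placeholder path from the README example; `resolve_ref` is a hypothetical helper that returns the absolute document URI and the JSON Pointer fragment separately:

```python
from typing import Tuple
from urllib.parse import urldefrag, urljoin

# Placeholder base URI of the root document, as in the spec_url example above.
BASE = "file:///path/to/spec/openapi.yaml"


def resolve_ref(base_uri: str, ref: str) -> Tuple[str, str]:
    """Resolve a (possibly relative) $ref against base_uri (RFC 3986).

    Returns the absolute document URI and the JSON Pointer fragment.
    """
    result = urldefrag(urljoin(base_uri, ref))
    return result.url, result.fragment


for ref in ("schemas/pet.yaml#/Pet",
            "../common/errors.yaml#/Error",
            "#/components/schemas/Pet"):
    print(resolve_ref(BASE, ref))
# ('file:///path/to/spec/schemas/pet.yaml', '/Pet')
# ('file:///path/to/common/errors.yaml', '/Error')
# ('file:///path/to/spec/openapi.yaml', '/components/schemas/Pet')
```

Without a base URI there is nothing for `schemas/pet.yaml` to be relative to, which is why supplying `spec_url` matters. For a local file, `pathlib.Path(...).resolve().as_uri()` is one way to build such a URI.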

HRogge commented 5 years ago

I had the same problem (no working relative file URLs), but based on the comment from @reece , I got it working with a patch of spec.py and json_schema.py. HRogge@d173b3d12487de33a686c541ee90c25541a43c34

I use references to definitions inside JSON files, not to the JSON files itself ($ref: 'networkroutes.json#/definitions/Routes')

dtkav commented 5 years ago

@HRogge based on your patch I would not expect swagger-ui to work. Do you use that feature?

HRogge commented 5 years ago

You are right (I just tested it): the opening page looks right, but I cannot expand the items. The patch is still a work in progress; at least I don't get Python exceptions anymore.

HRogge commented 5 years ago

I got a little bit further HRogge/connexion@72d278ea0be45c36c19f0a3dae99c8127866bd7b by storing all loaded external references and providing them in the Flask API code.

But I still get an error with swagger-ui: Resolver error at paths./domain.get.responses.200.content.application/json.schema.$ref Could not resolve reference: Could not resolve pointer: /definitions/Domains does not exist in document

The Chrome developer console says "actions.js:145 debResolveSubtrees: don't have a system to operate on, aborting."

Any pointers how to continue with this?

HRogge commented 5 years ago

I "resolved" the problem for me by writing a small tool that can combine the YAML openapi3 definition and the referred JSON documents into a single large document (things like JsonRef did not work). I still hope connexion and the swagger-ui will get the capability to handle split definitions by default in the future. If there is still code missing on the connexion side of the problem I would be willing to look into it (my attempt to understand what the swagger-ui was doing was not very successful).

vumaasha commented 5 years ago

I ended up using openapi-generator for generating the consolidated yaml where the references are resolved in to a single yaml. Then I use this yaml with connexion. Works like a charm

adamf commented 5 years ago

Another vote in support of supporting $ref properly, since that's what the OpenAPI spec calls for. I can't use the jinja solution as I'm using my OpenAPI definitions with Google Cloud Endpoints (but can and will use a launcher script to concat all the files together, for now). In particular, even with microservices, the definitions section will often be shared by services, so it'd be nice to have each service reference a common definitions file.

(also this project is great, thanks for the work on it!)

HRogge commented 5 years ago

I just became the victim of the OpenAPI3 vs. JSON-Schema differences... sigh. Who the hell got the idea to use JSON-Schema in OpenAPI, but not exactly JSON-Schema?

At the moment I am fighting with a more complicated schema that I would like to load in my patched Connexion.

HRogge commented 5 years ago

Important Github Issue for this one: https://github.com/p1c2u/openapi-core/issues/90

I have at least resolved the "schema validation" issue by modifying the meta-schema used by OpenAPI3... but this is definitely a hack. I still have to test if Connexion can still validate the input and output of my rest service.

openapi3_schema.diff.txt

HRogge commented 5 years ago

I added a little bit more code to the resolve_refs() function in json_schema to recursively update all references... see 20574738c0454d82ca9d5ecea5fb06ae9f957fc1

this, together with the modified openapi3 schema mentioned above allowed me to use a Connexion based rest service with references (3 layers deep) in the openapi definition.

openapi => json-schema-1 => json-schema-2
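The general idea of rewriting references, rather than inlining their targets, can be sketched as follows. This is not HRogge's patch: `hoist_external_refs` is a hypothetical helper that assumes the external files are already parsed into dicts (`FILES` stands in for a loader), that refs look like `file.json#/definitions/Name`, and that the last pointer segment is a unique schema name (clashes between different external files are not detected). Because copied schemas keep `$ref`s to `#/components/schemas/...`, self-referencing schemas survive bundling.

```python
from typing import Any, Callable, Dict

SCHEMA_PREFIX = "#/components/schemas/"


def hoist_external_refs(spec: Dict[str, Any],
                        load: Callable[[str], Any]) -> Dict[str, Any]:
    """Return a copy of spec in which every schema reached through an
    external file is copied into components/schemas and referenced locally."""
    hoisted: Dict[str, Any] = {}

    def rewrite(node: Any, current_file: str) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str):
                file_part, _, pointer = ref.partition("#")
                target_file = file_part or current_file
                if target_file:  # the ref leads into an external file
                    name = pointer.rsplit("/", 1)[-1]
                    if name not in hoisted:
                        hoisted[name] = {}  # placeholder: stops infinite recursion
                        target = load(target_file)
                        for token in pointer.strip("/").split("/"):
                            target = target[token]
                        hoisted[name] = rewrite(target, target_file)
                    return {"$ref": SCHEMA_PREFIX + name}
            return {key: rewrite(value, current_file) for key, value in node.items()}
        if isinstance(node, list):
            return [rewrite(item, current_file) for item in node]
        return node

    result = rewrite(spec, "")  # "" marks the root document; its local refs stay as-is
    schemas = result.setdefault("components", {}).setdefault("schemas", {})
    for name, schema in hoisted.items():
        if name in schemas:
            raise ValueError(f"schema name clash: {name}")
        schemas[name] = schema
    return result


FILES = {  # hypothetical parsed contents of networkroutes.json
    "networkroutes.json": {
        "definitions": {
            "Routes": {"type": "array", "items": {"$ref": "#/definitions/Route"}},
            "Route": {"type": "object",
                      "properties": {"via": {"$ref": "#/definitions/Route"}}},
        }
    }
}
ok = {"description": "OK", "content": {"application/json": {
    "schema": {"$ref": "networkroutes.json#/definitions/Routes"}}}}
spec = {"openapi": "3.0.0",
        "paths": {"/routes": {"get": {"responses": {"200": ok}}}}}

bundled = hoist_external_refs(spec, FILES.__getitem__)
print(sorted(bundled["components"]["schemas"]))  # ['Route', 'Routes']
```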

HRogge commented 5 years ago

I just got a version of Connexion (see https://github.com/HRogge/connexion/tree/relative_file_references ) running with an integrated (patched) openapi_spec_validator... which is both processing the YAML/JSON schemata with relative references and has a working REST output validation (at least it looks like it).

erans commented 5 years ago

Any ETA of getting this merged and release soon?

advance512 commented 5 years ago

@HRogge that's awesome!

codepossible commented 5 years ago

Any ETA of getting this merged and release soon?

Impacted by the same issue, eagerly awaiting the fix.

tuankiet65 commented 5 years ago

For anyone looking for a solution, you can use prance to merge separate OpenAPI files into one before feeding it into connexion. Since prance is written in Python, you can use it directly during the app startup, instead of having to run external tools like speccy or swagger-cli. For example:

import connexion
import prance
from pathlib import Path
from typing import Any, Dict

app = connexion.App(__name__)


def get_bundled_specs(main_file: Path) -> Dict[str, Any]:
    # Parse the root spec and resolve every $ref (including references to
    # other files) into one dict; lazy=True defers the work until parse().
    parser = prance.ResolvingParser(str(main_file.absolute()),
                                    lazy=True, strict=True)
    parser.parse()
    return parser.specification


app.add_api(get_bundled_specs(Path("openapi/main.yaml")),
            resolver=connexion.RestyResolver("cms_rest_api"))

if __name__ == "__main__":
    app.run(server="tornado", host="0.0.0.0", port=5000)

get_bundled_specs() resolves the file specified in the argument and returns a resolved OpenAPI dict which then can be passed to connexion.App.add_api()

codepossible commented 5 years ago

Thank you @tuankiet65. That worked beautifully (well almost).

Is there a way in Connexion/prance to make Swagger UI refer to the models by their actual names instead of "Inline Model nnn"? The generated swagger file (virtual swagger.json) does not seem to contain those names.

deunz commented 5 years ago

@tuankiet65 Very good stuff. It's perfect for me without the strict=True :)

cigano commented 5 years ago

Used @tuankiet65's solution with a few tweaks:

def get_bundled_specs(main_file: Path) -> Dict[str, Any]:
    parser = prance.ResolvingParser(str(main_file.absolute()),
                                    lazy=True, backend='openapi-spec-validator')
    parser.parse()
    return parser.specification

tomghyselinck commented 4 years ago

Unfortunately, the solution using prance does not work for us, since we use recursive schemas.

I.e. something like the following won't work:

    RecursiveItem:
      title: Tree of items
      type: object
      properties:
        name:
          type: string
        children:
          $ref: "#/components/schemas/RecursiveItemList"
      additionalProperties: false
    RecursiveItemList:
      title: collection of recursive items
      type: array
      items:
        $ref: "#/components/schemas/RecursiveItem"
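The difficulty is that these two schemas refer to each other, so a resolver that replaces every `$ref` with its target (which is what a fully resolving parser attempts) cannot finish: each expansion of `RecursiveItem` contains another `RecursiveItemList`, and so on. A small stdlib check like the hypothetical `find_ref_cycle` below can detect such cycles up front, which suggests the spec needs a bundler that keeps internal `$ref`s rather than a full dereference. It assumes the schemas are already loaded as a dict and only follows refs of the form `#/components/schemas/Name`.

```python
from typing import Any, Dict, List, Optional, Set

PREFIX = "#/components/schemas/"


def local_refs(node: Any) -> List[str]:
    """Names of the component schemas that node refers to via $ref."""
    if isinstance(node, dict):
        names = []
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(PREFIX):
            names.append(ref[len(PREFIX):])
        for value in node.values():
            names.extend(local_refs(value))
        return names
    if isinstance(node, list):
        return [name for item in node for name in local_refs(item)]
    return []


def find_ref_cycle(schemas: Dict[str, Any]) -> Optional[List[str]]:
    """Depth-first search for a chain of $refs that leads back to its start."""
    stack: List[str] = []
    finished: Set[str] = set()

    def visit(name: str) -> Optional[List[str]]:
        if name in stack:  # back edge: the chain has looped
            return stack[stack.index(name):] + [name]
        if name in finished or name not in schemas:
            return None
        stack.append(name)
        for target in local_refs(schemas[name]):
            cycle = visit(target)
            if cycle:
                return cycle
        stack.pop()
        finished.add(name)
        return None

    for name in schemas:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# The two schemas from the example above, as a parsed dict.
schemas = {
    "RecursiveItem": {
        "title": "Tree of items",
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "children": {"$ref": "#/components/schemas/RecursiveItemList"},
        },
        "additionalProperties": False,
    },
    "RecursiveItemList": {
        "title": "collection of recursive items",
        "type": "array",
        "items": {"$ref": "#/components/schemas/RecursiveItem"},
    },
}
print(find_ref_cycle(schemas))
# ['RecursiveItem', 'RecursiveItemList', 'RecursiveItem']
```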