open-api-spex / open_api_spex

Open API Specifications for Elixir Plug applications
Mozilla Public License 2.0
723 stars 187 forks source link

Move schema definition to inside functions #382

Open josevalim opened 3 years ago

josevalim commented 3 years ago

Hi! πŸ‘‹

First off, thank you for the extremely useful project. :)

I would like to start a discussion about moving the schema definitions inside a schema function. In one project, we have debugged the schemas to be the ones responsible for high-compilation times and large recompilation graphs. A proposal would be to move the schema from:

  defmodule User do
    require OpenApiSpex

    OpenApiSpex.schema(%{
      title: "User",
      ...
    })
  end

to:

  defmodule User do
    # or def open_api_schema do
    def schema do 
      # you can remove __ENV__ if you keep it as a macro
      OpenApiSpex.schema(__ENV__, %{
        title: "User",
        ...
      })
    end
  end

The benefit of this approach are:

  1. Fast compilation times
  2. If a schema references another schema, it is no longer a compile-time dependency
  3. Refactoring, splitting code into functions, etc is now straightforward

The downside of this approach is that, if if the schema needs to do a lot of work to precompute values, operations, validations, etc, now this work is no longer possible. However, that may not be an issue at the end of the day. For example, imagine you need you want to build a "Compiled" schema with all information about all other schemas, you could do this:

defmodule AllSchemas do
  use OpenApiSpex.CompiledSchema,
    schemas: [User, Foo, Bar, ...]
end

The difference in this approach is that, now we have a single compile-time schema, which is easy to identify and manage. Plus, it may be that this schema is not necessary at all. For example, it may not be necessary for generating docs, as you can traverse all of the schemas as you generate docs.

Another place where you may need to keep the compile-time behaviour is in the operation macro in controllers. Once again, it depends on how much work is done building the validations at compile time. If it is all runtime based, even better. The point is that, by moving the schema to a function, you will have the option to limit the compile-time behaviour to specific constructs, instead of the whole schema.


PS: One option would be to change to the schema/1 macro today to simply define a function with the schema:

defmacro schema(schema) do
  quote do
    def schema do
      OpenApiSpex.schema(__ENV__, unquote(schema))
    end
  end
end

I don't like this approach because it changes the semantics of the code. For example, a schema like this would no longer work:

  defmodule User do
    require OpenApiSpex

    title = "User"
    OpenApiSpex.schema(%{
      title: title,
      ...
    })
  end

Because the macro moves the schema to a function where the title variable is no longer available.

mbuhot commented 3 years ago

Hi @josevalim thanks for opening the discussion πŸ‘‹

I would like to start a discussion about moving the schema definitions inside a schema function.

The good news is that it already works this way fundamentally: https://github.com/open-api-spex/open_api_spex/blob/master/lib/open_api_spex/schema.ex#L143-L147

Any module that exposes a schema/0 function returning a %Schema{} struct can be used. Any additional functionality added by the schema/1 macro should be considered a convenience.

Another place where you may need to keep the compile-time behaviour is in the operation macro in controllers

Similarly to the schemas, the operation macros shouldn't be doing too much compile time work other than de-sugaring the DSL to a function definition, which shouldn't have compile time dependencies on the schemas, it should only be using their module names.

In general when an Operation or Schema needs to reference another schema, it can be done using the name of the module containing the schema, and resolved to a %Schema{} struct at runtime using OpenApiSpex.resolve_schema_modules/1 (https://github.com/open-api-spex/open_api_spex/blob/master/lib/open_api_spex.ex#L20-L37)

Are you able to share an example that demonstrates the problem with compilation times?

josevalim commented 3 years ago

The good news is that it already works this way fundamentally: https://github.com/open-api-spex/open_api_spex/blob/master/lib/open_api_spex/schema.ex#L143-L147

That's amazing!!!

In general when an Operation or Schema needs to reference another schema, it can be done using the name of the module containing the schema, and resolved to a %Schema{} struct at runtime using OpenApiSpex.resolve_schema_modules/1 (https://github.com/open-api-spex/open_api_spex/blob/master/lib/open_api_spex.ex#L20-L37)

Unfortunately, just referencing a module is enough to add a compile time dependency, because Elixir doesn't ultimately know if you are going to invoke something in that module. For this reason, even doing something like this:

    OpenApiSpex.schema(%{
      title: "UserResponse",
      description: "Response schema for single user",
      type: :object,
      properties: %{
        data: User
      },

which you can see in the README, is going to add a compile-time dependency to User. Similar for operations. If you reference any module, even if just by name, those are compile time dependencies.

However, if you don't do any work at compile time, then that's good news, because it means fixing it is rather trivial. What do you think about this:

  1. For schemas, what if we change the documentation and README to push people towards def schema?

  2. For operations, if the map you receive is not touched at compile-time at all, you could do the same that we do Plug, which is to expand all aliases, removing the compile-time references:

https://github.com/phoenixframework/phoenix/blob/3a9ae5c6bfed972547c66f11788080ee46fd9c5b/lib/phoenix/router.ex#L716

In Plug/Phoenix, it works like this. If you do plug MyModule, foo: Bar, it will call MyModule.init(foo: Bar) at compile time, which implies a compile-time dependency on both MyModule and Bar. However, we added a plug_init_mode, that controls if you want to initialize the plugs at runtime or compile time, and we set it to runtime in dev and test. When set at runtime, we don't initialize the plug at compile time anymore. So if we see an alias given to the Plug, we can expand it in the macro, so it doesn't become a compile time reference.


Quick question, in the README we have this:

  operation :update,
    summary: "Update user",
    ...,
    responses: %{
      201 => {"User", "application/json", UserResponse}
      422 => OpenApiSpex.JsonErrorResponse.response()
    }

the expr OpenApiSpex.JsonErrorResponse.response() is evaluated at compilation-time and it returns a schema that is embedded at compile time or it is transformed and then expanded at runtime? The former can be problematic because it means the same schema gets repeated and expanded in multiple places (plus the compile-time dependency). You may want to support references like {OpenApiSpex.JsonErrorResponse, :response} instead.

Thanks for the prompt reply! I will see if I find some schemas to share. :)

JesseHerrick commented 3 years ago

Hi, all! :wave: Thank you, @josevalim, for opening the issue and thank you, @mbuhot, for taking a look!

I wanted to share some examples of schema structures we have on a project with slow OpenApiSpex compile times. This code is modified to remove company-sensitive implementation details but I hope it's still useful for debugging this issue. Let me know if I can provide more context.

Here is an example of an endpoint we have documented.

  defmodule EndpointExampleParams do
    @moduledoc false
    require OpenApiSpex

    OpenApiSpex.schema(%{
      type: :object,
      properties: %{
        name: %Schema{type: :string, description: "..."},
        slug: Generic.Slug,
        date: %Schema{
          type: :string,
          format: :date,
          description: "..."
        },
        value1: Generic.DateRange,
        value2: Generic.DateRange,
        type: %Schema{type: :string, enum: [:one, :two, :three]}
      },
      required: [:slug],
      example: %{
        name: "...",
        slug: Schema.example(Generic.Slug.schema()),
        date: "2021-01-01",
        value1: Schema.example(Generic.DateRange.schema()),
        value2: Schema.example(Generic.DateRange.schema()),
        type: "three"
      }
    })
  end

and another

  defmodule Endpoint do
    @moduledoc false
    require OpenApiSpex

    OpenApiSpex.schema(%{
      description: "...",
      type: :object,
      properties: %{
        slug: Generic.Slug,
        status: Enums.as_schema(Project.Enums.Status, description: "..."),
        currency: Currency,
        property: PropertyContext.Property,
        endpoint_example: EndpointExample,
        datetime: %Schema{
          type: :string,
          format: :date,
          nullable: true,
          description: "..."
        },
        message: %Schema{type: :string, nullable: true, description: "..."}
      },
      required: [
        :slug,
        :status,
        :currency
      ],
      example: %{
        slug: Schema.example(Generic.Slug.schema()),
        currency: Schema.example(Currency.schema()),
        property: Schema.example(PropertyContext.Property.schema()),
        endpoint_example: Schema.example(EndpointExample.schema()),
        datetime: nil,
        message: nil
      }
    })
  end

And here's an example operation call.

  operation(:create,
    summary: "...",
    request_body: {"...", "application/json", EndpointExampleParams},
    responses: [
      created: {"Success", "application/json", EndpointExampleResponse},
      unprocessable_entity: {"Validation error", "application/json", ValidationError}
    ]
  )

Let me know how I can help! Me and several others at Remote are very interested in contributing to help improve compile times in OpenApiSpex.

josevalim commented 3 years ago

@JesseHerrick if Schema.example(Generic.Slug.schema()) is literally calling the other schema at compile-time and embedding it, and you are doing that a lot, when I can totally see it being the cause of long compile times. If that's indeed the case, supporting Schema.example(Generic.Slug) and Schema.example({Generic.Slug, :schema}) - if they are not yet supported - can be a huge help (and perhaps place warns to push people to avoid doing that).

mbuhot commented 3 years ago

Thanks @JesseHerrick πŸ‘

I think we can adopt @josevalim's suggestion to expand aliases appearing in the schema. The examples are a bit problematic, since the call to Schema.example(Generic.Slug) will be evaluated at compile time.

We could:

  1. Provide an optimisation where we detect that a map literal is given to OpenApiSpex.schema containing an example key, remove the example from the map and evaluate it at runtime inside the generated schema/0 function.

  2. Provide a syntax for generating examples from schemas during schema resolution, eg:

      example: %{
        slug: {:example_of, Generic.Slug},
        currency: {:example_of, Currency},
        property: {:example_of, PropertyContext.Property},
        endpoint_example: {:example_of, EndpointExample},
        datetime: nil,
        message: nil
      }

There are some edge cases where compile-time dependencies are necessary. One case is when an object type schema uses allOf, eg:

defmodule Cat do
  OpenApiSpex.schema(%{
    title: "Cat",
    allOf: [Animal, Meow]
  })
end

To define the %Cat{} struct for that schema, the properties of Animal and Meow will need to be resolved at compile-time.

josevalim commented 3 years ago

@mbuhot great! It seems we are all on the same page then!

For the map literal optimization to be concrete, I would expand your action with two additional items:

  1. Add the quoted_literal? check and optimization

  2. All function calls that can be resolved at runtime in schemas must be done with data structures. So, as you propose, use {:example_of, GenericPlug} instead of Schema.example(Generic.Slug.schema()). This is because we want to keep them as data structures and as literals

  3. Provide functions for everything that needs to be resolved at compile time. So allOf can become: allOf: OpenApiSpex.allOf([Animal, Meow]). This is to signal to users that those are compile time behaviours.

  4. If possible, never have the user call .schema() on the modules themselves. Make Schema the shortcut for Schema.schema() and {Schema, :foo} the shortcut for Schema.foo()

WDYT?

mbuhot commented 3 years ago

Sounds good to me @josevalim πŸ‘

jozuas commented 1 year ago

Sorry to bump the issue, but has this been resolved or are there ways in which projects could avoid the compile time deps?