zmievsa / cadwyn

Production-ready community-driven modern Stripe-like API versioning in FastAPI
https://docs.cadwyn.dev/
MIT License
186 stars 24 forks source link

Optimize the calls to enrich_swagger #164

Closed zmievsa closed 1 month ago

zmievsa commented 4 months ago

enrich_swagger is called every time an endpoint is added to Cadwyn app. The problem is that enrich_swagger does not just add this endpoint: it rebuilds the entire swagger, likely slowing down our tests and startup of any Cadwyn app.

See if there is anything we can do about it. But note that our openapi tests are extremely lacking so any change there is going to be somewhat dangerous. Hence this issue also requires adding a good number of unit tests that test openapi.json generation.

devla commented 2 months ago

From my point of view

Solution 1:

For versioned routers, calling the enrich_swagger() method from add_header_versioned_routers() is not optimal, because this method is called through a loop from generate_and_include_versioned_routers() for each router. A slightly better solution would be to move the enrich_swagger() method at the end of generate_and_include_versioned_routers() method (out of the loop). But, on the other hand, if we want to use add_header_versioned_routers() just like include_router(), add_api_route(), etc. then a call to enrich_swagger() would probably be the desired action.

Solution 2: What do you think about excluding enrich_swagger() from these methods and call it at the end directly? In this way all routes will be loaded through one call. Otherwise, the best I managed was three calls, with implemented solution 1.

Solution 3: enrich_swagger() method can be optimized for versioned routes, if we add "if header_value_str not in self.swaggers" condition, but not sure what to do with unversoned routers.

@zmievsa what do you think?

My thoughts go in the direction of 2, to somehow aggregate routes and create them all at the end. Startup performance really depends on which approach we choose, if we repeatedly call add_header_versioned_routers(), add_unversioned_routers() or include_router() for each router individually instead of passing a list (tuple) of routers, startup will be slower.

How much such a solution would be in accordance with FastAPI and APIRouter logic is also a question.

devla commented 2 months ago

Also when think about refactoring, separating the enrich_swagger() method for unversioned and versioned routers can also be an option, because for each versioned router, the logic for unversioned routers is also executed.

zmievsa commented 2 months ago

I believe that any complication to the interface is likely not acceptable. I.e. long startup seems to be a lesser evil compared to complicating the interface because Cadwyn is already fairly complex.

Any other path seems to equally make sense to me as long as we can verify that it brings significant benefits to a standard app (which should call generate_routers once but can include 5-10 routers before that)

devla commented 2 months ago

Ohh, no, no, sorry for the previous ramblings... this can be solved by registering the startup event through Cadwyn.init(), under self.router:... you just need to add self.add_event_handler("startup", self.enrich_swagger). This way, enrich_swagger() method will be called at the end of lifespan, after all the routes are included, and before the application startup. This does not break interfaces and tests, and self.enrich_swagger() can be cleaned up from everywhere in the code, right?

@zmievsa if you agree with the idea, I can provide a patch/pull request.

zmievsa commented 2 months ago

This sounds like a pretty amazing idea, actually. Yes, let's implement this.

zmievsa commented 1 month ago

@devla sadly, I had to revert these changes as they broke swagger when using lifespan. We could add another hack and combine lifespans but I'm thinking that maybe eliminating enrich_swagger altogether would be a better approach.

devla commented 1 month ago

@zmievsa do you want me to work on it? Just explain me briefly or open an issue? I pointed out for lifespan behavior while I was working on tests, but yes, I didn't think about the fact that if someone already uses a custom lifespan, then it's a brake change.

zmievsa commented 1 month ago

Yup, I remember the lifespan problem on the tests as well but overlooked it too.

I guess the best approach would be to generate openapi dynamically, similar to how fastapi does it.

But we just need to be careful not to place a large performance burden on the openapi and docs endpoints.

Thank you for being ready to continue working on this! Yes, let's try doing enrich_swagger dynamically and see how well it works. If you have time for this, it would be perfect.

devla commented 1 month ago

Yes, let's try again. I have time... and I think I have an idea...

zmievsa commented 1 month ago

It looks like another issue is blocked by this one so it's now high priority. Do you need any help with this one? I think, we can even fix this today.

I think, I also know what to do now — enrich_swagger seems to be an old and mostly useless premature optimization which migrated to us from verselect so now I feel quite comfortable just generating it every time within the docs, openapi, and redoc endpoints. The performance hit shouldn't be substantial compared to regular fastapi apps.

devla commented 1 month ago

@zmievsa please go ahead with this, some work has jumped in and I've slowed down. Your idea sounds good!

zmievsa commented 1 month ago

Done by removing enrich_swagger. @devla thank you for your help!

devla commented 1 month ago

@zmievsa My idea was to implement a custom include_router() method on the Cadwyn class, and to make enrich_swagger deprecated as well as generate_and_include_versioned_routers() and add_unversioned_routers(), so that there is only include_router() based on:

def include_router(
        self,
        router: APIRouter,
        *,
        version: Optional[str] = None,
        **kwargs: Dict[str, Any],
    ) -> None:
        """
        Overrides the original include_router method.

        Args:
            router (APIRouter): The router to include.
            version (str, optional): The version header value. Defaults to None.
            **kwargs: Additional arguments for the include_router method.

        Returns:
            None
        """

        if version is not None:
            validate_version(version)

            kwargs.setdefault("dependencies", []).append(
                Depends(
                    _version_dependency(self.router.version_header_name, version),
                ),
            )

            super().include_router(router, **kwargs)

            added_routes: list = []
            added_routes.extend(self.routes[len(self.routes) - len(router.routes) :])

            for route in added_routes:
                self.router.versioned_routers.setdefault(
                    normalize_version(version),
                    [],
                ).append(route)
        else:
            super().include_router(router, include_in_schema=False, **kwargs)
            self.router.unversioned_routers.extend(router.routes)

        self.enrich_swagger()

version argument is not needed, code snippets related to generate_versioned_routers() can be merged here from generate_and_include_versioned_routers(), including most of the changes you implemented. In general, verselect was started in the wrong way and a lot can still be optimized, but step by step.

I tested your changes and it's good, works Okay! Cheers!

zmievsa commented 1 month ago

Agreed on all points! If you want to help with fixing some of the other verselect issues — feel free. We need all the help we can get 🔥

devla commented 1 month ago

I would like, because I was already working on verselect optimization until I realized that Cadwyn is actually the main project that makes sense. Here is my version of verselect project: https://github.com/devla/fastapi_version_handler (one week of wasted work ;)

I am currently going through the onboarding process for a new position, but i will be there. Header-based versioning is something that catches my attention, and Stripe-like API versioning is just something I need to understand better.

zmievsa commented 1 month ago

I'll add a warning to verselect regarding this. Thank you.