strawberry-graphql / strawberry

A GraphQL library for Python that leverages type annotations 🍓
https://strawberry.rocks
MIT License

Increased CPU usage when subscribing with the graphql-transport-ws protocol #2479

Closed: rjwills28 closed this 1 year ago

rjwills28 commented 1 year ago

Describe the Bug

We have a Strawberry GraphQL server that we have been stress testing and running CPU performance tests on. We have found that there is a noticeable and consistent increase in the CPU usage of our server application when our client subscribes using the graphql-transport-ws protocol compared to using the graphql-ws protocol.

I have done some investigating and further profiling using py-spy and discovered that the Strawberry code creates a NextMessage object (here) for each message, which it then converts to a dictionary (here) using the dataclasses asdict() method (here). Some research shows that asdict() performs a deepcopy of everything within the class. I ran a few timing tests and the asdict() method takes an order of magnitude longer than a simple .__dict__ lookup on the object. This is only done in the graphql-transport-ws implementation, not the graphql-ws implementation, which explains the difference in CPU usage between the two protocols.
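To illustrate the copying behaviour described above, here is a minimal stand-in (not Strawberry's actual class) showing that `dataclasses.asdict()` rebuilds every nested container while `.__dict__` just exposes the attributes by reference:

```python
from dataclasses import dataclass, asdict

@dataclass
class NextMessage:  # simplified stand-in for Strawberry's NextMessage
    id: str
    payload: dict
    type: str = "next"

msg = NextMessage(id="op-1", payload={"data": {"value": [1, 2, 3]}})

# asdict() recursively copies every nested container...
d1 = asdict(msg)
assert d1["payload"] is not msg.payload                  # new dict
assert d1["payload"]["data"] is not msg.payload["data"]  # nested dict copied too

# ...whereas __dict__ exposes the same objects with no copying.
d2 = msg.__dict__
assert d2["payload"] is msg.payload
```

The recursive copy is wasted work here, since the dictionary is serialised to JSON immediately and never mutated.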

I do not believe we need to perform a deepcopy when turning the class into a dictionary. What's more, I wonder whether we need to create the NextMessage object at all: as far as I can see, we create it and pass it to a function that immediately turns it into a dictionary. Why not just create it as a dictionary and send that instead? This would bypass the conversion cost entirely.

I.e. instead of lines 261 and 262 (here), which do:

    next_message = NextMessage(id=operation_id, payload=next_payload)
    await self.send_message(next_message)

we could do something like:

    next_message = {"id": operation_id, "payload": next_payload, "type": "next"}
    await self.send_json(next_message)

When I ran the performance tests with the above change the CPU usage dropped and was consistent with the graphql-ws protocol performance.

System Information

Additional Context

I have created a simple demo Strawberry GraphQL server and Python client on GitHub, available at: https://github.com/rjwills28/strawberry_cpu_demo/tree/master. Instructions on how to install and run it are in the readme. It simulates the tests that we were running, where we have a server providing subscription updates at 10Hz and a client that creates 100 different subscriptions. Follow the example in the readme to first run with the graphql-ws protocol (command-line argument -p 1) and then with the graphql-transport-ws protocol (-p 2). Run both a few times and you should see that the average CPU usage is on the whole higher for the latter protocol. Please let me know if you have any problems running this.


rjwills28 commented 1 year ago

I am happy to collaborate on this and put in a PR with a suggested fix if you think applicable.

patrick91 commented 1 year ago

I haven't investigated it yet, but I'm totally in favour of replacing the class with a dictionary. Maybe we can use a TypedDict instead of a dataclass (we wouldn't need to change much code that way).
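For illustration, a TypedDict version of the message could look roughly like this (a sketch only, not Strawberry's actual code):

```python
from typing import TypedDict

class NextMessage(TypedDict):
    id: str
    payload: dict
    type: str

# A TypedDict instance is already a plain dict at runtime, so no
# conversion step (and no deepcopy) is needed before JSON serialisation,
# while type checkers still verify the keys and value types.
msg: NextMessage = {"id": "op-1", "payload": {"data": None}, "type": "next"}
assert isinstance(msg, dict)
```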

Thanks for the great write up! I'll be happy to review and test the PR.

Do you think we could also add a benchmark to https://github.com/strawberry-graphql/benchmarks/blob/main/benchmarks/benchmarks.py?

It would be neat to track this there too!

rjwills28 commented 1 year ago

Thanks for the quick reply!

Using a TypedDict was a good idea and I did explore it, but unfortunately it caused some other issues:

Therefore I settled on the implementation shown in PR #2481, which minimises the changes required and fixes the issue. The NextMessage is created as normal and submitted to the .send_message() function. Within the NextMessage class we override the base class .as_dict() method to simply build a dictionary from the class variables and return it. We can do this because we know exactly which keys the class contains. This means we no longer do the deepcopy (in the base class) for every message, and we maintain compatibility with the rest of the code and tests.
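The approach described above might look roughly like this (a sketch based on the description; Strawberry's actual base class and method signatures may differ):

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class GraphQLTransportMessage:
    # Hypothetical stand-in for the base class: the default conversion
    # uses dataclasses.asdict(), which deep-copies every nested value.
    def as_dict(self) -> Dict[str, Any]:
        return dataclasses.asdict(self)

@dataclass
class NextMessage(GraphQLTransportMessage):
    id: str
    payload: Dict[str, Any]
    type: str = "next"

    def as_dict(self) -> Dict[str, Any]:
        # The keys are known in advance, so build the dict directly and
        # skip the deepcopy done by the base implementation.
        return {"id": self.id, "payload": self.payload, "type": self.type}

msg = NextMessage(id="op-1", payload={"data": 42})
assert msg.as_dict() == {"id": "op-1", "payload": {"data": 42}, "type": "next"}
assert msg.as_dict()["payload"] is msg.payload  # payload is not copied
```

Only the hot-path message type overrides the method, so other message classes keep the generic base behaviour.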

Please let me know your thoughts and any feedback on the PR.

rjwills28 commented 1 year ago

I will take a look at getting something added to the benchmark but not sure how that works quite yet.

DoctorJohn commented 1 year ago

The changes proposed here are basically the reverse of #1669. I don't mind using TypedDicts instead of dataclasses if it increases performance. The main motivation to use either of them is typing. The graphql-transport-ws implementation uses dataclasses for the reasons described in #1169. These benefits probably don't outweigh the performance costs, but maybe keep them in mind :)

patrick91 commented 1 year ago

@rjwills28 have you also tried using the slots argument in the dataclass? It is 3.10-only, but it could be another way to make this better?

rjwills28 commented 1 year ago

Thanks both for the feedback.

I don't think the change in my current PR goes against the suggestions made in #1669, as I have kept NextMessage as a dataclass and simply overridden the as_dict() method so that it does not perform a deepcopy of every message (see change). The deepcopy poses a particular problem for the NextMessage class as it is used ubiquitously.

I have also had a look at using the slots argument in the dataclass (i.e. @dataclass(slots=True)) but I'm not sure this helps us much. Without any other change but the slots argument the CPU usage is still higher than the legacy websocket tests as the deepcopy is still performed in the base GraphQLTransportMessage class. I also tested adding the slots argument to my current PR and did not see a noticeable difference there.