Closed rjwills28 closed 1 year ago
I am happy to collaborate on this and put in a PR with a suggested fix if you think applicable.
I haven't investigated it yet, but I'm totally in favour of replacing the class with a dictionary, maybe we can use a TypedDict instead of dataclass (we wouldn't need to change much code with that)
Thanks for the great write up! I'll be happy to review and test the PR.
Do you we could also get a benchmark for https://github.com/strawberry-graphql/benchmarks/blob/main/benchmarks/benchmarks.py ?
It would be neat to track this there too!
Thanks for the quick reply!
Using a TypedDict
was a good idea and I did explore this but unfortunately it did cause some other issues:
send_json()
method in handlers.py requires an input of type dict
, which a TypedDict
is not compatible with. This mean having to do some slightly ugly casting from the TypedDict
to a dict
.TypedDict
cannot have any default values. NextMessage()
sets type = 'next'
by default so that none of the instantiations need to include this. You can't do this with a TypedDict
meaning that every creation to NextMessage()
would have to add the type='next'
argument. The NextMessage()
is used by many of the tests so they would all need updating. It seems a bad idea to have this as an input as the type
variable needs to be 'next' in all cases.NextMessage()
is used in many of the tests which in the current implementation call as_dict()
on it. Making NextMessage()
a TypedDict
would mean having to remove the as_dict()
call from all tests.TypedDict
object on line 261 of handlers.py means that the NextMessage()
object is only used by the tests which seems unnecessary. Completely removing it would mean having to do something different in all of the tests (like manually creating the dictionary). Plus we still have the problem of converting the TypedDict
to a dict
.dict
object to get around the casting issue means we have no type checkingTherefore I settled on an implementation shown in PR #2481, which minimises the changes required and fixes the issue. The NextMessage()
is created as normal and submitted to the .send_message()
function. Within the NextMessage()
class we override the base class .as_dict()
method where we just create a dictionary from the class variables and return this. We can do this as we know exactly what keys we contain in the class. This means that we don't do the deepcopy (in the base class) for every message and we maintain compatibility with the rest of the code and tests.
Please let me know your thoughts and any feedback on the PR.
I will take a look at getting something added to the benchmark but not sure how that works quite yet.
The changes proposed here are basically the reverse of #1669. I don't mind using TypeDicts instead of dataclasses if it increases performance. The main motivation to use either of them is typing. The graphql-transport-ws implementation uses dataclasses for the reasons described in #1169. These benefits probably don't outweight the performance costs but maybe keep them in mind :)
@rjwills28 have you also tried using the slots argument in the dataclass? it is 3.10 only, but could be another way to make it better?
Thanks both for the feedback.
I don't think the change in my current PR goes against the suggestions made in #1669 as I have kept the NextMessage()
as a dataclass and have simply overridden the as_dict()
method so that it does not perform a deepcopy of every message (see change). The deepcopy poses a particular problem for the NextMessage()
class as it it used ubiquitously.
I have also had a look at using the slots argument in the dataclass (i.e. @dataclass(slots=True)
) but I'm not sure this helps us much. Without any other change but the slots argument the CPU usage is still higher than the legacy websocket tests as the deepcopy is still performed in the base GraphQLTransportMessage
class. I also tested adding the slots
argument to my current PR and did not see a noticeable difference there.
Describe the Bug
We have a Strawberry GraphQL server that we have been stress testing and running CPU performance tests on. We have found that there is a noticeable and consistent increase in the CPU usage of our server application when our client subscribes using the graphql-transport-ws protocol compared to using the graphql-ws protocol.
I have done a bit of investigating and further profiling using py-spy and discovered that the Strawberry code is creating a
NextMessage
object (here) for each message, which it then converts to a dictionary (here) using thedataclasses
asdict()
method (here). Some internet research shows that thisasdict()
method is doing adeepcopy
of everything within the class. I ran a few timing tests and theasdict()
method takes an order of magnitude longer than doing a simple.__dict__
on the object. This is only done in the graphql-transport-ws implementation and not the graphql-ws implementation which explains why there is a difference in CPU usage between the 2 protocols.I do not believe that we need to be doing a deepcopy when turning the class into a dictionary. What's more, I wonder whether we need to even be creating the
NextMessage
object because as far as I can see, we create it and pass it to a function that immediately turns it into a dictionary. So why don't we just create it as a dictionary and send it instead. This would bypass having to do any sort of conversion costing time.I.e. instead of line 261 and 262 (here) which do:
we could do something like:
When I ran the performance tests with the above change the CPU usage dropped and was consistent with the graphql-ws protocol performance.
System Information
Additional Context
I have created a simple demo Strawberry GraphQL server and Python client on GitHub, available at: https://github.com/rjwills28/strawberry_cpu_demo/tree/master. Instructions on how to install and run are in the readme. It simulates the tests that we were running where we have a server providing subscription updates at 10Hz and a client that creates 100 different subscriptions. Follow the example in the readme to first run with the graphql-ws protocol (command line argument (
-p 1
) and then with the graphql-transport-ws protocol (-p 2
). Run both a few times and you should see that the average CPU usage is on the whole higher for the latter protocol. Please let me know if you have any problems running this.Upvote & Fund