Open mwilliamson-healx opened 8 years ago
For those of you using Django, this is a must-use: https://github.com/tfoxy/graphene-django-optimizer
Performs pretty amazing automatic select_related and prefetch_related to avoid lazy loading resolved fields
@qeternity it's really good, but I don't think this issue is about preparing the data: this is more about time spent inside graphene/graphql-core, and calling the resolvers.
Now, there is something wrong: v3 seems to be 2x slower than v2: https://gist.github.com/ktosiek/849e8c7de8852c2df1df5af8ac193287
I have a prototype that went from 0.8s to 0.3s by moving from graphene ObjectType to a dict and a simple fields filtering/renaming function.
@ktosiek hi, thanks for your awesome idea to skip the expensive complete_value call.
But I can't understand "moving from graphene ObjectType to a dict and a simple fields filtering/renaming function". What does moving graphene.ObjectType to a dict mean?
It would be very helpful if you could give some simple example code.
@Miloas It's about using a custom function instead of Graphene resolvers. Something like:
```python
from collections import OrderedDict

def extract_fields(info, data):
    """Extract fields requested by the client from `data`,
    honoring any field aliases in the query."""
    return OrderedDict(
        (field.alias.value if field.alias else field.name.value,
         data[field.name.value])
        for field in info.field_asts[0].selection_set.selections)
```
This code is just an illustration, not a tested example. The general idea is that the client might request just a selection of fields, and might even request a field under a different name (in GraphQL: { nameTheClientLikes: actualFieldName }
) - we don't want to lose that, even for the "raw results" path.
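To make the aliasing behaviour concrete, here is the same function exercised against a hand-built stand-in for the AST. The `field_asts`/`selection_set` attribute names follow graphql-core v2; the `SimpleNamespace` objects below are mocks, not real AST nodes:

```python
from collections import OrderedDict
from types import SimpleNamespace as NS

def extract_fields(info, data):
    """Extract the fields the client requested, honoring aliases."""
    return OrderedDict(
        (field.alias.value if field.alias else field.name.value,
         data[field.name.value])
        for field in info.field_asts[0].selection_set.selections)

# Mock AST for the selection `{ displayName: name, id }`.
fields = [
    NS(alias=NS(value="displayName"), name=NS(value="name")),
    NS(alias=None, name=NS(value="id")),
]
info = NS(field_asts=[NS(selection_set=NS(selections=fields))])

row = {"name": "Ada", "id": 1, "email": "not requested"}
result = extract_fields(info, row)
print(result)  # OrderedDict([('displayName', 'Ada'), ('id', 1)])
```

Note that the field the client didn't request (`email`) is dropped, and the alias becomes the output key.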
You'd use this with the monkeypatch I've mentioned earlier like this:
```python
def resolve_goodbye(root, info):
    return RawGraphQLResult([extract_fields(info, elem) for elem in root.some_huge_list])
```
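The monkeypatch itself isn't reproduced in this thread, but the shape of the idea can be sketched in isolation. Everything below is illustrative: the real patch would wrap graphql-core's `complete_value`, and `RawGraphQLResult` is the wrapper type assumed by the thread:

```python
class RawGraphQLResult(list):
    """Marker type: the contents are already fully serialized."""

def expensive_complete_value(value):
    # Stand-in for graphql-core's recursive per-value completion.
    return {"completed": value}

def complete_value(value):
    # Patched behaviour: short-circuit for pre-serialized payloads.
    if isinstance(value, RawGraphQLResult):
        return list(value)
    return expensive_complete_value(value)

print(complete_value(RawGraphQLResult([{"id": 1}])))  # [{'id': 1}]
print(complete_value([{"id": 1}]))  # {'completed': [{'id': 1}]}
```

The point is simply that one cheap `isinstance` check replaces a full recursive completion pass for payloads the resolver has already shaped.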
@ktosiek Just tested locally and found the same although not nearly to the degree that you did (v3 is ~25-50% slower).
However profiling this seems to indicate an unreal amount of recursive functions and type checking. To serialize 100k integers we're making 36 million isinstance calls, 2 million complete_value calls and 7 million isawaitable calls.
All in all, we're averaging nearly 1,000 function invocations per object. I'm admittedly not extremely familiar with the graphql architecture and design decisions, but this seems excessive.
```
(env3) ~/Code/graphene head -n 50 cprof.txt
53.549506894000004
         99251786 function calls (93244536 primitive calls) in 52.909 seconds

   Ordered by: internal time

   ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
   36016108/36016098   8.114    0.000    12.218   0.000    {built-in method builtins.isinstance}
   2000010/10          6.241    0.000    53.415   5.342    execute.py:701(complete_value)
   7000070             6.081    0.000    15.722   0.000    inspect.py:221(isawaitable)
   1000010/10          2.889    0.000    53.416   5.342    execute.py:570(resolve_field)
   1000010/10          2.537    0.000    53.416   5.342    execute.py:416(execute_fields)
   2000010/10          2.340    0.000    53.415   5.342    execute.py:640(complete_value_catching_error)
   7000072             2.291    0.000    2.291    0.000    {built-in method _abc._abc_instancecheck}
   10                  1.915    0.192    53.415   5.342    execute.py:785(complete_list_value)
   7000072             1.813    0.000    4.104    0.000    abc.py:137(__instancecheck__)
   1000010             1.684    0.000    7.965    0.000    execute.py:607(resolve_field_value_or_error)
```
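Those call counts alone account for a meaningful slice of the runtime. A back-of-the-envelope projection (machine-dependent and purely illustrative, using the call counts from the profile above):

```python
import timeit
from inspect import isawaitable

# Measure the per-call cost of the two dominant type checks.
N = 100_000
per_isinstance = timeit.timeit(lambda: isinstance(1, int), number=N) / N
per_isawaitable = timeit.timeit(lambda: isawaitable(1), number=N) / N

# Project the cost of the call counts reported in the profile above.
projected = 36_016_108 * per_isinstance + 7_000_070 * per_isawaitable
print(f"projected time in type checks alone: {projected:.2f}s")
```

Even at well under a microsecond per check, tens of millions of calls add up to seconds of pure bookkeeping.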
The example given in this issue's description is outdated (unless I'm missing something). @syrusakbary does it make sense to close this issue since it's misleading to newcomers (like myself)? A separate issue could be open about further performance improvements.
Context: The original issue says:
the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.
This is no longer true. Here's the setup code that, AFAICT, is the modern equivalent of the code from the issue's description:
```python
import graphene

class User(object):
    def __init__(self, id):
        self.id = id

users = [User(index) for index in range(0, 10000)]

class UserQuery(graphene.ObjectType):
    id = graphene.Int()

class Query(graphene.ObjectType):
    users = graphene.List(UserQuery)

    def resolve_users(root, info):
        return users

schema = graphene.Schema(query=Query)
```
Now, this consistently gives ~300-400ms on my 2018 MBP, not 10s as the issue description suggests. Example:
```
In [28]: %timeit rv = schema.execute('{ users { id } }')
1 loop, best of 3: 336 ms per loop
```
Although the example is faster than when I originally opened the issue, it's still too slow for my use-case, and is around 10-15 times slower than the library we're currently using.
Yep, there's definitely plenty of room for improvement still. I just argue that this issue has become outdated, misleading, and quite messy overall :) Would be good to start afresh with a new issue mentioning the accurate status quo and what next steps can be taken.
I made the following plug-and-play gist based on the other gist above: https://gist.github.com/Anon731/0f58012ffb5be5e229e70dd44baa4258
Comparison: 1.2 ms vs 3.7 ms.
I also made some tests comparing graphene-django with django-rest-framework. In both cases I used a view with pagination (DjangoConnectionField or PageNumberPagination). The tests request 100 items out of 1000 existing items.
graphene:
django v1.11.26
graphene v2.1.8
graphene-django v2.7.1
timeit... use 5 * 75 loop...
max...: 10.84 ms
median: 10.66 ms
min...: 10.61 ms
cProfile stats for one request: 30148 function calls (28688 primitive calls) in 0.019 seconds
Rest-API:
django v1.11.26
Rest-Framework v3.9.4
timeit... use 5 * 213 loop...
max...: 3.81 ms
median: 3.63 ms
min...: 3.58 ms
cProfile stats for one request: 14171 function calls (13142 primitive calls) in 0.007 seconds
What immediately stands out is the much higher number of function calls.
I also ran a test with graphene v3 on Django 2.2 using the graphene-django sources from: https://github.com/graphql-python/graphene-django/issues/812. It's ~30% slower than graphene v2. I hope it's only slower because it's not the final code yet.
See also: https://github.com/graphql-python/graphene-django/issues/829
@jedie could you share the code you used to benchmark Graphene vs Rest Framework?
Sorry, I can't share the code. But it's really just minimal example code.
However, I made another test and benchmarked graphql-core on its own: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa
It fetches only a list of 1000 dummy items and takes ~20 ms.
Now, I also made a similar test with tartiflette.
To my surprise, tartiflette (~57 ms) is significantly slower than graphql-core (~20 ms).
My benchmark code:
tartiflette: https://gist.github.com/jedie/45ddf8ee7e24704c9485eb8cbcf9ba13 graphql-core: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa
EDIT: I re-implemented a "standalone" benchmark test with Django REST framework that does similar stuff... And yes, it's much, much faster: ~8 ms
https://gist.github.com/jedie/1d658a184eb4435383820aa0c647d7e9
I was fixing a performance issue in graphene-mongo: https://github.com/graphql-python/graphene-mongo/issues/125
My pull request brought down the response time from 2s to 0.02s on a dataset of 12000 documents in MongoDB.
The solution was to provide the list_slice_length in default_resolver to prevent the default resolver from doing a len() on the collection.
It would appear that the default behavior for many ORMs, when len() is called on their collections, is to load every object in the collection.
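The failure mode is easy to reproduce without any ORM. In this sketch (illustrative names, not the actual graphene-mongo API), a lazy collection materializes everything as soon as `len()` is called, while slicing with a known page size stays cheap:

```python
class LazyCollection:
    """Mimics an ORM queryset: len() forces a full load, slicing stays lazy."""
    def __init__(self, size):
        self.size = size
        self.loaded = 0  # counts objects actually materialized

    def __len__(self):
        self.loaded = self.size  # loads every object just to count them
        return self.size

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, _ = key.indices(self.size)
            self.loaded += stop - start  # only the page is materialized
            return list(range(start, stop))
        raise TypeError("index access not needed for this sketch")

def first_page_via_len(coll, page_size=20):
    # Anti-pattern: computing len() loads the whole collection.
    total = len(coll)
    return coll[0:min(page_size, total)]

def first_page_sliced(coll, page_size=20):
    # Pass the slice length explicitly; no len() needed.
    return coll[0:page_size]

slow = LazyCollection(12000)
first_page_via_len(slow)
print(slow.loaded)  # 12020: all 12000 documents loaded just to count

fast = LazyCollection(12000)
first_page_sliced(fast)
print(fast.loaded)  # 20
```

Providing the slice length up front is exactly what skipping the `len()` call in the default resolver achieves.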
Although I resolved this particular issue, there were plenty more like this one. I stopped trying because it would require some major changes to Graphene in order to fix it.
Will issues like this be fixed for v3?
There doesn't seem to be this discrepancy between 2.1.8 and 3.07b:
Graphene 2: 12.702938017901033
Graphene 3: 12.651812066091225
That particular problem was fixed in https://github.com/graphql-python/graphql-core/issues/54, fix was released in graphql-core 3.1.0.
FWIW, I am using graphql-core==3.1.4 with Ariadne and am still seeing a fair bit of unexplained time spent when returning larger data sets. Here is at least one place where I see a lot of time spent in various forms of completion (e.g. complete_value_catching_error) vs. resolving of values (e.g. resolve_field_value_or_error):
```
Function: resolve_field at line 578

Line #      Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   578                                       def resolve_field(
   579                                           self,
   580                                           parent_type: GraphQLObjectType,
   581                                           source: Any,
   582                                           field_nodes: List[FieldNode],
   583                                           path: Path,
   584                                       ) -> AwaitableOrValue[Any]:
   585                                           """Resolve the field on the given source object.
   586
   587                                           In particular, this figures out the value that the field returns by calling its
   588                                           resolve function, then calls complete_value to await coroutine objects,
   589                                           serialize scalars, or execute the sub-selection-set for objects.
   590                                           """
   591      2189    1355.0      0.6     1.3      field_node = field_nodes[0]
   592      2189    1523.0      0.7     1.5      field_name = field_node.name.value
   593
   594      2189    4577.0      2.1     4.4      field_def = get_field_def(self.schema, parent_type, field_name)
   595      2189    1159.0      0.5     1.1      if not field_def:
   596                                               return Undefined
   597
   598      2189    1265.0      0.6     1.2      resolve_fn = field_def.resolve or self.field_resolver
   599
   600      2189    1128.0      0.5     1.1      if self.middleware_manager:
   601      2189    4540.0      2.1     4.4          resolve_fn = self.middleware_manager.get_field_resolver(resolve_fn)
   602
   603      2189   10609.0      4.8    10.3      info = self.build_resolve_info(field_def, field_nodes, parent_type, path)
   604
   605                                           # Get the resolve function, regardless of if its result is normal or abrupt
   606                                           # (error).
   607      4378   25759.0      5.9    25.0      result = self.resolve_field_value_or_error(
   608      2189    1140.0      0.5     1.1          field_def, field_nodes, resolve_fn, source, info
   609                                           )
   610
   611      4378   48599.0     11.1    47.2      return self.complete_value_catching_error(
   612      2189    1314.0      0.6     1.3          field_def.type, field_nodes, info, path, result
   613                                           )
```
Perhaps still this old issue: https://github.com/graphql-python/graphql-core/issues/54#issuecomment-600670661?
And this is from Sentry profiling (spans are created just for the top level instance of any recursive calls):
In case this helps anyone, I ran some benchmarks against graphene-django and DRF with django-silk and discovered that the FieldTracker from django_model_utils was the cause of my performance issues. The profile showed a heinous amount of time spent in the deepcopy function.
To paint a picture here:
```python
from django.db import models
from model_utils import FieldTracker

class Profile(models.Model):
    bio = models.TextField(blank=True)
    bio_hashtags = models.ManyToManyField(Hashtag, blank=True)
    tracker = FieldTracker()

    def save(self, *args, **kwargs):
        if self.id and self.tracker.has_changed('bio'):
            self.reconcile_bio_hashtags()
        super().save(*args, **kwargs)  # note: super(), not super.save
```
Fetching 50 profiles, the following timings were obtained:
0.7 seconds is still pretty bad for a query of 50 things and a single postgres query, but 90% of my problem was not graphene.
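The deepcopy cost compounds per instance when fetching many rows. A rough stand-in (this is not the actual FieldTracker internals, just an illustration of why a per-instance deepcopy of tracked state hurts):

```python
import copy
import timeit

# Illustrative per-instance state a field tracker might snapshot.
state = {"bio": "x" * 10_000, "tags": list(range(500))}

def snapshot_deepcopy():
    # Roughly what a deepcopy-based tracker pays on every instance.
    return copy.deepcopy(state)

def snapshot_shallow():
    # A cheaper alternative: copy only the top-level references.
    return dict(state)

deep = timeit.timeit(snapshot_deepcopy, number=1_000)
shallow = timeit.timeit(snapshot_shallow, number=1_000)
print(f"deepcopy: {deep:.3f}s  shallow: {shallow:.3f}s")
```

Multiply the difference by the number of rows a list query instantiates and the profiler hotspot follows.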
Great find @kevindice . Thank you for posting!
Just to offer my experience, it seems like performance is still an issue. Returning a set of 50 items using Graphene (I'd consider them medium-sized maybe? nothing unusual), requests were taking over 2 seconds to return on a Heroku free dyno – nearly all of this time is spent in GraphQL code according to New Relic, and I had already optimised the queries. Switching to Strawberry made no difference, and switching to DRF greatly improved performance, so it definitely seems like GraphQL Core is the issue.
I would have liked to investigate more but I am on a deadline, so I ended up switching to Rails with graphql-ruby which, to my surprise, is faster than either Python solution – the same query on Heroku with Rails returns in 300-400ms, so it's several times faster and makes the difference between a good and bad user experience. Interestingly with Rails, it seems using GraphQL is actually a bit faster than a normal REST endpoint!
I prefer the developer ergonomics of Python and Django but to be honest Django and Rails are similar enough in many ways that it's not a big deal for me to switch. Obviously you can't do this if you're deep into a project, but for anyone considering Python for a GraphQL project I think it's worth being aware of these potential performance issues – and also being aware that Graphene seems to be stuck in a potentially-unmaintained limbo, if I'd realised this sooner I'd probably have started with Rails.
@tomduncalf were you using Graphene v2 or v3? Also I'm surprised that switching to Strawberry didn't help. Based on these benchmarks: https://twitter.com/jayden_windle/status/1235323199220592644 I would expect Strawberry or Graphene v3 to be significantly better especially for lists of objects.
IMO I would expect GraphQL to always have a bit of a performance overhead compared to a REST endpoint since it's doing quite a bit more. It would be good to get Python performance to a point where it's comparable to other similar languages though (like ruby). Can you share any of your code?
@jkimbo This was on Graphene v3. Unfortunately I can't share the code as it's not open source and I don't really have the time to dig into exactly why it was slower right now, but if I do find time to make a simple repro comparing Python vs Ruby I will post it here!
So @tomduncalf you properly nerd sniped me with this and I ended up building a more "realistic" benchmark. I implemented an API to fetch the top 250 rated IMDB movies in Graphene, Strawberry, DjangoRestFramework and a plain JSON api, all hosted on Django. All the data comes from a sqlite db. The code is here: https://github.com/jkimbo/django-graphql-benchmarks and you can try it out the Graphene API here.
Here are the P99 results against the Heroku instance:
So you can see that Graphene (v3) and Strawberry (v0.73.1 contains a fix for a performance regression, btw) are pretty much neck and neck, which is what I would expect considering that they are just different ways to set up a graphql-core server. DRF is definitely faster (~25% faster) and the plain JSON endpoint is faster still. I couldn't replicate the 2-second response times you were seeing with your API @tomduncalf so I'm not sure what is going on there.
Overall GraphQL in Python is definitely slower than using something like DjangoRestFramework, but not horribly so in my opinion. There are definitely things that can be improved though, and thanks to this exercise I have some ideas for improvements we can make to Strawberry.
Would be interested in how this all compares to graphql-ruby as well but unfortunately my experience there is lacking.
Hey @jkimbo, thanks for doing this and I hope my initial post didn't come across too negatively – I just wanted to share my experience for anyone else in my situation (i.e not familiar with either Django or Rails and looking to pick one for a GraphQL project), as I didn't really find much online comparing the two for GraphQL specifically and I didn't realise there is a performance overhead.
You prompted me to do a little bit more digging as I felt bad for just saying it was slow 🙃 as your demo seems to perform pretty well. One thing I didn't think to mention is that I am using Relay with my API – I did a little bit of testing with my API and it seems like using Relay adds a fairly significant overhead – almost doubling the response times on Heroku for the same query vs. a non-Relay version! I wonder if you could try using Relay with Graphene on yours and see if you see similar results? Or if you tell me how to run yours, I can try it (pretty new to Python so couldn't work out how to run yours from a git clone).
My API does still seem quite slow compared to yours and I'm not really sure why, as yours is returning a larger set of data. I'm new to Django so I could be doing something a bit stupid somewhere. To be honest, I am going to stick with Rails at this point as it's probably a slightly better fit for what I am trying to do (build an API with as little code as possible basically, haha – the ecosystem of gems seems a bit more developed for some of the things I want to do), but if you have any suggestions of good ways to profile my Python I could give it a go.
Anyway, you piqued my curiosity so I reproduced your demo in Rails! The code is at https://github.com/tomduncalf/rails-graphql-benchmark and I've deployed it to Heroku in the EU region. It seems like it returns a bit faster than yours, but not dramatically so – I'm not sure how you run your benchmark but I'd be happy to try it on mine if it's useful for comparison.
There are two queries you can run, one Relay and one non-Relay (it seems the Ruby version of GraphiQL doesn't support embedding them in the URL!):
```graphql
{
  movies {
    edges {
      node {
        id
        imdbId
        title
        year
        imageUrl
        imdbRating
        imdbRatingCount
        director {
          id
          name
        }
      }
    }
  }
}
```
```graphql
{
  moviesPlain {
    id
    imdbId
    title
    year
    imageUrl
    imdbRating
    imdbRatingCount
    director {
      id
      name
    }
  }
}
```
Cheers, Tom
Last year I spent a few weekends pulling off a very minimal PoC based on @syrusakbary's idea of generating template code to improve GQL performance in Python.
Here's the link to the same - https://github.com/Ashish-Bansal/graphql-jit
It's a very vague, untested PoC implementation and needs a lot of work.
I won't get time to work on it, so in case anyone is interested, they can work over that PoC.
This thread is interesting but kind of hard to get a grasp on. It would be great if someone competent, like a maintainer of graphql-core or graphene, could make a summary of the different solutions that still exist today (potential gain depending on the use case, potential issues, required version of graphene/graphql-core, maturity of the solution).
I also have a few questions:

- Is there a significant performance benefit in moving from v2 to v3?
- Could graphene-sqlalchemy be an issue when it comes to performance?
- What should I use for profiling? I was using cProfile because I had no idea it couldn't measure async code.

In my case, all requests I try to make take an enormous time to resolve (300 ms for queries of fewer than 10 fields, and up to 3 s for larger queries). On the other hand, I sometimes get a much lower response time (~3x improvement) for the exact same query on the exact same database (same data state), all of which use the exact same Docker image; sometimes I wonder if a lack of memory/CPU is causing this.
Anyway, thanks for this discussion. I hope someone smarter than me will be able to summarize the good ideas that lie here and there in this thread.
@cglacet
Is there a significant performance benefit in moving from v2 to v3?
Currently asking myself the same question. I took the liberty of forking jkimbo's benchmark suite mentioned above, updating the frameworks to their latest versions and including graphene v2 as well: flbraun/django-graphql-benchmarks
Here's the result from a fresh bench I did this morning:
As you can see, the difference between v2 and v3 is pretty much non-existent. However, the benchmark suite currently only serializes a bunch of ObjectTypes. Since this is the most primitive building block of a GraphQL API and doesn't cover real-world setups utilizing pagination, connections, etc., your experience may vary.
Maybe this is of interest to you.
@flbraun nice update! Probably makes sense to add some asyncio into the mix as this adds a lot of overhead as well.
I updated my forked bench suite to be tested against multiple Python versions, see results here.
tl;dw: Graphene (both v2 and v3) performs almost identically on the same Python version. However, Graphene seems to have benefited heavily from performance improvements in the Python interpreter itself. Jumping from 3.10 to 3.11 alone shaves ~20% off mean response times, which is kinda impressive.
Maybe somebody with more knowledge about the inner workings of Graphene (and graphql-core) can leverage this information.
If you're running this in prod, you should try Pyston... we've seen perf that is close to PyPy but without warmup/other issues.
Hi.
I'm curious what the latest is on this? We're noticing similar performance issues as the OP posted, where a lot of time appears to be spent planning SQL queries.
Our SQL takes 60-100 ms to process, but time to first byte is close to 1 second. Our profile implies a lot of time is spent planning SQL.
Our stack:
- Python 3.11
- Django 4.2.8
- DRF 3.14.0
- Django Graphene 3.1.5
All optimized SQL, and we get numbers like:
Chrome Dev Tools:
- 6 SQL queries: 56.44 ms
- 44 µs to send request
- 937 ms waiting for server response
- 8.48 ms to download
- Total: ~1 second for a 100 ms DB query!
Our py-spy profile, with Postgres and Django all running locally, implies much of the time is in Graphene / SQL planning.
Did you ever figure this out? @vade
@pfcodes We've done a lot of optimization on our stack in the interim, so I don't know if I can speak specifically to any one single change, but here are some things we've observed.
We use Relay, and pagination:
We've seen 10x improvement in response time with some of the above.
I know it's hand wavy, but a lot of it was just really paying attention to details and ensuring that more complicated queries do in fact get optimized.
We do some hand-tuned queryset tweaks in some cases for fields that require model / DB lookups and annotate them, and we found some hot spots where we unintentionally did dumb shit like evaluating a queryset in place rather than using annotated values so the DB could do the work.
cc @rsomani95 – anything else to add from the work we did that maybe I'm missing?
Also, there's a link to an issue with observations about the flame graph in question from the optimizer's author @MrThearMan (sorry for the tag) – they deserve a ton of credit; this optimizer is best in class right now for Django and no one knows it :)
https://github.com/MrThearMan/graphene-django-query-optimizer/issues/86
For our use case, we send a few thousand objects to the client. We're currently using a normal JSON API, but are considering using GraphQL instead. However, when returning a few thousand objects, the overhead of resolving values makes it impractical to use. For instance, the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.
Is there a recommended way to improve the performance? The approach I've used successfully so far is to use the existing parser to parse the query, and then generate the response by creating dictionaries directly, which avoids the overhead of resolving/completing on every single value.
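A schematic of that dictionary-based approach (the field list is hard-coded here purely for illustration; the real version would derive it from the parsed query):

```python
class User:
    def __init__(self, id):
        self.id = id

def respond_direct(objects, wanted_fields):
    # Build the whole payload in one pass: no per-field resolver or
    # completion call for each of the thousands of values.
    return {"data": {"users": [
        {f: getattr(obj, f) for f in wanted_fields} for obj in objects
    ]}}

users = [User(i) for i in range(10_000)]
payload = respond_direct(users, ["id"])
print(len(payload["data"]["users"]))  # 10000
print(payload["data"]["users"][0])    # {'id': 0}
```

The per-value work collapses to a dict comprehension, which is why this path avoids the overhead of resolving/completing every single value.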