Open mwilliamson-healx opened 8 years ago
For those of you using Django, this is a must-use: https://github.com/tfoxy/graphene-django-optimizer
Performs pretty amazing automatic select_related and prefetch_related to avoid lazy loading resolved fields
@qeternity it's really good, but I don't think this issue is about preparing the data: this is more about time spent inside graphene/graphql-core, and calling the resolvers.
Now, there is something wrong: v3 seems to be 2x slower than v2: https://gist.github.com/ktosiek/849e8c7de8852c2df1df5af8ac193287
I have a prototype that went from 0.8s to 0.3s by moving from graphene ObjectType to a dict and a simple fields filtering/renaming function.
@ktosiek hi, thanks for your awesome idea to skip the expensive complete_value call.
But I can't understand "moving from graphene ObjectType to a dict and a simple fields filtering/renaming function". What does moving graphene.ObjectType to a dict mean?
It would be very helpful if you could give some simple example code.
@Miloas It's about using a custom function instead of Graphene resolvers. Something like:
```python
from collections import OrderedDict

def extract_fields(info, data):
    """Extract fields requested by the client from `data`,
    honoring any field aliases in the query."""
    return OrderedDict(
        (field.alias.value if field.alias else field.name.value,
         data[field.name.value])
        for field in info.field_asts[0].selection_set.selections)
```
This code is just an illustration, not a tested example. The general idea is that the client might request just a selection of fields, and might even request a field under a different name (in GraphQL: { nameTheClientLikes: actualFieldName }
) - we don't want to lose that, even for the "raw results" path.
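To make the aliasing behaviour concrete, here is the same function exercised against a hand-built stand-in for the AST. The `field_asts`/`selection_set` attribute names follow graphql-core v2; the `SimpleNamespace` objects below are mocks, not real AST nodes:

```python
from collections import OrderedDict
from types import SimpleNamespace as NS

def extract_fields(info, data):
    """Extract the fields the client requested, honoring aliases."""
    return OrderedDict(
        (field.alias.value if field.alias else field.name.value,
         data[field.name.value])
        for field in info.field_asts[0].selection_set.selections)

# Mock AST for the selection `{ displayName: name, id }`.
fields = [
    NS(alias=NS(value="displayName"), name=NS(value="name")),
    NS(alias=None, name=NS(value="id")),
]
info = NS(field_asts=[NS(selection_set=NS(selections=fields))])

row = {"name": "Ada", "id": 1, "email": "not requested"}
result = extract_fields(info, row)
print(result)  # OrderedDict([('displayName', 'Ada'), ('id', 1)])
```

Note that the field the client didn't request (`email`) is dropped, and the alias becomes the output key.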
You'd use this with the monkeypatch I've mentioned earlier like this:
```python
def resolve_goodbye(root, info):
    return RawGraphQLResult([extract_fields(info, elem) for elem in root.some_huge_list])
```
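The monkeypatch itself isn't reproduced in this thread, but the shape of the idea can be sketched in isolation. Everything below is illustrative: the real patch would wrap graphql-core's `complete_value`, and `RawGraphQLResult` is the wrapper type assumed by the thread:

```python
class RawGraphQLResult(list):
    """Marker type: the contents are already fully serialized."""

def expensive_complete_value(value):
    # Stand-in for graphql-core's recursive per-value completion.
    return {"completed": value}

def complete_value(value):
    # Patched behaviour: short-circuit for pre-serialized payloads.
    if isinstance(value, RawGraphQLResult):
        return list(value)
    return expensive_complete_value(value)

print(complete_value(RawGraphQLResult([{"id": 1}])))  # [{'id': 1}]
print(complete_value([{"id": 1}]))  # {'completed': [{'id': 1}]}
```

The point is simply that one cheap `isinstance` check replaces a full recursive completion pass for payloads the resolver has already shaped.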
@ktosiek Just tested locally and found the same although not nearly to the degree that you did (v3 is ~25-50% slower).
However profiling this seems to indicate an unreal amount of recursive functions and type checking. To serialize 100k integers we're making 36 million isinstance calls, 2 million complete_value calls and 7 million isawaitable calls.
All in all, we're averaging nearly 1,000 function invocations per object. I'm admittedly not extremely familiar with the graphql architecture and design decisions, but this seems excessive.
```
(env3) ~/Code/graphene head -n 50 cprof.txt
53.549506894000004
         99251786 function calls (93244536 primitive calls) in 52.909 seconds

   Ordered by: internal time

   ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
   36016108/36016098   8.114    0.000    12.218   0.000    {built-in method builtins.isinstance}
   2000010/10          6.241    0.000    53.415   5.342    execute.py:701(complete_value)
   7000070             6.081    0.000    15.722   0.000    inspect.py:221(isawaitable)
   1000010/10          2.889    0.000    53.416   5.342    execute.py:570(resolve_field)
   1000010/10          2.537    0.000    53.416   5.342    execute.py:416(execute_fields)
   2000010/10          2.340    0.000    53.415   5.342    execute.py:640(complete_value_catching_error)
   7000072             2.291    0.000    2.291    0.000    {built-in method _abc._abc_instancecheck}
   10                  1.915    0.192    53.415   5.342    execute.py:785(complete_list_value)
   7000072             1.813    0.000    4.104    0.000    abc.py:137(__instancecheck__)
   1000010             1.684    0.000    7.965    0.000    execute.py:607(resolve_field_value_or_error)
```
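Those call counts alone account for a meaningful slice of the runtime. A back-of-the-envelope projection (machine-dependent and purely illustrative, using the call counts from the profile above):

```python
import timeit
from inspect import isawaitable

# Measure the per-call cost of the two dominant type checks.
N = 100_000
per_isinstance = timeit.timeit(lambda: isinstance(1, int), number=N) / N
per_isawaitable = timeit.timeit(lambda: isawaitable(1), number=N) / N

# Project the cost of the call counts reported in the profile above.
projected = 36_016_108 * per_isinstance + 7_000_070 * per_isawaitable
print(f"projected time in type checks alone: {projected:.2f}s")
```

Even at well under a microsecond per check, tens of millions of calls add up to seconds of pure bookkeeping.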
The example given in this issue's description is outdated (unless I'm missing something). @syrusakbary does it make sense to close this issue since it's misleading to newcomers (like myself)? A separate issue could be open about further performance improvements.
Context: The original issue says:
the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.
This is no longer true. Here's the setup code that, AFAICT, is the modern equivalent of the code from the issue's description:
```python
import graphene

class User(object):
    def __init__(self, id):
        self.id = id

users = [User(index) for index in range(0, 10000)]

class UserQuery(graphene.ObjectType):
    id = graphene.Int()

class Query(graphene.ObjectType):
    users = graphene.List(UserQuery)

    def resolve_users(root, info):
        return users

schema = graphene.Schema(query=Query)
```
Now, this consistently gives ~300-400ms on my 2018 MBP, not 10s as the issue description suggests. Example:
```
In [28]: %timeit rv = schema.execute('{ users { id } }')
1 loop, best of 3: 336 ms per loop
```
Although the example is faster than when I originally opened the issue, it's still too slow for my use-case, and is around 10-15 times slower than the library we're currently using.
Yep, there's definitely plenty of room for improvement still. I just argue that this issue has become outdated, misleading, and quite messy overall :) Would be good to start afresh with a new issue mentioning the accurate status quo and what next steps can be taken.
I made the following plug-and-play gist based on the other gist above: https://gist.github.com/Anon731/0f58012ffb5be5e229e70dd44baa4258
Comparison: 1.2 ms vs 3.7 ms.
I also made some tests comparing graphene-django with django-rest-framework. In both cases I used a view with pagination (DjangoConnectionField or PageNumberPagination). The tests request 100 items out of 1000 existing items.
graphene:
django v1.11.26
graphene v2.1.8
graphene-django v2.7.1
timeit... use 5 * 75 loop...
max...: 10.84 ms
median: 10.66 ms
min...: 10.61 ms
cProfile stats for one request: 30148 function calls (28688 primitive calls) in 0.019 seconds
Rest-API:
django v1.11.26
Rest-Framework v3.9.4
timeit... use 5 * 213 loop...
max...: 3.81 ms
median: 3.63 ms
min...: 3.58 ms
cProfile stats for one request: 14171 function calls (13142 primitive calls) in 0.007 seconds
What immediately stands out is the much higher number of function calls.
I also ran a test with graphene v3 on Django 2.2 using the graphene-django sources from: https://github.com/graphql-python/graphene-django/issues/812. It's ~30% slower than graphene v2. I hope it's only slower because it's not the final code yet.
See also: https://github.com/graphql-python/graphene-django/issues/829
@jedie could you share the code you used to benchmark Graphene vs Rest Framework?
Sorry, I can't share the code. But it's really just minimal example code.
However, I made another test and benchmarked graphql-core on its own: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa
It fetches only a list of 1000 dummy items and takes ~20 ms.
Now, I also made a similar test with tartiflette.
To my surprise, tartiflette (~57 ms) is significantly slower than graphql-core (~20 ms).
My benchmark code:
tartiflette: https://gist.github.com/jedie/45ddf8ee7e24704c9485eb8cbcf9ba13 graphql-core: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa
EDIT: I re-implemented a "standalone" benchmark test with Django REST framework that does similar stuff... And yes, it's much, much faster: ~8 ms
https://gist.github.com/jedie/1d658a184eb4435383820aa0c647d7e9
I was fixing a performance issue in graphene-mongo: https://github.com/graphql-python/graphene-mongo/issues/125
My pull request brought down the response time from 2s to 0.02s on a dataset of 12000 documents in MongoDB.
The solution was to provide the list_slice_length in default_resolver to prevent the default resolver from doing a len() on the collection.
It would appear that the default behavior for many ORMs, when len() is called on their collections, is to load every object in the collection.
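The failure mode is easy to reproduce without any ORM. In this sketch (illustrative names, not the actual graphene-mongo API), a lazy collection materializes everything as soon as `len()` is called, while slicing with a known page size stays cheap:

```python
class LazyCollection:
    """Mimics an ORM queryset: len() forces a full load, slicing stays lazy."""
    def __init__(self, size):
        self.size = size
        self.loaded = 0  # counts objects actually materialized

    def __len__(self):
        self.loaded = self.size  # loads every object just to count them
        return self.size

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, _ = key.indices(self.size)
            self.loaded += stop - start  # only the page is materialized
            return list(range(start, stop))
        raise TypeError("index access not needed for this sketch")

def first_page_via_len(coll, page_size=20):
    # Anti-pattern: computing len() loads the whole collection.
    total = len(coll)
    return coll[0:min(page_size, total)]

def first_page_sliced(coll, page_size=20):
    # Pass the slice length explicitly; no len() needed.
    return coll[0:page_size]

slow = LazyCollection(12000)
first_page_via_len(slow)
print(slow.loaded)  # 12020: all 12000 documents loaded just to count

fast = LazyCollection(12000)
first_page_sliced(fast)
print(fast.loaded)  # 20
```

Providing the slice length up front is exactly what skipping the `len()` call in the default resolver achieves.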
Although I resolved this particular issue, there were plenty more like this one. I stopped trying because it would require some major changes to Graphene in order to fix it.
Will issues like this be fixed for v3?
There doesn't seem to be this discrepancy between 2.1.8 and 3.07b:
Graphene 2: 12.702938017901033
Graphene 3: 12.651812066091225
That particular problem was fixed in https://github.com/graphql-python/graphql-core/issues/54, fix was released in graphql-core 3.1.0.
FWIW, I am using graphql-core==3.1.4 with Ariadne and am still seeing a fair bit of unexplained time spent when returning larger data sets. Here is at least one place where I see a lot of time spent in various forms of completion (e.g. complete_value_catching_error) vs. resolving of values (e.g. resolve_field_value_or_error):
```
Function: resolve_field at line 578

Line #      Hits      Time  Per Hit  % Time  Line Contents
==============================================================
   578                                       def resolve_field(
   579                                           self,
   580                                           parent_type: GraphQLObjectType,
   581                                           source: Any,
   582                                           field_nodes: List[FieldNode],
   583                                           path: Path,
   584                                       ) -> AwaitableOrValue[Any]:
   585                                           """Resolve the field on the given source object.
   586
   587                                           In particular, this figures out the value that the field returns by calling its
   588                                           resolve function, then calls complete_value to await coroutine objects,
   589                                           serialize scalars, or execute the sub-selection-set for objects.
   590                                           """
   591      2189    1355.0      0.6     1.3      field_node = field_nodes[0]
   592      2189    1523.0      0.7     1.5      field_name = field_node.name.value
   593
   594      2189    4577.0      2.1     4.4      field_def = get_field_def(self.schema, parent_type, field_name)
   595      2189    1159.0      0.5     1.1      if not field_def:
   596                                               return Undefined
   597
   598      2189    1265.0      0.6     1.2      resolve_fn = field_def.resolve or self.field_resolver
   599
   600      2189    1128.0      0.5     1.1      if self.middleware_manager:
   601      2189    4540.0      2.1     4.4          resolve_fn = self.middleware_manager.get_field_resolver(resolve_fn)
   602
   603      2189   10609.0      4.8    10.3      info = self.build_resolve_info(field_def, field_nodes, parent_type, path)
   604
   605                                           # Get the resolve function, regardless of if its result is normal or abrupt
   606                                           # (error).
   607      4378   25759.0      5.9    25.0      result = self.resolve_field_value_or_error(
   608      2189    1140.0      0.5     1.1          field_def, field_nodes, resolve_fn, source, info
   609                                           )
   610
   611      4378   48599.0     11.1    47.2      return self.complete_value_catching_error(
   612      2189    1314.0      0.6     1.3          field_def.type, field_nodes, info, path, result
   613                                           )
```
Perhaps still this old issue: https://github.com/graphql-python/graphql-core/issues/54#issuecomment-600670661?
And this is from Sentry profiling (spans are created just for the top level instance of any recursive calls):
In case this helps anyone, I ran some benchmarks against graphene-django and DRF with django-silk and discovered that the FieldTracker from django_model_utils was the cause of my performance issues. The profile showed a heinous amount of time spent in the deepcopy function.
To paint a picture here:
```python
from django.db import models
from model_utils import FieldTracker

class Profile(models.Model):
    bio = models.TextField(blank=True)
    bio_hashtags = models.ManyToManyField(Hashtag, blank=True)
    tracker = FieldTracker()

    def save(self, *args, **kwargs):
        if self.id and self.tracker.has_changed('bio'):
            self.reconcile_bio_hashtags()
        super().save(*args, **kwargs)  # note: super(), not super.save
```
Fetching 50 profiles, the following timings were obtained:
0.7 seconds is still pretty bad for a query of 50 things and a single postgres query, but 90% of my problem was not graphene.
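The deepcopy cost compounds per instance when fetching many rows. A rough stand-in (this is not the actual FieldTracker internals, just an illustration of why a per-instance deepcopy of tracked state hurts):

```python
import copy
import timeit

# Illustrative per-instance state a field tracker might snapshot.
state = {"bio": "x" * 10_000, "tags": list(range(500))}

def snapshot_deepcopy():
    # Roughly what a deepcopy-based tracker pays on every instance.
    return copy.deepcopy(state)

def snapshot_shallow():
    # A cheaper alternative: copy only the top-level references.
    return dict(state)

deep = timeit.timeit(snapshot_deepcopy, number=1_000)
shallow = timeit.timeit(snapshot_shallow, number=1_000)
print(f"deepcopy: {deep:.3f}s  shallow: {shallow:.3f}s")
```

Multiply the difference by the number of rows a list query instantiates and the profiler hotspot follows.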
Great find @kevindice . Thank you for posting!
Just to offer my experience, it seems like performance is still an issue. Returning a set of 50 items using Graphene (I'd consider them medium-sized maybe? nothing unusual), requests were taking over 2 seconds to return on a Heroku free dyno – nearly all of this time is spent in GraphQL code according to New Relic, and I had already optimised the queries. Switching to Strawberry made no difference, and switching to DRF greatly improved performance, so it definitely seems like GraphQL Core is the issue.
I would have liked to investigate more but I am on a deadline, so I ended up switching to Rails with graphql-ruby which, to my surprise, is faster than either Python solution – the same query on Heroku with Rails returns in 300-400ms, so it's several times faster and makes the difference between a good and bad user experience. Interestingly with Rails, it seems using GraphQL is actually a bit faster than a normal REST endpoint!
I prefer the developer ergonomics of Python and Django but to be honest Django and Rails are similar enough in many ways that it's not a big deal for me to switch. Obviously you can't do this if you're deep into a project, but for anyone considering Python for a GraphQL project I think it's worth being aware of these potential performance issues – and also being aware that Graphene seems to be stuck in a potentially-unmaintained limbo, if I'd realised this sooner I'd probably have started with Rails.
@tomduncalf were you using Graphene v2 or v3? Also I'm surprised that switching to Strawberry didn't help. Based on these benchmarks: https://twitter.com/jayden_windle/status/1235323199220592644 I would expect Strawberry or Graphene v3 to be significantly better especially for lists of objects.
IMO I would expect GraphQL to always have a bit of a performance overhead compared to a REST endpoint since it's doing quite a bit more. It would be good to get Python performance to a point where it's comparable to other similar languages though (like ruby). Can you share any of your code?
@jkimbo This was on Graphene v3. Unfortunately I can't share the code as it's not open source and I don't really have the time to dig into exactly why it was slower right now, but if I do find time to make a simple repro comparing Python vs Ruby I will post it here!
So @tomduncalf you properly nerd sniped me with this and I ended up building a more "realistic" benchmark. I implemented an API to fetch the top 250 rated IMDB movies in Graphene, Strawberry, DjangoRestFramework and a plain JSON api, all hosted on Django. All the data comes from a sqlite db. The code is here: https://github.com/jkimbo/django-graphql-benchmarks and you can try it out the Graphene API here.
Here are the P99 results against the Heroku instance:
So you can see that Graphene (v3) and Strawberry (v0.73.1 contains a fix for a performance regression, btw) are pretty much neck and neck, which is what I would expect considering that they are just different ways to set up a graphql-core server. DRF is definitely faster (~25% faster) and the plain JSON endpoint is faster still. I couldn't replicate the 2-second response times you were seeing with your API @tomduncalf so I'm not sure what is going on there.
Overall GraphQL in Python is definitely slower than using something like DjangoRestFramework, but not horribly so in my opinion. There are definitely things that can be improved though, and thanks to this exercise I have some ideas for improvements we can make to Strawberry.
Would be interested in how this all compares to graphql-ruby as well but unfortunately my experience there is lacking.
Hey @jkimbo, thanks for doing this and I hope my initial post didn't come across too negatively – I just wanted to share my experience for anyone else in my situation (i.e not familiar with either Django or Rails and looking to pick one for a GraphQL project), as I didn't really find much online comparing the two for GraphQL specifically and I didn't realise there is a performance overhead.
You prompted me to do a little bit more digging as I felt bad for just saying it was slow 🙃 as your demo seems to perform pretty well. One thing I didn't think to mention is that I am using Relay with my API – I did a little bit of testing with my API and it seems like using Relay adds a fairly significant overhead – almost doubling the response times on Heroku for the same query vs. a non-Relay version! I wonder if you could try using Relay with Graphene on yours and see if you see similar results? Or if you tell me how to run yours, I can try it (pretty new to Python so couldn't work out how to run yours from a git clone).
My API does still seem quite slow compared to yours and I'm not really sure why, as yours is returning a larger set of data. I'm new to Django so I could be doing something a bit stupid somewhere. To be honest, I am going to stick with Rails at this point as it's probably a slightly better fit for what I am trying to do (build an API with as little code as possible basically, haha – the ecosystem of gems seems a bit more developed for some of the things I want to do), but if you have any suggestions of good ways to profile my Python I could give it a go.
Anyway, you piqued my curiosity so I reproduced your demo in Rails! The code is at https://github.com/tomduncalf/rails-graphql-benchmark and I've deployed it to Heroku in the EU region. It seems like it returns a bit faster than yours, but not dramatically so – I'm not sure how you run your benchmark but I'd be happy to try it on mine if it's useful for comparison.
There are two queries you can run, one Relay and one non-Relay (it seems the Ruby version of GraphiQL doesn't support embedding them in the URL!):
```graphql
{
  movies {
    edges {
      node {
        id
        imdbId
        title
        year
        imageUrl
        imdbRating
        imdbRatingCount
        director {
          id
          name
        }
      }
    }
  }
}
```
```graphql
{
  moviesPlain {
    id
    imdbId
    title
    year
    imageUrl
    imdbRating
    imdbRatingCount
    director {
      id
      name
    }
  }
}
```
Cheers, Tom
Last year I spent a few weekends pulling off a very minimal PoC based on @syrusakbary's idea of generating template code to improve GQL performance in Python.
Here's the link to the same - https://github.com/Ashish-Bansal/graphql-jit
It's a very vague, untested PoC implementation and needs a lot of work.
I won't get time to work on it, so in case anyone is interested, they can work over that PoC.
This thread is interesting but kind of hard to get a grasp on. It would be great if someone competent, like a maintainer of graphql-core or graphene, could make a summary of the different solutions that still exist today (potential gain depending on the use case, potential issues, required version of graphene/graphql-core, maturity of the solution).
I also have a few questions:

- Is there a significant performance benefit in moving from v2 to v3?
- Could graphene-sqlalchemy be an issue when it comes to performance?
- What should I use for profiling? I was using cProfile because I had no idea it couldn't measure async code.

In my case, all requests I try to make take an enormous time to resolve (300 ms for queries of fewer than 10 fields, and up to 3 s for larger queries). On the other hand, I sometimes get a much lower response time (~3x improvement) for the exact same query on the exact same database (same data state), all of which use the exact same Docker image; sometimes I wonder if a lack of memory/CPU is causing this.
Anyway, thanks for this discussion. I hope someone smarter than me will be able to summarize the good ideas that lie here and there in this thread.
@cglacet
Is there a significant performance benefit in moving from v2 to v3?
Currently asking myself the same question. I took the liberty of forking jkimbo's benchmark suite mentioned above, updating the frameworks to their latest versions and including graphene v2 as well: flbraun/django-graphql-benchmarks
Here's the result from a fresh bench I did this morning:
As you can see, the difference between v2 and v3 is pretty much non-existent. However, the benchmark suite currently only serializes a bunch of ObjectTypes. Since this is the most primitive building block of a GraphQL API and doesn't cover real-world setups utilizing pagination, connections, etc., your experience may vary.
Maybe this is of interest to you.
@flbraun nice update! Probably makes sense to add some asyncio into the mix as this adds a lot of overhead as well.
I updated my forked bench suite to be tested against multiple Python versions, see results here.
tl;dw: Graphene (both v2 and v3) performs almost identically on the same Python version. However, Graphene seems to have benefited heavily from performance improvements in the Python interpreter itself. Jumping from 3.10 to 3.11 alone shaves ~20% off mean response times, which is kinda impressive.
Maybe somebody with more knowledge about the inner workings of Graphene (and graphql-core) can leverage this information.
If you're running this in prod, you should try Pyston... we've seen perf that is close to PyPy but without warmup/other issues.
Hi.
I'm curious what the latest is on this? We're noticing similar performance issues as the OP posted, where a lot of time appears to be spent planning SQL queries.
Our SQL takes 60-100 ms to process, but time to first byte is close to 1 second. Our profile implies a lot of time is spent planning SQL.
Our stack:
- Python 3.11
- Django 4.2.8
- DRF 3.14.0
- Django Graphene 3.1.5
All optimized SQL, and we get numbers like:
Chrome Dev Tools:
- 6 SQL queries: 56.44 ms
- 44 µs to send request
- 937 ms waiting for server response
- 8.48 ms to download
- Total: ~1 second for a 100 ms DB query!
Our py-spy profile, with Postgres and Django all running locally, implies much of the time is in Graphene / SQL planning.
Did you ever figure this out? @vade
@pfcodes We've done a lot of optimization on our stack in the interim, so I don't know if I can speak specifically to any one single change, but here are some things we've observed.
We use Relay, and pagination:
We've seen 10x improvement in response time with some of the above.
I know it's hand wavy, but a lot of it was just really paying attention to details and ensuring that more complicated queries do in fact get optimized.
We do some hand-tuned queryset tweaks in some cases for fields that require model / DB lookups and annotate them, and we found some hot spots where we unintentionally did dumb shit like evaluating a queryset in place rather than using annotated values so the DB could do the work.
cc @rsomani95 – anything else to add from the work we did that maybe I'm missing?
Also, there's a link to an issue with observations about the flame graph in question from the optimizer's author @MrThearMan (sorry for the tag) – they deserve a ton of credit; this optimizer is best in class right now for Django and no one knows it :)
https://github.com/MrThearMan/graphene-django-query-optimizer/issues/86
For our use case, we send a few thousand objects to the client. We're currently using a normal JSON API, but are considering using GraphQL instead. However, when returning a few thousand objects, the overhead of resolving values makes it impractical to use. For instance, the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.
Is there a recommended way to improve the performance? The approach I've used successfully so far is to use the existing parser to parse the query, and then generate the response by creating dictionaries directly, which avoids the overhead of resolving/completing on every single value.
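A schematic of that dictionary-based approach (the field list is hard-coded here purely for illustration; the real version would derive it from the parsed query):

```python
class User:
    def __init__(self, id):
        self.id = id

def respond_direct(objects, wanted_fields):
    # Build the whole payload in one pass: no per-field resolver or
    # completion call for each of the thousands of values.
    return {"data": {"users": [
        {f: getattr(obj, f) for f in wanted_fields} for obj in objects
    ]}}

users = [User(i) for i in range(10_000)]
payload = respond_direct(users, ["id"])
print(len(payload["data"]["users"]))  # 10000
print(payload["data"]["users"][0])    # {'id': 0}
```

The per-value work collapses to a dict comprehension, which is why this path avoids the overhead of resolving/completing every single value.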