graphql-python / graphene-django

Build powerful, efficient, and flexible GraphQL APIs with seamless Django integration.
http://docs.graphene-python.org/projects/django/en/latest/
MIT License
4.31k stars 770 forks source link

Support for Async Django #1394

Open jaw9c opened 1 year ago

jaw9c commented 1 year ago

Feedback requested.

This PR adds support for Django async views. The general strategy follows the same principal as the base graphene library when supporting async. It implements the following things:

The current progress is as followed:

Other thoughts:

jaw9c commented 1 year ago

Progress update: All the fields resolvers are supporting both being executed in the async context and wrapping django sync code into a sync context. I've made the assumption that any resolving of DjangoXType/Field fields will need to be executed in a sync context which I think is pretty safe. I've started running through all the tests and adding an extra assertion that running in an async executor produces the same result as the sync one.

Currently people switching to using the async view will need to wrap all their top level resolvers (those that are directly on the Query class passed into the schema) in @sync_to_async if they run sync code. If someone has an idea of how to prevent this that would be great, but can't seem to wrap my head around that!

Next steps are to run through all the test files and adding the helper.

It would great if people could test this either on their production projects (speciically when running under ASGI), as 'm interested in how swapping in/out of the sync context affects the performance of the execution.

firaskafri commented 1 year ago

Progress update: All the fields resolvers are supporting both being executed in the async context and wrapping django sync code into a sync context. I've made the assumption that any resolving of DjangoXType/Field fields will need to be executed in a sync context which I think is pretty safe. I've started running through all the tests and adding an extra assertion that running in an async executor produces the same result as the sync one.

Currently people switching to using the async view will need to wrap all their top level resolvers (those that are directly on the Query class passed into the schema) in @sync_to_async if they run sync code. If someone has an idea of how to prevent this that would be great, but can't seem to wrap my head around that!

Next steps are to run through all the test files and adding the helper.

It would great if people could test this either on their production projects (speciically when running under ASGI), as 'm interested in how swapping in/out of the sync context affects the performance of the execution.

This is great news! Would discuss with my team to try this on production once its ready!

jaw9c commented 1 year ago

@firaskafri Ready to go for reviews on this one! Managed to solve the problem of wrapping everything in sync_to_async by utilising a middleware. If an implementing project would like to use async they only need to do the following:

Looking forward to feedback!

pcraciunoiu commented 1 year ago

GraphiQL wasn't working for me on this branch, so I pushed a small change here https://github.com/UpliftAgency/graphene-django/commits/support-async (specific commit https://github.com/UpliftAgency/graphene-django/commit/8368ae2d86604995ef6ccd051d8b724b69a5bc35)

So far tested it with data loaders for a sample app and everything seems to be working. Thanks for your work on this!

pcraciunoiu commented 1 year ago

I spoke too soon, there is one issue I need to narrow down - sometimes related fields try to use sync context and break, e.g. I have a blog that has articles, and each article has an author. Sometimes article.author seems to get resolved sync using attr_resolver and that doesn't work in async context.

I haven't tried to isolate the issue yet because it doesn't always happen, but it could be fine to worry about it in a follow-up and get this merged!

firaskafri commented 1 year ago

I think I will start testing next week on a large project, will get back with a feedback in it soon!

dima-kov commented 1 year ago

Hi team! Thank you all for your great input and initiative! Could I ask please what are the plans to have this feature released? Is it feasible to release it in one-two month?

dolgidmi commented 1 year ago

@firaskafri

I think I will start testing next week on a large project, will get back with a feedback in it soon!

Any news on testing?

jaw9c commented 1 year ago

Thanks for the input and testing guys! Super appreciated. Apologies for the lack of progress on this from me! Have been in firefighting mode recently and this slipped down the priorities.

The state of play from before is that I was testing the performance of async vs sync, both running under asgi and wsgi. During this I found a bug which made the app hang when running async on ASGI locally - I was struggling to debug this and is the first thing to focus on next.

If you'd like to help - the best way is to test on your own projects and get some relative performance stats under load that would be great. I'll likely be building a tool for my own projects that I can put in a gist but this will take time vs testing manually and I'd like to squash any complexity/bugs ASAP.

TL;DR; sorry for pausing work on this; if you want to help progress it, please test on your own projects and report any issues here.

jaw9c commented 1 year ago

Good news: Fixed the hang issue. Bad news: The amount of time I wasted figuring out it was only an async incompatible middleware on our work env. I've run some preliminary load testing on our staging env, and got 2x throughput on resolving a queryset. Next up is some analysis on run times, I'll post these in a digestible format - hopefully by next week. Keen to get this merged soon.

dima-kov commented 1 year ago

Good news: Fixed the hang issue. Bad news: The amount of time I wasted figuring out it was only an async incompatible middleware on our work env. I've run some preliminary load testing on our staging env, and got 2x throughput on resolving a queryset. Next up is some analysis on run times, I'll post these in a digestible format - hopefully by next week. Keen to get this merged soon.

Faced exactly the same issue, having only one sync middleware makes django hang on async handlers

cpd67 commented 1 year ago

@jaw9c Thanks a lot for opening this up! It would be really, really nice if this could get reviewed and merged soon, as this is blocking us from being able to upgrade to the latest version of graphene-django.

firaskafri commented 1 year ago

@jaw9c any updates?

pfcodes commented 11 months ago

Really need this!

dima-kov commented 10 months ago

Hey mates, ready to donate some bucks via buymeacoffee if you make this PR to live!

pfcodes commented 10 months ago

Hey mates, ready to donate some bucks via buymeacoffee if you make this PR to live!

I'm down to throw some $ too. Maintainers please consider enabling sponshorship for this project if it means that things can move more quickly. A lot of people are dependent on this.

alexandrubese commented 10 months ago

Migrating to v3 version, without making sure dataloaders and everything works, so that people need to implement new libraries (check @superlevure comments above)

dima-kov commented 10 months ago

@alexandrubese so what's the way?

alexandrubese commented 10 months ago

@dima-kov Use an older version of django-graphene where things work?

Which might not be ideal because you’re basically just adding “legacy code” that you will need to update once they will make v3 work (although it’s been almost a year this is discussed, I don’t know when this will work and they will fix dataloaders)

I don’t jnderstand the intricacies of the v3 update, maybe they had to do it, that’s why it was maybe forced ?

Or use a different library

PS: I don’t know why the Django team didn’t work to add proper GraphQL support.

Spring, .Net MVC and many other “web frameworks” did it.

dima-kov commented 10 months ago

Regarding the comment from @superlevure:

I agree with releasing AsyncGraphQLView.

We are almost there; the only remaining tasks are performance benchmarks and documentation updates. This implies that most of the work is already done, and only a few percentages are left.

@jaw9c, it seems you might be short on time for this. I'm willing to contribute efforts at this point.

@superlevure, to be transparent, I haven't conducted performance testing before, but I'm attempting it now. Any suggestions on how to approach this appropriately would be highly appreciated.

dima-kov commented 10 months ago

Benchmarks: 16.547 seconds vs 83.194 seconds for i/o bound queries. 5x faster. async wins

Details are here: https://github.com/dima-kov/django-graphene-benchmarks?tab=readme-ov-file#tldr

Async:

Concurrency Level:      10
Time taken for tests:   16.547 seconds
Complete requests:      1000

vs

Sync

Concurrency Level:      10
Time taken for tests:   83.194 seconds
Complete requests:      1000

@superlevure are we fine now? What should be next steps to make this public?

kamilglod commented 10 months ago

@dima-kov shouldn't you run sync benchmark with 10 workers instead of 4? I know that measuring sync vs async is hard but with 10 workers we would have consistent number of requests that are handled in the same time by the server so we should get actual comparison. I guess we should get similar results which is fine as the biggest benefit of using async is less resources used.

dima-kov commented 10 months ago

hmm, that would eats lots of memory. from docs:

Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with.

I'm running on 4 cores machine, so trying gunicorn with 9 workers I've got only slightly better result:

Concurrency Level:      10
Time taken for tests:   80.785 seconds
Complete requests:      1000
Failed requests:        0

and memory usage was 9 processes ~80mb each = 720. The bottleneck here might be sqlite3 database.


vs async:

Concurrency Level:      10
Time taken for tests:   17.586 seconds
Complete requests:      1000
Failed requests:        0

and mem usage was 1 process 60mb.

kamilglod commented 9 months ago

Sqlite docs says that concurrent writes are locked, but reads should be fine so I think it's not a problem with sqlite itself.

Maybe it's because of the differences in sync and async resolvers? In async you're fetching related octopus_type, in sync not. https://github.com/dima-kov/django-graphene-benchmarks/blob/main/project/api/schema/queries.py#L43-L51 https://github.com/dima-kov/django-graphene-benchmarks/blob/main/project/api/schema_async/queries.py#L48-L64

dima-kov commented 9 months ago

oh, shame on me. so here is fixed version comparison:

Comparison

Sync Async Sync Async
Requests 1000 1000 1000 1000
Concurrency 100 100 100 100
Processes 9 1 4 1
Threads per proc 1 100 1 100
Mem ~720mb ~80mb ~320mb ~80mb
Time 23.384s 13.411s 24.465s 13.670s
dima-kov commented 9 months ago

Now, it can be asserted that the releasing of the Async version will result in a twofold acceleration of I/O endpoints.

Moreover, take a look on this much fair (same resources) comparison:

Sync Async
Requests 1000 1000
Concurrency 100 100
Processes 9 9
Threads per process 1 1-30
Mem ~720mb ~720mb
Time 23.384s 4.719s

We encounter a tradeoff of x5 with identical resource utilization, excluding threads!

dima-kov commented 9 months ago

Guys, we really need this (having async resolvers). Please tell us how can we help to make it public., I do not want to start using it unsure this is merged in main.

Or at least let us know your are going to release this, but we are missing: 1,2,3...

superlevure commented 9 months ago

Thanks for the benchmarks, I'll review the PR again tomorrow. Note that I don't have merge rights on this repo, we'll need a review from one of @firaskafri, @sjdemartini, @kiendang (or others)

firaskafri commented 9 months ago

@superlevure i think it is good to go as soon as we clarify the docs What do you think @kiendang @jaw9c @sjdemartini

superlevure commented 9 months ago

@dima-kov I had a look at your benchmarks and I have few remarks.

First, it looks like you're comparing the sync and async resolvers on the same branch of graphene-django (this PR's branch). I believe it would actually be more fair to compare the async / sync versions of this branch and the sync version of graphene-django's main branch since this PR also affects sync resolvers and the point of the benchmarks is also to make sure no performance penalty is introduced to already existing code.

Second, I noticed the sync resolver is returning 500 objects while the async version is only returning 10 objects.

I took the liberty of pushing a PR to your repo that covers those point, as well as setting up a docker based environment and postgres as a DB to be closer to a real life use case.

I obtain the following results:

# Sync version [main]
Concurrency Level:      100
Time taken for tests:   45.517 seconds
Complete requests:      1000
Failed requests:        0
Non-2xx responses:      1000
Total transferred:      60232000 bytes
Total body sent:        238000
HTML transferred:       59963000 bytes
Requests per second:    21.97 [#/sec] (mean)
Time per request:       4551.739 [ms] (mean)
Time per request:       45.517 [ms] (mean, across all concurrent requests)

# Sync version [this branch]
Concurrency Level:      100
Time taken for tests:   203.300 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      45805000 bytes
Total body sent:        244000
HTML transferred:       45382000 bytes
Requests per second:    4.92 [#/sec] (mean)
Time per request:       20330.019 [ms] (mean)
Time per request:       203.300 [ms] (mean, across all concurrent requests)

# Async version [this branch]
# didn't run till the end, see below

Data tends to show a huge perf hit for the sync version of this PR. Concerning the async version, I run into the following DB error which which makes all requests fail until I restart the DB:

2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already

I suspect the async version is leaving DB connections opened somewhere (Postgres max_connections settings is set at 100 which is the concurrency level used in the benchmark)

I'll continue to play with the PR a bit tonight to make sure there's nothing wrong with my setup, but I'm curious if others can reproduce similar results.

dima-kov commented 9 months ago

@superlevure thank you for looking into it! I missed that part

dima-kov commented 9 months ago

regarding the results you've got: hmm, thats really strange. Both sync and async (100 requests concurrently and 1k requests in total) showed me +- 20s. 100+s is smth unbelievable for me. Maybe the thing is with dockerization.

I'm playing with main version now. Main:

Concurrency Level:      100
Time taken for tests:   23.706 seconds
Complete requests:      1000
Failed requests:        0
dolgidmi commented 9 months ago

Data tends to show a huge perf hit for the sync version of this PR. Concerning the async version, I run into the following DB error which which makes all requests fail until I restart the DB:

2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already

I suspect the async version is leaving DB connections opened somewhere (Postgres max_connections settings is set at 100 which is the concurrency level used in the benchmark)

Django currently doesn't support persistent connections for ASGI https://code.djangoproject.com/ticket/33497

henadzit commented 4 months ago

Thank you for all your work! How can we help to get it merged?

dima-kov commented 4 months ago

Data tends to show a huge perf hit for the sync version of this PR. Concerning the async version, I run into the following DB error which which makes all requests fail until I restart the DB:

2024-02-11 16:31:39.310 UTC [4579] FATAL:  sorry, too many clients already

I suspect the async version is leaving DB connections opened somewhere (Postgres max_connections settings is set at 100 which is the concurrency level used in the benchmark)

Django currently doesn't support persistent connections for ASGI https://code.djangoproject.com/ticket/33497

For those who use any external connection pooler like pgbouncer or AWS RDS Connection Pooling, this will not be a problem.

https://docs.djangoproject.com/en/5.1/releases/5.1/#postgresql-connection-pools

django 5.1 will support connection pool and this might be an opportunity to merge the PR

pcraciunoiu commented 4 months ago

I was getting an error that the view is not async on the latest Django release

Based on this, there must be an async method or the property returns false. As a quick fix, I pushed this classproperty to be true:

https://github.com/UpliftAgency/graphene-django/commit/cc665dfe2b45d7006971c66ddaebc5f7ae0c77fc

Feel free to add that in.

jaw9c commented 4 months ago

Apologies for the radio silence on this thread. Appreciate the feedback and efforts from everyone on this PR - in particular the benchmarking.

A fair few months ago, we rolled this into production at my work, utilising the new AsyncGraphQLView but found that we very quickly maxed out the DB connection pool. Since then we had to put the efforts on ice in lieu of making sure the startup was default alive... We've been using this branch in production since my last post, server under ASGI, but not using the AsyncGraphQLView.

I think this is safe to merge, but with an experimental warning for those willing to test and use the AsyncGraphQLView. (Perhaps we rename it to experimental_AsyncGraphQLView` or log a warning? Just to make sure people understand the DB connections point!

I've been carefully watching the movements by the Django team on making the ORM fully async, this PR is the latest, which is really encouraging!

henadzit commented 3 months ago

django 5.1 will support connection pool and this might be an opportunity to merge the PR

Looks like Django 5.1 got released!

chris-stetter commented 3 months ago

Are there plans to publish a tagged release? Happy to support if anything is missing.

superlevure commented 3 months ago

[..] I think this is safe to merge, but with an experimental warning for those willing to test and use the AsyncGraphQLView. (Perhaps we rename it to experimental_AsyncGraphQLView` or log a warning? Just to make sure people understand the DB connections point!

[..]

@jaw9c, do you think it's safe to merge even though my benchmarks showed this PR degrades performance for sync resolvers?

vt-rc commented 3 weeks ago

any luck with this?