jkimbo / graphql-sync-dataloaders

Use DataLoaders in your Python GraphQL servers that have to run in a sync context (i.e. Django).
MIT License
41 stars 21 forks source link

feat: use contextvars to separate batch callbacks execution between multiple threads/contexts #15

Open bellini666 opened 1 year ago

bellini666 commented 1 year ago

I was migrating one project that I cannot yet migrate to strawberry (T_T) to graphene v3 and stumbled upon this lib as an alternative to Promise dataloaders.

It worked correctly, but I started getting GraphQL deferred execution failed to complete. when running queries concurrently. This is because the global batch callbacks object was being shared with all running contexts.

Using contextvars makes each running context (be it asyncio, threading, gevent) resolve only its own callbacks, fixing the issue

MichaelKim0407 commented 1 year ago

Hi! Idk if the project owner is still maintaining, but can you please see if your patch would fix my problem that I reported here? If yes I'll just check out your code instead. Thanks!

MichaelKim0407 commented 1 year ago

I just tried and it doesn't :(

I think the problem really is that _cache is attached to the dataloader instance, which is global. Instead it should be attached to a request. I'll see what I can figure out...

bellini666 commented 1 year ago

Hey @MichaelKim0407 ,

Hrm, maybe there's something extra that we can do here?

contextvars will make sure that each context gets its own value for that, very similar to threading.local, but it also works for gevent, asyncio (although not relevant for sync dataloaders), etc

Not sure if @jkimbo is still maintaining this, but pinging here to check :)

MichaelKim0407 commented 1 year ago

@bellini666

I ended up doing a dirty hack

def resolve_xxx(self, info):
    if not hasattr(info.context, '_dataloaders'):  # info.context is the HttpRequest object
        info.context._dataloaders = {}
    if 'xxx' not in info.context._dataloaders:
        info.context._dataloaders['xxx'] = SyncDataLoader(batch_load_fn)
    return info.context._dataloaders['xxx'](self.xxx_id)

The problem for me really is that the way the official doc is written suggests that dataloaders should be global, whereas in reality they should be initialized per request in a webserver. But graphene has not provided a way to properly do that.

BTW the official async dataloader does not work with Django 4 which is why I ended up with this library in the first place.

Arno500 commented 1 year ago

I had a few issues with the same error that you mentionned you fixed. I did another fork including this PR here: https://github.com/criteo-forks/graphql-sync-dataloaders/tree/vanilla And this fixed the issues from both of you @bellini666 and @MichaelKim0407 :)

For each operation (so a root GraphQL field in Query, Mutation or Subscription), a new context is created, separating the dataloaders cache and results

tony commented 1 year ago

@bellini666 @Arno500 Hi!

The fork that combines both of your contributions got dataloaders working on my side! (Wow) Can I use them in a fork too?

Do you release them under the same license as graphql-sync-dataloaders? (That'd be MIT)

tony commented 1 year ago

I had a few issues with the same error that you mentionned you fixed. I did another fork including this PR here: https://github.com/criteo-forks/graphql-sync-dataloaders/tree/vanilla And this fixed the issues from both of you @bellini666 and @MichaelKim0407 :)

@Arno500 Can you make a PR of your follow up with your commits from https://github.com/criteo-forks/graphql-sync-dataloaders/commits/vanilla?

Arno500 commented 1 year ago

@bellini666 @Arno500 Hi!

The fork that combines both of your contributions got dataloaders working on my side! (Wow) Can I use them in a fork too?

Do you release them under the same license as graphql-sync-dataloaders? (That'd be MIT)

Of course!

I had a few issues with the same error that you mentionned you fixed. I did another fork including this PR here: https://github.com/criteo-forks/graphql-sync-dataloaders/tree/vanilla And this fixed the issues from both of you @bellini666 and @MichaelKim0407 :)

@Arno500 Can you make a PR of your follow up with your commits from https://github.com/criteo-forks/graphql-sync-dataloaders/commits/vanilla?

Unfortunately, since this fork is depending on 2 PRs, I'm afraid this would create a lot of confusion in the merge order. I prefer to wait for those to be merged when the author wakes up from the deads and then add mine. But you are of course welcome to use/copy/modify/sell/fork the fork in criteo-forks :)

superlevure commented 10 months ago

If you folks are interested, we maintain a fork of this library over at https://github.com/loft-orbital/graphql-sync-dataloaders/

We solved the multi-threads issue with this commit https://github.com/jkimbo/graphql-sync-dataloaders/commit/e2a1b43ac95bd94cbf581cc839892ebcb95af40a

We've been using this in production for >6 months without major issues so far

Arno500 commented 10 months ago

Hello @superlevure! I'm afraid that the threading adaptation you did would not work with the new async way of working. If supposedly we have multiple requests per thread, that would break. I know it is currently not the case, but there is one day where that could happen :)

superlevure commented 10 months ago

That's a good point! We're not using async views currently so we're not impacted by this, but your point is valid nevertheless