zoellner / gmail-batch-stream

Streaming interface to Gmail API using batch requests
MIT License
8 stars 2 forks

429 Too many concurrent requests for user #1

Closed eyalhakim closed 7 years ago

eyalhakim commented 7 years ago

Hi @zoellner ,

I'm using the library and it fits my needs exactly. However, I'm still getting the error above.

Can you help?

zoellner commented 7 years ago

Hi @eyalhakim, happy to help. Can you provide some more details:

- Are you running any streams (for the same user) in parallel?
- Which commands are you running? (The example in the readme uses messages.get; the required quota units are different for other commands.)
- What is the quota (queries per 100 seconds per user) at https://console.developers.google.com/apis/api/gmail.googleapis.com/quotas?project=YOUR_PROJECTNAME for your project?

When you change the line `.pipe(GBS.pipeline(100, 1))` to something like `.pipe(GBS.pipeline(100, 5))`, do you still get the 429 response?

Google has changed the quota from units/second to units/100 seconds since I wrote this module. That might be causing the issue. If we can confirm that's the problem, I have some code in a different project that uses a rate limiter to respect the requests per 100 seconds. I might be able to merge that into this module.
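For reference, a limiter that respects an "N units per window" quota could look roughly like this. This is only a minimal sketch, not the code from that other project; the class name and the injectable clock are assumptions for illustration:

```javascript
// Minimal quota-window limiter sketch: allow `quota` units per `windowMs`.
// `now` is injectable so the logic can be tested without real time passing.
class QuotaLimiter {
  constructor(quota, windowMs, now = () => Date.now()) {
    this.quota = quota;        // units allowed per window
    this.windowMs = windowMs;  // window length, e.g. 100000 for 100 s
    this.now = now;
    this.used = 0;
    this.windowStart = now();
  }

  // Returns 0 if `units` can be spent now, otherwise the ms to wait
  // until the current window rolls over.
  msUntilAvailable(units) {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t; // start a fresh window
      this.used = 0;
    }
    if (this.used + units <= this.quota) {
      this.used += units;
      return 0;
    }
    return this.windowStart + this.windowMs - t;
  }
}
```

A caller would check `msUntilAvailable(cost)` before each batch and delay by the returned amount when it is non-zero.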

eyalhakim commented 7 years ago

Hi!

So, I'm running the exact same code as in your example, using users.messages.get. My messageIdStream has around 1700 IDs, all in one stream. My quota for queries per 100 seconds per user is 25,000 (I'm on the free plan). When changing to `.pipe(GBS.pipeline(100, 5))` I get an exception from Highland: 'Invalid number of operations per ms: 0.5'.

What's next? @zoellner Thank you very much for your help!

eyalhakim commented 7 years ago

Hi @zoellner, any ideas?

zoellner commented 7 years ago

You will have to slow down the stream a bit for now, until I get around to implementing a better rate limiter.
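In the meantime, one library-free way to slow things down is to space the requests out with a fixed delay. This is only a sketch under the assumption that you can process IDs sequentially; the helper names are hypothetical:

```javascript
// Hypothetical helper: process ids one at a time with a fixed delay between
// requests, instead of firing them all through the stream at once.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processSlowly(ids, handler, delayMs) {
  const results = [];
  for (const id of ids) {
    results.push(await handler(id)); // e.g. a wrapped messages.get call
    await sleep(delayMs);            // spacing keeps the request rate down
  }
  return results;
}
```

If you want to stay within Highland (as in the readme example), its `ratelimit(num, ms)` operator is another option for capping throughput on the stream itself.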

eyalhakim commented 7 years ago

@zoellner do you have a recommendation on how to slow it down?

Thank you

eyalhakim commented 7 years ago

@zoellner any ideas buddy? It would help me a great deal. Thanks!

zoellner commented 7 years ago

@eyalhakim I've just published a new version that includes a rate limiter that takes the newer quota definition into account. Note that I've also made some general modernization updates; it now requires ES6 features to run.

I've also updated the example in the readme file.

In the new version you can easily adjust the quota if needed. In the example, instead of `const GBS = new GmailBatchStream(process.env.ACCESS_TOKEN);` you would use `const GBS = new GmailBatchStream(process.env.ACCESS_TOKEN, options);`. The default options are set for 25,000 requests per 100 seconds:

```javascript
{
  userQuota: 25000,
  userQuotaTime: 100000,
  parallelRequests: 10
}
```
eyalhakim commented 7 years ago

Thank you very much @zoellner! I have updated to the latest version. However, I am still getting the same error. I didn't make any changes to my code, since I understand the default settings are suitable for my needs. Aren't they?

What might i still be doing wrong?

Thank you

zoellner commented 7 years ago

Without seeing your code, I can't tell what you might be doing wrong. Have you tested with different users? Have you run the code from the readme?

eyalhakim commented 7 years ago

Okay, a few things I have discovered: the quota cost for messages.get is 5 units, not 1: https://developers.google.com/gmail/api/v1/reference/quota

However, even if I change it to 5, it doesn't solve the problem. Lowering parallelRequests to 1 does work! However, it slows down the process.

I will use this configuration for now until you have a better solution.
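For the record, the back-of-envelope arithmetic under these assumptions (25,000 units per 100 seconds, 5 units per messages.get) works out as follows:

```javascript
// Back-of-envelope throughput for messages.get under the per-user quota.
const userQuota = 25000;      // quota units allowed per window
const userQuotaTime = 100000; // window length in ms (100 seconds)
const unitsPerGet = 5;        // messages.get costs 5 quota units

const getsPerWindow = userQuota / unitsPerGet;                // calls per 100 s
const getsPerSecond = getsPerWindow / (userQuotaTime / 1000); // calls per second
console.log(getsPerWindow, getsPerSecond);
```

So in theory around 50 messages.get calls per second should fit within the quota, which suggests the 429 here is about request concurrency rather than the quota itself.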

eyalhakim commented 7 years ago

@zoellner Here is my code:

```javascript
let GBS = new GmailBatchStream(auth.credentials.access_token, {
    parallelRequests: 1 // without this, I get the errors
});
let gmail = GBS.gmail();
// ids can be e.g. 1700 long
let messageIdStream = _h(ids);
let results = [];
let errors = 0;
let processed = 0;

messageIdStream
    .map((messageId) =>
        gmail.users.messages.get({
            userId: 'me',
            id: messageId,
            format: 'metadata',
            metadataHeaders: ['To', 'From']
        })
    )
    .pipe(GBS.pipeline())
    .tap((result) => {
        processed++;
        deferred.notify(processed / ids.length * 100);
        if (result.error) {
            errors++;
            return;
        }

        let thirdParty = result.payload.headers.find((header) => !header.value.includes('eyaljhakim@gmail.com'));
        if (!thirdParty) {
            return;
        }

        let parsedAddresses = addrs.parseAddressList(thirdParty.value);

        if (!parsedAddresses) {
            return;
        }

        results.push({
            messageId: result.id,
            domain: parsedAddresses[0].domain,
            thirdParties: parsedAddresses.map((parsedAddress) => {
                return {
                    local: parsedAddress.local,
                    name: parsedAddress.name,
                };
            })
        });
    })
    .done(() => {
        console.log(errors);
        deferred.resolve(results);
    });
```
zoellner commented 7 years ago

Yes, it seems the quota cost is 5 now. I'll update the readme accordingly.

Slowing down the process is what you need to do, since the error you are getting is "too many requests". Can you measure the total amount of time your code runs and divide by the number of ids, to get an idea of how many requests per second (or per 100 seconds) it is really making? And how does that change with the `parallelRequests` parameter?

Have you tested this with different user accounts?
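Something like this would do for the measurement. It's just a sketch; the helper name is made up:

```javascript
// Hypothetical helper: effective request rate once the stream finishes.
// Record Date.now() before starting the stream, and call this in .done().
function requestsPerSecond(count, startMs, endMs) {
  const elapsedSec = (endMs - startMs) / 1000;
  return count / elapsedSec;
}

// Usage sketch:
// const start = Date.now();
// ... run the pipeline, then in .done():
// console.log(requestsPerSecond(ids.length, start, Date.now()));
```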

zoellner commented 7 years ago

I finally got a chance to dig into this issue a little deeper. I overlooked one critical detail before: you're talking about the `rateLimitExceeded` error, not `userRateLimitExceeded`.

I don't think Google specifies any details about the actual number of parallel requests you can fire, so there's not much that can be done other than retrying the requests when the 429 `rateLimitExceeded` error happens. I think such error handling should happen outside this module. I'm therefore going to close this issue, but feel free to continue the discussion as you see the need.
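Caller-side, such a retry could be sketched roughly like this. The helper name, the error shape (`err.code === 429`), and the delay values are assumptions for illustration, not this module's API:

```javascript
// Sketch: retry an async request on 429 with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryOn429(fn, maxRetries = 5, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-429 errors or once retries are exhausted.
      if (err.code !== 429 || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000, ... ms
    }
  }
}
```

Each messages.get (or each batch) would be wrapped in `retryOn429` so that transient concurrency 429s get absorbed by the backoff instead of failing the stream.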

I'm going to push another update for the new rate limiter to fix a bug.