microsoftgraph / msgraph-sdk-javascript

Microsoft Graph client library for JavaScript
https://graph.microsoft.com
MIT License

Download requests made in bulk failing #1676

Open malee1975 opened 1 month ago

malee1975 commented 1 month ago

I'm not sure whether this is a code issue, a network issue, or a Node issue. It is hard to pinpoint a root cause because the failures are variable and inconsistent. When requesting downloads in bulk from the Graph API using fetch and Node.js, there appears to be an intermittent problem.

At first it was assumed that Node and fetch were struggling with the number of requests being made. For example, we would run an API request to gather info on 1000 files and then, for each file, request a download using the methods provided by the SDK.

What was initially discovered was that when 2000+ files were requested, multiple downloads would fail. For example, only 700 of these files would land on disk. When downloads failed, there appeared to be no retry attempts.

The flows were checked for event loop lag and the CPU for excessive usage. Neither was an apparent issue. The error response indicated only that fetch was failing: it was not receiving a response.

A timer was added to space out individual requests. This worked and mitigated the problem well. However, download requests were also failing when smaller numbers of files were requested, for example between 400 and 500. I would not expect that volume to be a problem for an API to manage.
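As a rough illustration of this kind of spacing (a sketch only, not the code that was actually used), the helper below starts one request every `gapMs` milliseconds and collects each outcome. The names `runStaggered`, `task`, and `gapMs` are hypothetical; `task` stands in for whatever function performs a single download.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run one task and capture its outcome so a failure never goes unhandled.
async function settle(task, item) {
  try {
    return { status: "fulfilled", value: await task(item) };
  } catch (reason) {
    return { status: "rejected", reason };
  }
}

// Start one task per item, waiting gapMs between starts.
// Tasks may still overlap if a single task takes longer than gapMs.
async function runStaggered(items, task, gapMs) {
  const pending = [];
  for (let i = 0; i < items.length; i += 1) {
    pending.push(settle(task, items[i]));
    if (i < items.length - 1) await sleep(gapMs);
  }
  return Promise.all(pending);
}
```

Spacing only limits how fast requests start. It does not cap how many are in flight at once, which may be why failures could still appear at smaller volumes.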

I am just wondering if other SDK/Graph users have encountered this issue.

malee1975 commented 1 month ago

To reply to myself!

Microsoft support confirmed that the issue was throttling.

If there are too many concurrent requests the server stops responding.

The issue arose because of the use of async in Node and how it generates requests in parallel.

The solution was to use a semaphore to limit the number of concurrent requests to 4.

This way during bulk operations the 429 response codes are respected.
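Microsoft Graph signals throttling with a 429 status and a `Retry-After` header, so a client can also wait for the stated period before trying again. The sketch below shows that idea under some assumptions: `withRetryAfter`, `doRequest`, `maxRetries`, and `fallbackMs` are hypothetical names, `doRequest` is any function returning a fetch-style response, and `Retry-After` is read as a number of seconds.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call doRequest and, while the response is 429, wait and try again.
// Gives up after maxRetries retries and returns the last response.
async function withRetryAfter(doRequest, maxRetries = 3, fallbackMs = 1000) {
  for (let attempt = 0; ; attempt += 1) {
    const response = await doRequest();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Retry-After is assumed to be in seconds; fall back if missing or unparsable.
    const seconds = Number(response.headers.get("Retry-After"));
    const waitMs = Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : fallbackMs;
    await sleep(waitMs);
  }
}
```

This only covers requests that actually receive a 429 response. Requests that get no response at all, as described earlier in the thread, would need separate error handling.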

To me this should be part of the online documentation.

async-sema
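For anyone landing here, the semaphore approach can be sketched without any dependency. This is a minimal hand-rolled version for illustration and does not reproduce the async-sema API; `Semaphore`, `mapWithLimit`, and `task` are hypothetical names, and `task` stands in for a function that downloads one file.

```javascript
// Minimal counting semaphore: at most `limit` holders at any time.
class Semaphore {
  constructor(limit) {
    this.available = limit;
    this.waiters = [];
  }

  // Take a slot, or wait in line until one is released.
  async acquire() {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hand the slot to the next waiter, or return it to the pool.
  release() {
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }
}

// Run task(item) for every item with at most `limit` running concurrently.
// Resolves to Promise.allSettled-style results in the original item order.
async function mapWithLimit(items, limit, task) {
  const sema = new Semaphore(limit);
  return Promise.allSettled(
    items.map(async (item) => {
      await sema.acquire();
      try {
        return await task(item);
      } finally {
        sema.release();
      }
    })
  );
}
```

With a hypothetical `downloadFile(id)` function, a bulk run limited to 4 concurrent requests would look like `await mapWithLimit(fileIds, 4, downloadFile)`, after which any entries with `status === "rejected"` can be retried or logged.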