nextcloud / integration_onedrive

🗔 Integration of Microsoft OneDrive into Nextcloud
GNU Affero General Public License v3.0
33 stars 7 forks source link

Failure after a couple gigs of downloading #47

Open makinbacon21 opened 2 months ago

makinbacon21 commented 2 months ago
[integration_onedrive] Warning: OneDrive error downloading file 012db805d345a17f6169f0bd0fdfa63adb354729 : Server error: `GET https://tvv8ja.by.files.1drv.com/y4miAECgF8YfvErIVdbP9TX8f3oDG9xo5ND2tsHQy2KwonmXhpDAf1DUEmRgj5Z6neDJb-miBEEQ3JUMvbv9ASME9i_azFmNsuhkxoAmwTzRV6a-PNKUT9aq8rofLH_XIA0Ruq2Mv5dwIceON4UC-47JH11EX6jKDXuO7K7j8wlS6FdlxjMelE8pp-rrmmKVdMd9YurH_xM5iBbjJX00sYTki8ONqxkKbgodhzUYB77Wkk` resulted in a `503 Service Unavailable` response:
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' (truncated...)
[integration_onedrive] Warning: OneDrive error downloading file 012db805d345a17f6169f0bd0fdfa63adb354729 : Server error: `GET https://tvv8ja.by.files.1drv.com/y4miAECgF8YfvErIVdbP9TX8f3oDG9xo5ND2tsHQy2KwonmXhpDAf1DUEmRgj5Z6neDJb-miBEEQ3JUMvbv9ASME9i_azFmNsuhkxoAmwTzRV6a-PNKUT9aq8rofLH_XIA0Ruq2Mv5dwIceON4UC-47JH11EX6jKDXuO7K7j8wlS6FdlxjMelE8pp-rrmmKVdMd9YurH_xM5iBbjJX00sYTki8ONqxkKbgodhzUYB77Wkk` resulted in a `503 Service Unavailable` response:
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' (truncated...)

etc.

It would appear that when it hits a 503, it kills the proc instead of retrying. Likely getting ratelimited or something.

makinbacon21 commented 2 months ago

Might have found the real issue? It limits files per "page" then you have to rereq next "page" https://stackoverflow.com/questions/39772614/microsoft-graph-maximum-number-for-list-onedrive-items#:~:text=The%20OneDrive%20API%20returns%20a,133%20items%20in%20your%20example). Could also just be ratelimiting. Might pay for a month of azure to get everything transferred once lol.

baywet commented 4 weeks ago

Facing the same problem here. We can trace the calls being made here which indeed does NOT implement any retrial. Tracing things upstream, the client gets created here and we can observe that no retrial is in place for transient issues (429, 503) Implementing a middleware handler would benefit all outgoing http requests and make them more reliable. We don't even have to implement the handler, we could simply reuse an existing one and add it to the stack

More documentation on Microsoft Graph Throttling protections

bigcat88 commented 4 weeks ago

Facing the same problem here. We can trace the calls being made here which indeed does NOT implement any retrial. Tracing things upstream, the client gets created here and we can observe that no retrial is in place for transient issues (429, 503) Implementing a middleware handler would benefit all outgoing http requests and make them more reliable. We don't even have to implement the handler, we could simply reuse an existing one and add it to the stack

More documentation on Microsoft Graph Throttling protections

This is a very good suggestion, I must say.

I would gladly review a pull request with the algorithm as in "RetryHandler.php" at the link - I like this solution quite a lot, if it does not contain any pitfalls (what to do if we waited so long for a retry request that the token expired?).

baywet commented 4 weeks ago

The retry handler will only retry for transient errors (429/503). If we get into the situation of the token expiring before the next retrial, the client will get 403/401 instead. Which means the retry handler won't retry this latest response (no chances of infinite loops), and the token expiration will be handled by the existing code.

As for the PR, do you have a preference between bringing the dependency and using the handler as is, or "copying the code"?

bigcat88 commented 4 weeks ago

The retry handler will only retry for transient errors (429/503). If we get into the situation of the token expiring before the next retrial, the client will get 403/401 instead. Which means the retry handler won't retry this latest response (no chances of infinite loops), and the token expiration will be handled by the existing code.

Good point, make sense.

As for the PR, do you have a preference between bringing the dependency and using the handler as is, or "copying the code"?

It would be preferable to avoid dependency.

Since that project is under the MIT license, we can copy the code by keeping their license in the file or adding a reference to that project in the comments in the code, as far as I know.

bigcat88 commented 3 weeks ago

If it's easier for you, we can also accept the library's dependency.