huggingface / swift-transformers

Swift Package to implement a transformers-like API in Swift
Apache License 2.0
657 stars 70 forks source link

Download tasks getting rate limited by background configuration #61

Closed ZachNagengast closed 6 months ago

ZachNagengast commented 6 months ago

Following up on @davidkoski's #59 issue I did a bit of research and may have found the underlying issue based on this section in the docs: https://developer.apple.com/documentation/foundation/url_loading_system/downloading_files_in_the_background#3038438

When the system resumes or relaunches your app, it uses a rate limiter to prevent abuse of background downloads. When your app starts a new download task while in the background, the task doesn't begin until the delay expires. The delay increases each time the system resumes or relaunches your app.

As a result, if your app starts a single background download, gets resumed when the download completes, and then starts a new download, it will greatly increase the delay. Instead, use a small number of background sessions — ideally just one — and use these sessions to start many download tasks at once. This allows the system to perform multiple downloads at once, and resume your app when they have completed.

This would explain the intermittent successes and hard-to-reproduce behavior, although this is still a hypothesis. This showed up for me when 1-2 downloads in a repo would succeed, but then it would hang up and not complete. I believe it is getting rate limited by the OS.

The reason it likely didn't show up in swift-coreml-diffusers is because that repo is downloading single zip files, and uses a static identifier that only creates one background session:

https://github.com/huggingface/swift-coreml-diffusers/blob/d041577b9f5e201baa3465bc60bc5d0a1cf7ed7f/Diffusion/Common/Downloader.swift#L32

Whereas now we are downloading an arbitrary number of separate files based on the glob search, and creating a new session for each file:

https://github.com/huggingface/swift-transformers/blob/95194e645743f9aebe9557bed045162701b70f9f/Sources/Hub/Downloader.swift#L36

I'd propose these two options that would be good to have feedback on:

  1. Use a static identifier for background URLSession identifiers
  2. Create an option to just use the default (foreground) URLSession configuration instead of the background configuration
    • Question: should this be the default for simplicity? Will changing this default break have the potential to break anyone's current usage?

I can have a PR ready with these changes, but curious what folks think about these findings cc @pcuenca @davidkoski @jkrukowski

davidkoski commented 6 months ago

AFAIK background downloading is more interesting for iOS apps that want to download things past when the user is interacting with them. If you want to download and wait for the download to finish I think a default download task would be more appropriate.

Perhaps the caller could indicate intent?

ZachNagengast commented 6 months ago

Makes sense to me, also apps need to handle a few things on their own to use the background session fully as intended, like enabling the background fetch capability and handling app suspensions, which I think would be hard to do from the swift package side. While convenient I don't think we're getting a free lunch with it and I agree it would probably be best to require the developer to enable it explicitly, with some warnings about the supporting requirements. I'll put together a PR with this in mind.

pcuenca commented 6 months ago

Just to confirm: yes, this was inherited from swift-coreml-diffusers, where a single big zip file is downloaded (per model), and the intention was to support iOS downloads in the background.

Thanks for investigating this further @ZachNagengast!

ZachNagengast commented 6 months ago

Resolved with #64