mlcommons / mobile_app_open

Mobile App Open
https://mlcommons.org/en/groups/inference-mobile/
Apache License 2.0

Handling of case when Git LFS bandwidth limit exceeded #295

Closed anhappdev closed 2 years ago

anhappdev commented 2 years ago

With Git LFS, the existing URL redirects to the original file, so the download still works as expected. But when the bandwidth limit is exceeded, it redirects to a pointer file instead. The app still downloads the file as normal but doesn't know it's a pointer file. We need to check for this case.

Since the pointer file is tiny (about 200 bytes), we can check the file size after download; if it's smaller than a threshold (say 1 KB), the app should show an error.

Example URL: https://github.com/mlcommons/mobile_models/raw/main/v2_0/SNPE/deeplabv3_htp.dlc

d-uzlov commented 2 years ago

> Since the pointer file is tiny (about 200 bytes), we can check the file size after download; if it's smaller than a threshold (say 1 KB), the app should show an error.

Can we check file contents? I don't think we have any files smaller than 1k right now but for example we have the imagenet_tiny groundtruth file that is pretty close to 1k. I can easily imagine adding some other groundtruth file that is even smaller.

anhappdev commented 2 years ago

The pointer file format has a spec, so yes, it would be possible to check whether a file is a pointer file.

Pointer files are text files which MUST contain only UTF-8 characters. The first key is always version. The required keys are: version, oid, size

The logic would be:

  1. Check if the file size is smaller than a threshold (1 KB)
  2. If yes, check that the content is text, that it contains the keys version, oid and size, and that the first key is version

I think that should be enough.
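
The two steps above could look roughly like the sketch below. It is written in Python for illustration only, not in the app's own language, and the names `looks_like_lfs_pointer` and `MAX_POINTER_SIZE` are hypothetical. It assumes each pointer line has the form `key value`, as described in the Git LFS pointer spec.

```python
import os

# Assumption: 1 KB threshold, as suggested in the discussion above.
MAX_POINTER_SIZE = 1024
REQUIRED_KEYS = {"version", "oid", "size"}


def looks_like_lfs_pointer(path: str) -> bool:
    """Return True if the file at `path` looks like a Git LFS pointer file."""
    # Step 1: real model files are far larger than a pointer file.
    if os.path.getsize(path) >= MAX_POINTER_SIZE:
        return False
    # Step 2: pointer files must be valid UTF-8 text.
    try:
        with open(path, "rb") as f:
            text = f.read().decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Each non-empty line is "key value"; collect the keys in order.
    keys = [line.split(" ", 1)[0] for line in text.splitlines() if line.strip()]
    # The first key must be "version" and all required keys must be present.
    return bool(keys) and keys[0] == "version" and REQUIRED_KEYS.issubset(keys)
```

A small legitimate text file, such as a short groundtruth file, fails the key check and is not reported as a pointer file.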


Of course there is also another solution, which is more robust in my opinion: We provide a checksum / hash for every model file and compare it with the downloaded file.
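
A minimal sketch of the checksum approach, again in Python for illustration only. The helper names `sha256_of_file` and `verify_download` are hypothetical, and SHA-256 is an assumption; the thread does not say which hash would be used or where the expected values would be stored.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the downloaded file matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A check like this would catch a pointer file as well as a truncated or corrupted download, because any content other than the original file produces a different digest.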

freedomtan commented 2 years ago

@anhappdev please move on with the second one you proposed.

mohitmundhragithub commented 2 years ago

This solution may not be permanent, right? We will run out of bandwidth very soon, and then, it will always mean that the app will not work. Can't we use something else... say google drive or something, which won't have file size as well as bandwidth limitations?

anhappdev commented 2 years ago

> This solution may not be permanent, right? We will run out of bandwidth very soon, and then, it will always mean that the app will not work. Can't we use something else... say google drive or something, which won't have file size as well as bandwidth limitations?

Every service has a quota, Google Drive too. I think with Google Drive it would be difficult to handle access management and versioning, and it would require a new workflow.

mohitmundhragithub commented 2 years ago

Do we need access management for the mobile_models repo? I thought it's a public repository. We can have the same on Google Drive as well. What are the limitations of Git LFS and Google Drive?

mohitmundhragithub commented 2 years ago

https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage

The Git LFS bandwidth limit is just 1 GB of downloads per month, which is too little for any practical use in my opinion. I believe we would be able to run it just once per month if we include both models and datasets.

Google Drive, on the other hand, has a 750 GB per day upload limit but doesn't seem to have any download limit. https://support.google.com/a/answer/172541?hl=en

freedomtan commented 2 years ago

@mohitmundhragithub FYR, we agreed to pay some bucks per month to get extra GitHub LFS bandwidth, see discussion at https://github.com/mlcommons/mobile_app_open/issues/278