Closed anhappdev closed 2 years ago
Since the pointer file is tiny (about 200 bytes), we can check the file size after download. If it's smaller than a threshold (say 1 KB), the app should show an error.
Can we check the file contents? I don't think we have any files smaller than 1 KB right now, but, for example, we have the imagenet_tiny groundtruth file, which is pretty close to 1 KB. I can easily imagine adding some other groundtruth file that is even smaller.
The pointer file has a spec for its format, so yes, it would be possible to check whether a file is a pointer file.
Pointer files are text files which MUST contain only UTF-8 characters. The first key is always version. The required keys are: version, oid, size
The logic would be: the file contains the keys version, oid, and size, and the first key is version.
I think that should be enough.
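The check described above could look roughly like the following. This is only a sketch in Python (the app itself may use a different language). The function name `is_lfs_pointer` is hypothetical, and it assumes each line of a pointer file has the form `key value`, as in the Git LFS pointer format.

```python
# Keys that the pointer file spec quoted above lists as required.
REQUIRED_KEYS = {"version", "oid", "size"}


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if the downloaded bytes look like a Git LFS pointer file.

    Hypothetical helper: checks that the content is UTF-8 text, that every
    non-empty line is "key value", that the first key is "version", and
    that all required keys are present.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Pointer files must be UTF-8 text; binary model files usually are not.
        return False

    keys = []
    for line in text.splitlines():
        if not line:
            continue
        key, sep, value = line.partition(" ")
        if not sep or not value:
            # A line without "key value" structure is not a pointer line.
            return False
        keys.append(key)

    return bool(keys) and keys[0] == "version" and REQUIRED_KEYS.issubset(keys)
```

A small groundtruth text file such as `b"1\n2\n3\n"` is rejected by this check because its lines are not key/value pairs, which addresses the concern about legitimate files that are smaller than 1 KB. To avoid reading large model files into memory, a caller could run this check only on small downloads.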
Of course, there is also another solution, which is more robust in my opinion: we provide a checksum/hash for every model file and compare it with the checksum of the downloaded file.
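A minimal sketch of the checksum comparison, in Python for illustration. It assumes SHA-256 as the hash (the thread does not say which algorithm would be used), and the function names `read_chunks`, `sha256_hex`, and `file_matches_checksum` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file's content in chunks so large models are not loaded at once."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Compute the SHA-256 hex digest over a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare the downloaded file's digest with the published checksum."""
    return sha256_hex(read_chunks(path)) == expected_hex.strip().lower()
```

With this approach a pointer file fails the comparison like any other corrupted or truncated download, so no pointer-specific logic is needed. The expected checksums would have to be published somewhere the app can read them, for example next to each model URL.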
@anhappdev please go ahead with the second solution you proposed.
This solution may not be permanent, right? We will run out of bandwidth very soon, and then the app will simply not work. Can't we use something else, say Google Drive, which doesn't have file size or bandwidth limitations?
Every service has a quota, Google Drive too, but I think with Google Drive it would be difficult to handle access management and versioning, and it would require a new workflow.
Do we need access management for the mobile_models repo? I thought it's a public repository. We can have the same in Google Drive as well. What are the limitations of Git LFS and Google Drive?
The Git LFS bandwidth limit is just 1 GB of downloads per month, which is too little for any practical use in my opinion. I believe we would be able to run it just once per month if we include both models and datasets.
Google Drive, on the other hand, has a 750 GB per day upload limit but doesn't seem to have any download limit. https://support.google.com/a/answer/172541?hl=en
@mohitmundhragithub FYR, we agreed to pay some bucks per month to get extra GitHub LFS bandwidth, see discussion at https://github.com/mlcommons/mobile_app_open/issues/278
With Git LFS, the existing URL redirects to the original file, so the download still works as expected. But when the bandwidth limit is exceeded, it redirects to a pointer file. The app still downloads the file as normal but doesn't know it's a pointer file. We need to check for this case.
Since the pointer file is tiny (about 200 bytes), we can check the file size after download. If it's smaller than a threshold (say 1 KB), the app should show an error.
Example URL: https://github.com/mlcommons/mobile_models/raw/main/v2_0/SNPE/deeplabv3_htp.dlc
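The size check proposed here could be sketched as follows, in Python for illustration. The 1 KB threshold is the value suggested above. The names `POINTER_SIZE_THRESHOLD`, `DownloadTooSmallError`, `is_suspiciously_small`, and `check_downloaded_file` are hypothetical.

```python
import os

# Threshold suggested in the issue: pointer files are about 200 bytes,
# so anything under 1 KB is treated as suspicious.
POINTER_SIZE_THRESHOLD = 1024


class DownloadTooSmallError(Exception):
    """Raised when a downloaded file is too small to be a real model file."""


def is_suspiciously_small(size_bytes: int,
                          threshold: int = POINTER_SIZE_THRESHOLD) -> bool:
    """Return True if the size is below the threshold."""
    return size_bytes < threshold


def check_downloaded_file(path: str) -> None:
    """Raise an error if the file at `path` is smaller than the threshold."""
    size = os.path.getsize(path)
    if is_suspiciously_small(size):
        raise DownloadTooSmallError(
            f"{path} is only {size} bytes; it may be a Git LFS pointer file"
        )
```

This check is cheap, but it would also reject any legitimate file smaller than the threshold, so it only works as long as every real file stays above 1 KB.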