how2sign / how2sign.github.io

Project page for the How2Sign dataset
https://how2sign.github.io/
14 stars 6 forks

Fix dataset download issue for large files from Google Drive #21

Open GerrySant opened 6 months ago

GerrySant commented 6 months ago

This commit resolves the problem where large dataset files could not be directly downloaded due to Google Drive's virus scan warning. The updated download script now handles the confirmation step automatically, bypassing the virus scan verification and allowing seamless dataset acquisition.
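As a rough illustration of the confirmation step, the sketch below extracts a confirmation token from a Drive warning response. It is not the PR's actual script: `find_confirm_token` is a hypothetical helper, and the cookie prefix `download_warning` and the `confirm` link/form field are assumptions about how Drive's warning page has looked in the past, which may have changed.

```python
import re
from typing import Mapping, Optional


def find_confirm_token(cookies: Mapping[str, str], html: str) -> Optional[str]:
    """Return the confirmation token from a warning response, or None.

    Hypothetical helper: the cookie name and page markup are assumptions.
    """
    # Assumed older behaviour: the token arrives in a "download_warning*" cookie.
    for name, value in cookies.items():
        if name.startswith("download_warning"):
            return value
    # Assumed alternative: the token appears as confirm=<token> in a link.
    match = re.search(r"confirm=([0-9A-Za-z_-]+)", html)
    if match:
        return match.group(1)
    # Assumed alternative: the token sits in a hidden form input.
    match = re.search(r'name="confirm"\s+value="([^"]+)"', html)
    return match.group(1) if match else None
```

A caller would then repeat the download request with the token added as a `confirm` query parameter; if the function returns `None`, the response was presumably the file itself or a different kind of error page.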

In addition, Google Drive enforces a traffic quota per user IP, which limits how much traffic a user can send to its servers. When the quota is exceeded, an HTML error page ('Google Drive - Quota exceeded') is downloaded instead of the desired file. The new version detects this error and retries the download every half hour until the file is downloaded successfully.
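The detect-and-retry behaviour described above could look roughly like this. It is a minimal sketch under stated assumptions, not the code from this PR: `is_quota_page`, `fetch_until_available`, and the `fetch` callable are hypothetical names, and matching on the text "Quota exceeded" is a heuristic based on the error page title quoted above. A real script would stream large files to disk and inspect only the first chunk rather than hold the whole body in memory.

```python
import time
from typing import Callable, Optional

# Marker taken from the error page title quoted in the PR description.
QUOTA_MARKER = b"Quota exceeded"
# "Half an hour", per the PR description.
RETRY_SECONDS = 30 * 60


def is_quota_page(body: bytes) -> bool:
    """Heuristically decide whether a response is the quota error page."""
    # Only the start of the body is inspected.
    return QUOTA_MARKER in body[:4096]


def fetch_until_available(
    fetch: Callable[[], bytes],
    wait_seconds: float = RETRY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> bytes:
    """Call fetch() until it returns something other than the quota page."""
    attempts = 0
    while True:
        attempts += 1
        body = fetch()
        if not is_quota_page(body):
            return body
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError(f"quota still exceeded after {attempts} attempts")
        # Wait before the next attempt; the quota may have reset by then.
        sleep(wait_seconds)
```

Passing `sleep` and `max_attempts` as parameters keeps the loop testable and lets a caller cap the total waiting time instead of retrying indefinitely.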

cleong110 commented 5 months ago

Giving it a test now, and it seems to be working: the old download script downloaded a series of 2.38 kB files instead of actual data, whereas the new script detected a quota error on the very first file.
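For anyone who already ran the old script, a quick way to spot such small placeholder files is to check whether each downloaded file starts like an HTML page. This is only a sketch: `looks_like_html` and `find_html_stubs` are hypothetical helpers, and the assumption that the bogus files begin with an HTML doctype or `<html>` tag is a guess based on the error page described in this thread.

```python
from pathlib import Path
from typing import List

# Assumed prefixes of an HTML error page (compared case-insensitively).
HTML_PREFIXES = (b"<!doctype html", b"<html")


def looks_like_html(path: Path) -> bool:
    """Return True if the file's first bytes look like an HTML document."""
    with path.open("rb") as handle:
        head = handle.read(512).lstrip().lower()
    return head.startswith(HTML_PREFIXES)


def find_html_stubs(directory: Path) -> List[Path]:
    """List files in `directory` that appear to be HTML pages, sorted by path."""
    return sorted(
        p for p in directory.iterdir() if p.is_file() and looks_like_html(p)
    )
```

Any file this reports is a candidate for deleting and downloading again.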

cleong110 commented 5 months ago

However, I can manually download the exact same file. So if it's an actual quota issue, it doesn't seem to be affecting manual downloads.

cleong110 commented 5 months ago

Days later, the script is still running, and it has finally downloaded .z01, .z02, and .z03.

GerrySant commented 5 months ago

@cleong110 You are right, I created this solution quite quickly using brute force. The solution works, but because of the Google quota exceeded issue, downloading the files can take many days. 😕

This whole solution was basically meant to avoid downloading the dataset manually. I am very lazy with these things. 😂

cleong110 commented 5 months ago

June 21 update: we're at .z07 now.

ZechengLi19 commented 3 months ago

@cleong110 You are right, I created this solution quite quickly using brute force. The solution works, but because of the Google quota exceeded issue, downloading the files can take many days. 😕

This whole solution was basically meant to avoid downloading the dataset manually. I am very lazy with these things. 😂

Have you downloaded the entire dataset? I also ran into the Google quota exceeded issue. 😂

cleong110 commented 3 months ago

I've been working on it continually and making slow progress, so I think the solution does indeed work in a sense. I wonder if it might be worth re-uploading or mirroring the dataset on, say, Zenodo?
