Cannot checkout the repository

barjin commented 6 months ago

git clone / checkout gives the following error:

Error downloading object: server/static/datasets/goodbooks-10k/goodbooks_img.zip (1e41bd5): 
Smudge error: Error downloading server/static/datasets/goodbooks-10k/goodbooks_img.zip (1e41bd5bd92d60a0ca122c987ba6535d14ecc535a369dbb89c7858fdcfb64124): 

batch response: This repository is over its data quota. 
Account responsible for LFS bandwidth should purchase more data packs to restore access.

The LFS bandwidth quota has most likely been exceeded for this repository.

This can be worked around by cloning with the GIT_LFS_SKIP_SMUDGE=1 environment variable set and downloading the dataset (ml-latest.csv) separately from the link in the README. However, the "dynamic" image loading via the cinemagoer package then takes too long during the actual study. The link to uschovna shared during the MFF courses is dead as well (and even before that, it was subject to download count limits).
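For anyone scripting the setup, here is a minimal sketch of the clone workaround in Python. The helper names `lfs_skip_env` and `clone_without_lfs` are made up for illustration and are not part of the repository. The only real mechanism used is the GIT_LFS_SKIP_SMUDGE environment variable, which makes git-lfs leave small pointer files in place instead of downloading the actual objects.

```python
import os
import subprocess


def lfs_skip_env(base_env=None):
    """Return a copy of the environment with LFS smudging disabled.

    The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # With this set, git-lfs skips downloading LFS objects on checkout,
    # so the clone does not touch the exhausted bandwidth quota.
    env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return env


def clone_without_lfs(repo_url, dest):
    """Clone repo_url into dest, leaving LFS files as pointer stubs."""
    subprocess.run(
        ["git", "clone", repo_url, dest],
        env=lfs_skip_env(),
        check=True,  # raise CalledProcessError if git fails
    )
```

Calling `clone_without_lfs(repo_url, "EasyStudy")` with the repository's clone URL should then finish without the smudge error. LFS-tracked files such as goodbooks_img.zip will only be text stubs afterwards and still have to be fetched separately.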

Can anything be done about this, e.g. storing the images as URLs to IMDb / CSFD? Or perhaps documenting an alternative setup in the README?

pdokoupil commented 6 months ago

Thanks for reporting the issue. We recently shared the repository with two groups of students (attending the MFF courses), resulting in a significant spike in git clones that led to the exhaustion of the bandwidth quota GitHub provides for free accounts (1 GB per month).

For that reason, we had a backup plan with "Uschovna," but we also exhausted the 30-download limit there. I believe there should be a new link available, but I agree that this solution does not scale very well.

The cinemagoer/imdbpy route is definitely very slow. It only served as a sort of fallback mechanism and is unfortunately not suitable for anything meaningful. If I remember correctly (I experimented with this in the early days of the project, before it went on GitHub), storing image URLs and using these external URLs instead of server-local ones was faster, but still quite slow and not very reliable. For different datasets, the URLs point to different servers of varying reliability (IMDb images seem to be Amazon-hosted now, so that is less of a concern), and the last thing we want is participants complaining that their images are not loading, or are loading too slowly, because of some external service.

For now, I decided to update the README with a new section on possible issues, starting with the git-lfs issue and its mitigation. The current solution is to use GIT_LFS_SKIP_SMUDGE, as you already figured out, and to download the images from elsewhere. I uploaded the data to an MFF-hosted server so that we are no longer subject to download limits. You can find the new links in the newly added README section.
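After a clone with GIT_LFS_SKIP_SMUDGE, it can be handy to see which files are still LFS stubs and need to be replaced with the downloads from the README links. The sketch below is one possible way to check this; the function names are hypothetical and not part of the repository. It relies only on the documented Git LFS pointer format, where a pointer is a small text file (under 1024 bytes) whose first line is `version https://git-lfs.github.com/spec/v1`.

```python
import os
from pathlib import Path

# First line of every Git LFS pointer file, per the pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if path looks like an un-smudged Git LFS pointer file."""
    path = Path(path)
    # Pointer files are tiny; real archives and images are not.
    if not path.is_file() or path.stat().st_size >= 1024:
        return False
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List all pointer stubs below root, skipping the .git directory."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into git internals
        for name in filenames:
            candidate = Path(dirpath) / name
            if is_lfs_pointer(candidate):
                found.append(candidate)
    return sorted(found)
```

Running `find_lfs_pointers` on the checkout directory should list entries such as server/static/datasets/goodbooks-10k/goodbooks_img.zip while they are still stubs, and return an empty list once every stub has been overwritten with the real file.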

barjin commented 6 months ago

Thank you for the answer - the herkules upload solves it for me.

Cheers!

pdokoupil / EasyStudy

Cannot checkout the repository #3