rapidsai-community / notebooks-contrib

RAPIDS Community Notebooks
Apache License 2.0
512 stars, 266 forks

[FEA] This repo is too big. Consider using git-lfs or keeping the data outside of git. #290

Open harrism opened 4 years ago

harrism commented 4 years ago

Describe the bug

This repo is 500 MB and growing, so cloning is slow. Most users probably don't use all the datasets, so we should consider moving the datasets out of the repo. We could also use git-lfs (Git Large File Storage): https://git-lfs.github.com

(base) mharris@dgx02:~/rapids$ git clone git@github.com:harrism/notebooks-contrib.git
Cloning into 'notebooks-contrib'...
remote: Enumerating objects: 4547, done.
remote: Total 4547 (delta 0), reused 0 (delta 0), pack-reused 4547
Receiving objects: 100% (4547/4547), 489.83 MiB | 24.91 MiB/s, done.
Resolving deltas: 100% (2392/2392), done.
Checking out files: 100% (213/213), done.

taureandyernv commented 4 years ago

@harrism, keeping the repo size down has been a problem, but we should not be at 500 MB. IIRC, it was 300ish. We may have had an accidental push of a dataset or of history. If Josh okays it, I will search for and remove the old datasets. Thanks man!
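
For the "search" step, one possible approach is to rank every blob in the history by size. This is only a sketch: `largest_blobs` is a hypothetical helper name, not a git command, and it assumes a POSIX shell with `awk`, `sort`, and `head` available. The git pipeline in the usage comment uses the standard `git rev-list` and `git cat-file --batch-check` commands.

```shell
# Hypothetical helper (not part of git): read lines of the form
#   <type> <sha> <size-in-bytes> <path>
# on stdin and print the N largest blobs (default 10), biggest first.
largest_blobs() {
  # Keep only blobs, sort numerically on the size column (descending),
  # then keep the first N lines.
  awk '$1 == "blob"' | sort -k3,3nr | head -n "${1:-10}"
}

# Usage inside a clone (assumes git is installed):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 20
```

Because the listing walks all reachable history, it also shows files that were deleted from the current checkout but still take up space in every clone.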

harrism commented 4 years ago

@taureandyernv Even if you remove them, they will still be in the git history (affecting everybody who clones) unless you rewrite the history. This is why you should use a different system for large datasets, like git-lfs: it replaces the files in the repo with small text pointers, and the file contents live in external storage on GitHub.
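
To make the "text pointer" idea concrete, here is a rough sketch of what git-lfs commits in place of a large file. The three-line layout (`version`, `oid sha256:...`, `size`) follows the Git LFS pointer format. The helper name `make_lfs_pointer` is hypothetical, and the sketch assumes GNU `sha256sum` is available. It only illustrates the format and is not a replacement for git-lfs itself.

```shell
# Hypothetical helper (not part of git-lfs): print the pointer text that
# would stand in for the file given as $1.
make_lfs_pointer() {
  # SHA-256 of the file contents, hex digest only.
  oid=$(sha256sum "$1" | cut -d' ' -f1)
  # File size in bytes (strip padding that some wc implementations add).
  size=$(wc -c < "$1" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Usage (the path is an assumption for illustration):
#   make_lfs_pointer data/big.csv
```

The pointer stays around a hundred bytes however large the dataset is, which is why clones stay small. In practice git-lfs writes these pointers itself once a pattern is tracked, for example with `git lfs track "*.csv"` (the pattern is an assumption). Files already committed would still need a history rewrite, which `git lfs migrate import` is meant to help with.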