nhs-r-community / NHSRpopulation

API package to get postcode and Indices of Multiple Deprivation (IMD) data from the ONS by LSOA, currently for England.
https://nhs-r-community.github.io/NHSRpopulation/

large size of repository because of excel file in data-raw - slows down everything #5

Closed milanwiedemann closed 7 months ago

milanwiedemann commented 3 years ago

adding the raw LSOA data (https://github.com/nhs-r-community/LSOApop/blob/89bbb4950b2f80f58a818a804a3e9045f9f3caf2/data-raw/SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx) to this repo may not have been such a good idea ...

It seems the entire repo is downloaded every time the package is installed from GitHub, which takes a lot of time and is quite inconvenient in general! I may need to delete this file and see if that helps, or alternatively start all over again and add the file to .gitignore right from the start ... awrgh

stupidpupil commented 2 years ago

https://rtyley.github.io/bfg-repo-cleaner/ is very helpful if you do want to purge a file permanently from a repo's history (and includes a switch specifically targeted at removing large blobs).
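Before purging anything, it can help to check which blobs in the history are actually large. Below is a minimal sketch of one way to do that, assuming `git` is on the PATH. The function names `parse_batch_check` and `largest_blobs` are made up for illustration and are not part of BFG or of this package.

```python
import subprocess


def parse_batch_check(output):
    """Turn `git cat-file --batch-check` output into (size, path) pairs.

    Each line is expected to look like "<type> <sha> <size> <path>".
    Only blobs (file contents) are kept; the result is sorted largest first.
    """
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps paths with spaces intact
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", top=10):
    """List the `top` largest blobs reachable from any ref in `repo`."""
    # Every object in the history, one "<sha> <path>" per line.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path.
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(checked)[:top]


# Example use, run from inside a clone:
#     for size, path in largest_blobs("."):
#         print(f"{size:>12}  {path}")
```

A file that was committed once and later deleted will still show up here, because it remains in the history until it is rewritten out.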

ChrisBeeley commented 2 years ago

Are you okay to take a look @Lextuga007 ?

Lextuga007 commented 2 years ago

This sounds brilliant, thanks for sharing @stupidpupil. Just out of interest @tomjemmett did you use this or something else when you sorted out the large image files in the https://github.com/nhs-r-community/intro_r repo?

tomjemmett commented 2 years ago

I did it far more manually. I did look at this and saw it was Java, so I skipped over it (I didn't have a JRE installed at the time and didn't want to install one just for this...).

fwiw, it may be worth setting up this hook to prevent this kind of issue (I think I'm going to start using it!)
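As a rough illustration of what a hook like this could do (not necessarily the hook referred to above), here is a minimal pre-commit sketch that rejects commits containing large files. The 5 MB limit and the names `LIMIT_BYTES`, `oversized`, `staged_files` and `main` are assumptions chosen for the example.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys

# Assumed threshold: 5 MB. Pick whatever suits the repo.
LIMIT_BYTES = 5 * 1024 * 1024


def oversized(paths, limit=LIMIT_BYTES):
    """Return the paths that exist as files and are larger than `limit` bytes."""
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) > limit]


def staged_files():
    """Paths added or modified in the index, relative to the repo root."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def main():
    # Note: this checks the size of the working-tree copy, which is a
    # simplification; it can differ from the staged content.
    too_big = oversized(staged_files())
    for path in too_big:
        print(f"{path}: larger than {LIMIT_BYTES} bytes, not committing",
              file=sys.stderr)
    return 1 if too_big else 0


# To use as a hook: save as .git/hooks/pre-commit, make it executable,
# and end the file with:
#     sys.exit(main())
```

A non-zero exit status from a pre-commit hook makes git abort the commit, so the large file never reaches the history in the first place.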

tomjemmett commented 2 years ago

three options for hosting large files:

Lextuga007 commented 7 months ago

Can close, as the package has functions that link directly to the sources, but these are good options to consider, @tomjemmett, for any future sources of data.