hugovk / gutenberg-metadata

Metadata from Project Gutenberg
https://hugovk.github.io/gutenberg-metadata/gutenberg-metadata.json

Thoughts on `gutenberg-metadata.json` #15

Closed. jaytula closed this issue 2 years ago.

jaytula commented 2 years ago

Feel free to disregard this and close this. I was just thinking and wanted to share... 😄

`gutenberg-metadata.json` is pretty big, and if you update it often, the repository will grow quickly, because Git keeps every committed version of the file in its history.

You might have already thought of this... a potential workaround is to put it in a separate repository containing just that one file. When updating the large file, empty the history (or reset to the point before the large file was committed), commit the new large JSON, and force-push.
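A minimal sketch of that force-push workflow, driven from Python. It assumes a dedicated repository that tracks only the data file, a branch named `main`, and a remote named `origin`; the temporary branch name `tmp-snapshot` and the helper names are hypothetical. By default it only prints the commands.

```python
import subprocess


def replace_history_commands(path: str, branch: str = "main", remote: str = "origin") -> list[list[str]]:
    """Git commands that replace `branch` with a single commit containing `path`."""
    return [
        # Start a new branch with no parent commits (hypothetical temp name).
        ["git", "checkout", "--orphan", "tmp-snapshot"],
        # In a repo that tracks only this file, the orphan branch's index already
        # contains it; `git add` picks up the newly regenerated version.
        ["git", "add", path],
        ["git", "commit", "-m", f"Update {path}"],
        # Delete the old branch (and its history), then take over its name.
        ["git", "branch", "-D", branch],
        ["git", "branch", "-m", branch],
        # Overwrite the remote branch so it no longer references the old commits.
        ["git", "push", "--force", remote, branch],
    ]


def replace_history(path: str, branch: str = "main", remote: str = "origin", dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in replace_history_commands(path, branch, remote):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Because this rewrites a published branch, anyone who has already cloned the data repository would need to reset to the new history rather than pull.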

hugovk commented 2 years ago

I was thinking along similar lines yesterday.

It's not much of a problem right now: this repo doesn't get much regular use, and the JSON file can be downloaded directly from the web interface, from https://hugovk.github.io/gutenberg-metadata/gutenberg-metadata.json, or via Git/SVN.
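For scripted use, here is a sketch of fetching the file from the GitHub Pages URL above with only the standard library. The helper name and the local cache path are hypothetical. The download is skipped when a cached copy already exists, so the large transfer (roughly 72 MB at the time of this thread) happens only once.

```python
import json
import pathlib
import shutil
import urllib.request
from typing import Any

# URL from the repository description.
METADATA_URL = "https://hugovk.github.io/gutenberg-metadata/gutenberg-metadata.json"


def fetch_metadata(url: str = METADATA_URL,
                   cache: pathlib.Path = pathlib.Path("gutenberg-metadata.json")) -> Any:
    """Download `url` to `cache` once, then parse the JSON from disk."""
    if not cache.exists():
        part = cache.with_suffix(cache.suffix + ".part")
        # Stream to a temporary file so an interrupted download never looks complete.
        with urllib.request.urlopen(url) as response, part.open("wb") as out:
            shutil.copyfileobj(response, out)
        part.replace(cache)
    with cache.open("r", encoding="utf-8") as f:
        return json.load(f)
```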

However, GitHub gives a warning when pushing files over 50 MB, and rejects files over 100 MB.
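A quick way to see where a file stands against those limits. This is a sketch: GitHub's documentation states the thresholds in MiB, so they are treated as binary megabytes here, and the helper names are hypothetical.

```python
import os

MIB = 1024 * 1024
WARN_BYTES = 50 * MIB    # GitHub warns when pushing files larger than this
LIMIT_BYTES = 100 * MIB  # GitHub rejects pushes containing files larger than this


def github_size_status(size_bytes: int) -> str:
    """Classify a file size as 'ok', 'warning', or 'blocked'."""
    if size_bytes > LIMIT_BYTES:
        return "blocked"
    if size_bytes > WARN_BYTES:
        return "warning"
    return "ok"


def check_file(path: str) -> str:
    """Apply the same classification to a file on disk."""
    return github_size_status(os.path.getsize(path))
```

For a 72 MiB file, `github_size_status(72 * MIB)` returns `"warning"`.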

File size change over time: [chart not reproduced]

The trend suggests it would reach the 100 MB maximum in 4 years or so, so something needs to be done at some point.
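The "4 years or so" estimate is essentially a linear extrapolation. A sketch, assuming constant growth in MB per year (the chart itself isn't reproduced here, and the helper names are hypothetical):

```python
def years_until_limit(current_mb: float, growth_mb_per_year: float, limit_mb: float = 100.0) -> float:
    """Years until a file growing at a constant rate reaches `limit_mb`."""
    if growth_mb_per_year <= 0:
        return float("inf")  # not growing: never reaches the limit
    return max(0.0, (limit_mb - current_mb) / growth_mb_per_year)


def implied_growth(current_mb: float, years_to_limit: float, limit_mb: float = 100.0) -> float:
    """Growth rate (MB/year) implied by reaching `limit_mb` in `years_to_limit` years."""
    return (limit_mb - current_mb) / years_to_limit
```

Taking the 72 MB size mentioned later in the thread, reaching 100 MB in about 4 years implies `implied_growth(72, 4) == 7.0` MB per year, and `years_until_limit(72, 7.0)` gives back `4.0`.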

As it approaches the limit, I think we could use something like GitHub artifacts or packages, or maybe attach it to a release. The Internet Archive is another option.

jaytula commented 2 years ago

Thanks! I'll have to look into artifacts/packages for my own edification.

hugovk commented 2 years ago

Oh and another option is zipping the file. Right now that 72 MB compresses down to a relatively compact 9 MB.
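A sketch of the zipping step with Python's standard `zipfile` module, using the `.json.zip` naming suggested later in the thread. The helper name is hypothetical, and the ratio it reports depends on the data; 72 MB down to 9 MB corresponds to a ratio of 0.125.

```python
import os
import zipfile
from typing import Optional


def zip_json(src: str, dest: Optional[str] = None) -> float:
    """Compress `src` into `dest` (default: `src` + '.zip'); return compressed/original size."""
    dest = dest or src + ".zip"
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        # Store under the bare file name so extraction doesn't recreate directories.
        zf.write(src, arcname=os.path.basename(src))
    return os.path.getsize(dest) / os.path.getsize(src)
```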

jaytula commented 2 years ago

> Oh and another option is zipping the file. Right now that 72 MB compresses down to a relatively compact ~10 MB.

That occurred to me too, but it's slightly inconvenient to have to unzip it.

Still, it's a massive difference in file size, and it would keep the status quo going for much longer than 4 years.

When I first downloaded the JSON file I had no idea how big it was, but had it been a zip file, I would probably have hesitated at a glance: it would have felt less ready to use, and I wouldn't have known right off the bat what was inside. I guess a `.json.zip` extension would be enough of a hint. Trivial concerns in the end.
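On the unzip inconvenience: Python can read the JSON straight out of the archive without extracting it to disk first. A sketch, assuming the archive holds the JSON as its first member (or one you name explicitly); the helper name is hypothetical. Note that the parsed data still lives fully in memory.

```python
import io
import json
import zipfile
from typing import IO, Any, Optional, Union


def load_json_from_zip(archive: Union[str, IO[bytes]], member: Optional[str] = None) -> Any:
    """Parse a JSON member of a zip archive without extracting it."""
    with zipfile.ZipFile(archive) as zf:
        name = member or zf.namelist()[0]
        with zf.open(name) as raw:
            # json.load accepts a binary stream and detects UTF-8/16/32 itself.
            return json.load(raw)
```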

I see you're a massive contributor to open-source Python and other projects. Thanks for maintaining this small package.

hugovk commented 2 years ago

You're welcome!