hfg-gmuend / openmoji

Open source emojis for designers, developers and everyone else!
http://openmoji.org
Creative Commons Attribution Share Alike 4.0 International

Very large git repository #412

Closed JeppeKlitgaard closed 7 months ago

JeppeKlitgaard commented 1 year ago

Currently both the source files, which are relatively small, compressible SVGs, and the export files are kept in the same repository. This has caused the repository size to balloon a fair bit (currently ≈ 1.2 GB). As changes are made that cause all images to be regenerated (such as dark background support), the repository will grow even further, because multiple copies of the binary image files will be stored in the repository history.

Might it make sense to migrate to a two-repository solution in the future (openmoji and openmoji-src), similar to what MathJax does with MathJax and MathJax-src?

It should be fairly trivial to set up automatic export into the 'publish' repository when, for example, a tag is pushed to the source repository.

This would have a few advantages:

b-g commented 1 year ago

Dear @JeppeKlitgaard, yes, this would theoretically be a good idea ... however, over the course of the last years, we (the maintainers) have become quite conservative about technical "nice to have" sophistication versus introducing even more complexity. The project is about designing emojis and not about git :) Hence I don't see this happening in the near future. Sorry!

However, a welcome contribution from a git expert would be to reduce the repository size by getting rid of the .ai (Adobe Illustrator) blobs in the git history. Initially we made a cardinal mistake and used .ai files as src files, which turned out to be terrible on all levels. As these .ai files have been archived in the releases ... I don't see any reason to keep them. Basically, the todo would be to remove any .ai files that had been inside the src folder.
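Before attempting such a cleanup, it may help to estimate how much the .ai blobs actually contribute. The following is a minimal sketch, not a maintainer-provided tool: the function name `sum_ai_bytes` is made up for illustration, and it reports uncompressed blob sizes, so it gives an upper bound rather than the exact on-disk saving.

```shell
#!/bin/sh
# Sum the uncompressed size, in bytes, of every blob whose path ends in ".ai".
# Expects lines of the form "<type> <object-id> <size> <path>" on stdin.
# (sum_ai_bytes is a hypothetical helper name, not part of git or openmoji.)
sum_ai_bytes() {
    awk '$1 == "blob" && /\.ai$/ { sum += $3 } END { printf "%.0f\n", sum }'
}

# Usage, inside a clone of the repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     sum_ai_bytes
```

`git rev-list --objects --all` lists each reachable object once together with a path, and `%(rest)` passes that path through `git cat-file --batch-check`, so the awk filter only has to match on the file extension.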

antonmosich commented 11 months ago

While it is possible to remove the .ai files from the git history, it would be a massive rewrite of that history, and everyone who has a fork would have to reset their master branch to match the new origin. The tags would also have to be recreated; otherwise the old commits stay reachable and the repo would not shrink in size at all. Anyone with work based on a commit in the "old" history would also have to move that work onto the new history, which could be rather hard.

But the payoff would be pretty big: I tried it out, and the .git directory would be 367M in size instead of the 1.3G it is right now. That means cloning the repo would download only about 400 MB rather than over a gigabyte. The working directory takes up about another 300 MB when checking out the current master branch, so for most users the openmoji directory would total roughly 670 MB.

Such a history rewrite can be achieved using git filter-branch (it displays a warning and recommends using another tool, but for our simple use case it will suffice):

git filter-branch --tree-filter 'find . -name "*.ai" -delete' HEAD

What this does is essentially check out every commit in the history of the current branch, delete all .ai files, and recommit.
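Since the tags need to be rewritten too, a variant that covers all branches and tags might look like the sketch below. It is an assumption-laden illustration rather than the exact procedure used for the measurement above: the function names are hypothetical, and it is meant to be run only in a fresh throwaway clone, because it rewrites every commit hash.

```shell
#!/bin/sh
# Strip every *.ai file from all branches and tags of the repository in the
# current directory. (Function names here are hypothetical.)
strip_ai_from_history() {
    # FILTER_BRANCH_SQUELCH_WARNING skips the startup warning and delay.
    # --tag-name-filter cat re-points each tag at its rewritten commit.
    # -- --all rewrites every ref instead of only the current branch.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
        --tree-filter 'find . -name "*.ai" -delete' \
        --tag-name-filter cat \
        -- --all
}

# filter-branch keeps backups of the old refs under refs/original/, so the
# old blobs stay reachable until those backups and the reflog are dropped.
drop_old_objects() {
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Running `strip_ai_from_history` followed by `drop_old_objects` and then `du -sh .git` gives a rough idea of the resulting size; the exact number will depend on the git version and packing.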

As I assume most of your users/developers aren't very well versed in git, doing this rewrite in the history would need to be well communicated beforehand with elaborate explanations on what to do, to move to the new history.
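For illustration, the instructions for people with an existing clone could boil down to something like the following sketch. It assumes the remote is called `origin` and the branch is `master`; the function names and the `old_base` argument are hypothetical, and `git reset --hard` throws away local commits and uncommitted changes on master, which is exactly why this step would need careful communication.

```shell
#!/bin/sh
# Point the local master at the rewritten upstream history.
# WARNING: discards local commits and uncommitted changes on master.
# (Function names are hypothetical; "origin" and "master" are assumptions.)
adopt_rewritten_master() {
    git fetch --prune origin &&
    git checkout master &&
    git reset --hard origin/master
}

# Replay the commits in old_base..branch on top of the new master.
# old_base is the last commit of the OLD history that the branch was built on.
replay_branch() {
    old_base=$1
    branch=$2
    git rebase --onto origin/master "$old_base" "$branch"
}
```

A contributor would note the old master commit before fetching (for example with `git rev-parse master`), run `adopt_rewritten_master`, and then call `replay_branch <old-commit> my-feature` for each work branch.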

b-g commented 11 months ago

Many thanks for looking into this! But ...

> As I assume most of your users/developers aren't very well versed in git, doing this rewrite in the history would need to be well communicated beforehand with elaborate explanations on what to do, to move to the new history.

Yes, this is exactly the problem. This could play out as a nightmare in terms of effort for us maintainers ... for very little gain in terms of file size. Hence I personally don't see it happening. I'm just really too afraid that this could potentially be really bad for the project. Sorry!