p4lang / p4-spec

Apache License 2.0
Investigate whether PDF/HTML generated for all versions are in git repo history #1287

Open jafingerhut opened 2 months ago

jafingerhut commented 2 months ago

I am not sure, but I think that because of the way PDF/HTML is generated for the language spec, the p4lang/p4-spec repository grows in size every time a commit is made to it. I believe some other p4lang spec repos, e.g. p4runtime, also auto-generate PDF/HTML on every commit, but in a way that keeps only the most recent PDF/HTML in the git repo, not the history of all revisions. I believe that binary PDF files do not compress well in git, even across small changes in the source text.

jafingerhut commented 2 months ago

Here is a change made in November 2022 to the repo https://github.com/p4lang/pna, recommended to me by Antonin Bas, that keeps only the latest version of the auto-generated PDF and HTML docs for the PNA specification in the gh-pages branch of that repo: https://github.com/p4lang/pna/commit/a791747b3aa3ea033771fd082b6887f17ed92ddb

The p4runtime repo also uses the --amend option for this purpose.

As far as I can tell, the p4-spec repo does not use the --amend option when committing newly generated HTML and PDF files to the gh-pages branch, so every generated version is stored in its commit history.
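
For anyone unfamiliar with the idea, here is a rough sketch of what an amend-based publish step could look like. This is not copied from the pna or p4runtime CI scripts; the function names, the commit message, and the default remote and branch are assumptions made for illustration. The point is only that `git commit --amend` replaces the tip commit of gh-pages instead of adding a new one, and the push then has to be forced because the branch tip was rewritten.

```python
import subprocess


def publish_commands(remote="origin", branch="gh-pages",
                     message="Update generated docs"):
    """Return git commands that overwrite the tip commit of `branch`.

    Hypothetical helper: the names and defaults are illustrative and are
    not taken from any p4lang repository.
    """
    return [
        # Stage the freshly generated PDF/HTML files.
        ["git", "add", "--all"],
        # Replace the current tip commit rather than stacking a new one.
        ["git", "commit", "--amend", "--message", message],
        # The tip was rewritten, so a plain push would be rejected.
        ["git", "push", "--force", remote, branch],
    ]


def publish(worktree, dry_run=True):
    """Print the commands, or run them inside `worktree` when dry_run is False."""
    for cmd in publish_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=worktree, check=True)


if __name__ == "__main__":
    # Dry run only: shows the commands without touching any repository.
    publish(".", dry_run=True)
```

Note that amending only stops the history from growing further. Commits that are already in the gh-pages history stay there unless the branch is rewritten separately.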

jafingerhut commented 2 months ago

I do not know how to calculate the precise quantitative effect of this, but as a rough kind of evidence that storage is creeping up on the p4-spec repository: