maxim-uvarov opened 1 month ago
Sounds good to me. I don't know anything about git-lfs. Does it change the way you interact with the repo?
@fdncred Here is the project's site: https://git-lfs.com
Users who have cloned the 'nu_scripts' repository without installing 'git-lfs' will see text files in place of binary files, like this:
version https://hawser.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
(ending with a \n)
For those who have it installed, nothing will change.
On the GitHub site, images in markdown documents will be displayed as usual, and no changes are needed.
Here are instructions on how to move current files to 'git-lfs': https://docs.github.com/en/repositories/working-with-files/managing-large-files/moving-a-file-in-your-repository-to-git-large-file-storage
In short, we will need to use 'git-filter-repo' to rewrite history and remove the binary files. But before that, we will need to accept or reject the current PRs.
I've done it only once before, so I will review all the steps and let you know what's needed. I will rehearse everything on my clones to make sure everything goes well, and show you the results.
Does that sound okay?
What happens when someone doesn't have git lfs installed after we've made this change?
@fdncred:
What happens when someone doesn't have git lfs installed after we've made this change?
In place of example_image.png, they will see a text file with content like:
version https://hawser.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
(ending with a \n)
Those who have git-lfs installed will see the regular PNG image.
This operation has benefits and considerations that I propose to discuss and communicate with the community.
However, I'm still gathering information, and these are just intermediate results.
I have tested all the operations, and the results can be found here:
https://github.com/maxim-uvarov/nu_scripts_reduced_size
# Run git-filter-repo
python3 /Users/user/.local/bin/git-filter-repo --path-glob '**/*.png' --invert-paths --force
# Initialize Git LFS
git lfs install
git lfs track "*.png"
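# Note: 'git lfs track' records the pattern in .gitattributes; that file must
# also be committed for newly added *.png files to be stored as LFS pointers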
# From the unmodified repository directory
let original_repo_pngs = glob **/*.png | wrap orig | insert curr {|i| $i.orig | path relative-to (pwd)}
# Copy PNG images to the new repository
$original_repo_pngs | each {|i| cp $i.orig $i.curr}
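# Caveat (my addition): git doesn't track empty directories, so folders that
# held only .png files may be missing after filter-repo; creating them first
# avoids 'cp' errors. Nushell's mkdir creates intermediate directories:
$original_repo_pngs | each {|i| mkdir ($i.curr | path dirname); cp $i.orig $i.curr}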
Benefits:
Considerations and Caveats:
Merged Pull Requests:
Open Pull Requests:
Commit References:
Forks:
I've just found out that git-lfs bandwidth is charged, and only 1 GB is free.
Plus, it has other downsides, like losing signatures from signed commits, and PR history pointing to the wrong commits.
Even though it would be much tidier to have a smaller repo size, the downsides are quite noticeable. I will stop here.
Though for new binary objects, it would be better to use 'git lfs' to avoid increasing the repo size.
Another thought is to put the screenshots in another repo and have a submodule in this repo. I think then you could clone and only get the scripts unless you wanted both. I don't really know how all that works though.
I like this variant, and it is feasible to do with the workflow that I described.
But again, it will also involve rewriting history to reduce the size of this repo, which might be a bit painful for current contributors and their forks.
I think I should ask the community when I have the energy for that.
@maxim-uvarov:
But again, it will also involve rewriting history to reduce the size of this repo, which might be a bit painful for current contributors and their forks.
Personally I think that's a hit we're ultimately going to have to take, so probably the sooner the better.
I think I'd recommend keeping the issue open, since it is an issue, even if we don't have a full solution just yet.
Here is the rationale and the actions that I propose.
# Let's check the size of the `.git` folder
> du .git/ | get 0.physical
236.2 MB
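# Build a blob-size table: parse `git verify-pack` output, keep only blobs,
# join with `git rev-list` to map each SHA-1 to its filename, sort by size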
let $a = git verify-pack -v .git/objects/pack/*.idx | lines | split column -r ' +' | rename SHA-1 type size size-in-packfile offset-in-packfile depth base-SHA-1 | where type == 'blob' | select SHA-1 size | into filesize size
let $b = git rev-list --objects --all | lines | split column -r ' +' | rename SHA-1 filename
let $sizes = $a | join $b SHA-1 | sort-by size -r
# Let's examine the largest files
> $sizes | first 5 | print
╭─#─┬──────────────────SHA-1───────────────────┬───size───┬──────────────────────filename───────────────────────╮
│ 0 │ 87cd314f73c3bd004ea52b060d8a8d6bf40fcbe7 │ 1.3 MB │ custom-completions/auto-generate/completions/mvn.nu │
│ 1 │ 09e461721303572ea79c390feec763c498b68335 │ 369.0 KB │ assets/gh-emoji.json │
│ 2 │ c2dc98f8c9aa296affb97d2dbeb84f2c6dc09adf │ 241.6 KB │ themes/screenshots/github-light-colorblind.png │
│ 3 │ 9a258215d6435fac43dbe125b8cf2d77a6cc5b82 │ 241.3 KB │ themes/screenshots/github-light-default.png │
│ 4 │ e8557e9b8cbb79b0705792cb316e46bfd98c5559 │ 240.6 KB │ themes/screenshots/github-light.png │
╰─#─┴──────────────────SHA-1───────────────────┴───size───┴──────────────────────filename───────────────────────╯
# Notice that `custom-completions/auto-generate/completions/mvn.nu` seems broken and therefore useless. So I propose to delete it as well.
# Let's check how much space non-'.png' files occupy
> $sizes | where filename !~ '.*\.png' | get size | math sum | print
7.0 MB
# Let's check the size of .png files
> $sizes | where filename =~ '\.png' | get size | math sum | print
253.4 MB
# Now we execute the command that will remove broken completions in `mvn.nu` and all binary `.png` files
python3 /Users/user/.local/bin/git-filter-repo --path custom-completions/auto-generate/completions/mvn.nu --path-glob '**/*.png' --invert-paths
> du .git/ | get 0.physical
4.4 MB
# The .git folder size was reduced from 236.2 MB to 4.4 MB.
# And now I push the results to my home repo
> git remote add mu https://github.com/maxim-uvarov/nu_scripts_reduced_size
> git push mu main --force
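# Note that after a force-push like this, existing clones can no longer
# fast-forward; a sketch of what each contributor would need to run
# (assuming 'origin' points at the rewritten repo):
> git fetch origin
> git reset --hard origin/main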
Is there a way to render the themes as text/html/markdown/ansi so we could store that instead of images?
Possibly? I'm not a CSS wiz, so I'm not sure how difficult it would be.
As for Markdown, we'd have to find a library that supports RGB colors through ANSI. I'm not sure whether Shiki does or not. The one used by Discord, for instance, only supports color names. I found that out the hard way when I tried to copy ANSI output from Nushell to Discord, only to find out that my theme was the problem. I had to switch to nu -n to get the default color_config without RGB codes.
Some additional thoughts on this.
I've been playing with NUPM and other options for installing packages, and nu_scripts is problematic. Users will typically clone nu_scripts in order to install not just themes, but also stdlib-candidate/rfc. 250 MB to install stdlib-candidate is a bit extreme. Multiply that by hundreds (or hopefully, eventually, thousands) of users and we're talking serious bandwidth and storage consumption.
It's also slow. What should be a split-second download takes ~20 seconds on a 1 Gbps link.
Even at 250 MB, I think we need to fix it. It may not sound like much for a repo, but this is also a package that we want Nushell users (not just developers) to be able to install.
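Back-of-the-envelope, using my own assumption of a thousand installs at the current size:
> 250mb * 1000
250.0 GB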
There are two stages to this:
nu_scripts (or pick a new name) repo. But if we had to bite the bullet on a scripting-rewrite, I'd be okay with that as well.

We'll lose all stars if we start a new repo. Not really interested in that.
Also, there is some discussion on the topic in the general channel on Discord and in this thread.
@NotTheDr01ds:
perhaps I'm naive and oversimplifying here, but it seems to me that renaming and archiving the existing repo would allow folks to keep commit history, PR history, etc.
I don't see much of a problem with PR and commit history. This is the commit history of my fork, where I ran the git-lfs script. As all the commit messages are preserved, it's not a problem to identify the PR and see all its discussion and comments, if there were any.
For example, in this commit: https://github.com/maxim-uvarov/nu_scripts_reduced_size/commit/1a5fac0050660f0779c1651bde336f7525e84a70 we can see the number of the PR. With this number we can find the PR and go through all the comments there. Links to hashes will continue to work. The only thing is that the opened commit pages will say that those commits don't belong to the current repository.
When you delete a public repository, the oldest, active public fork is chosen to be the new upstream repository. All other repositories are forked off of this new upstream and subsequent pull requests go to this new upstream repository. source
Rewriting the history is supposed to be done by a trusted member of the core team.
This means that the only things that will change are the contents of .git/objects (and therefore the commit hashes). Messages and authors of commits, along with their dates, will remain unchanged.
The consequences don’t seem too bad to me because:
We can have a backup clone of the repo that won’t sync anymore. All the hashes and commits there will remain unchanged. We can write some markdown explaining what happened and add a link to the backup clone. And we can commit this markdown immediately after the rewrite.
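A sketch of how that backup could be made (the repo URL and backup name here are my assumptions):
# A mirror clone keeps every ref and object under the original hashes
git clone --mirror https://github.com/nushell/nu_scripts nu_scripts-backup.git
# Push it to a separate, frozen repository (hypothetical name)
git -C nu_scripts-backup.git push --mirror https://github.com/nushell/nu_scripts-backup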
Additionally, we can add a comment to all completed PRs on GitHub, explaining where to find the history of changes if needed (even though all the links to hashes will still work, GitHub will indicate that the commits no longer belong to the repo).
So from my point of view, we won’t harm the repository’s archeology much.
The most problematic issue, as I see it, involves current unmerged PRs, forks that will need to be resynced or rebased, and the related inconveniences.
Probably worth adding this message from @devyn on Discord in here for the context/history:
I agree with you but just so you know history rewriting (of the sort that LFS does) doesn't mess up anything with commit metadata like date, author, description, etc. - it modifies the contents and therefore you end up with different commit SHA and also requires people to force their local branches to it, which is not ideal, but you don't really lose history
I guess commit signatures are also lost, since you can't forge those (obviously) - that's about it
Related: PoC to svg command in Nushell -- getting theme previews into SVG would be a pretty big space savings on the artifacts (regardless of where we store them):
This was generated using pure Nushell, with caveats:
[[ col-names val1 val2 ];
[foo 1 4 ]
[bar 5 5 ]
[baz 7 10 ]
[cat -12 100 ]
] | to svg
Blink and strike-through attributes were added by hand (for now)
Colors must be RGB (not named) at the moment (for now)
it's progress 🚀
Here's an SVG I created using the ansi2 crate. It's not perfect but interesting. It's also way too big and needs optimization; e.g., each character has its own tags, which bloats it.
My current progress on to svg:
The files below are almost entirely generated with to svg. The only modification I made manually was setting the background rect color and changing the width and height. It's currently handling:
It uses a much more optimized layout rather than specifying the exact placement of characters and text like ansi2 above. Each line is simply a tspan offset (dy) from the previous line.
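For illustration, roughly what that layout looks like (a hand-written sketch; the exact attributes to svg emits are an assumption):
<svg xmlns="http://www.w3.org/2000/svg" width="640" height="72">
  <!-- background rect, set by hand in the examples above -->
  <rect width="100%" height="100%" fill="#1f2430"/>
  <text x="8" y="18" font-family="monospace" font-size="14" fill="#cbccc6">
    <tspan x="8" dy="0">first line of terminal output</tspan>
    <tspan x="8" dy="1.2em">second line, offset from the previous one via dy</tspan>
    <tspan x="8" dy="1.2em">third line</tspan>
  </text>
</svg>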
That captures 95% of everything I'm throwing at it so far. TODO:
readme.md in the example above, but I at least know a way to do it that is far more efficient than the ansi2 crate. Just need to implement.
to xml
ls | to svg | save ...
eza -l | to svg | save ...
ls | lolcat ...
Even though this obviously has to wrap every character (since every one is a different color), it's still half the size of the ansi2 version.
That's very cool! Can't wait to see your script.
Now the repo size seems huge to me, apparently because images are stored here.
Can we reduce the size of the repo by storing images using git-lfs? I can do all the preparations and transfer the results to contributors with push rights.