alex-dewar opened this issue 1 year ago
There may be some not-horrible ways to do this too. I tried running `git filter-repo --analyze`, which showed close to 400MB of deleted directories (in packed size). That, plus killing old branches, might go pretty far. It also looks like a lot of the FPGA implementation files (e.g., for the CW305) are included, which most people don't care about, so we could save some space by removing them (or moving them out).
We could also consider moving the ChipWhisperer Python stuff to a separate repo... a long-discussed option, but it would need more consideration, as it may be an even more breaking change.
Analysis reports attached: blob-shas-and-paths.txt, directories-all-sizes.txt, path-deleted-sizes.txt, path-all-sizes.txt, extensions-deleted-sizes.txt, extensions-all-sizes.txt, directories-deleted-sizes.txt
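For anyone reproducing this, those reports are what `git filter-repo --analyze` writes out (a sketch, assuming git-filter-repo is installed):

```sh
# run inside a clone of the repo; the report files listed above
# end up in .git/filter-repo/analysis/
git filter-repo --analyze
ls .git/filter-repo/analysis/
```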
Moving all the FPGA target stuff to its own repo makes sense and would give a substantial reduction.
Seems like the archive import will be even easier than I thought. There's a GitHub option when making a new repo to import another one. Will have to see how this works, but hopefully it grabs all the branches and everything else that's currently there.
EDIT: Yeah, looks like importing preserves everything, including commits/branches/etc.
`git-filter-repo` also seems to work very well. By deleting the CW305 files and removing the history of every deleted file, I was able to get the repo size down to ~400MB, from roughly 1.7GB. It may be worth trying to squash all the `hardware/cw305.py` history down to a single commit as well; I'd guess that would save something like 100MB.
I assume that we can make similar gains on `chipwhisperer-jupyter` as well.
Useful link for this: https://stackoverflow.com/questions/63496368/git-how-to-remove-all-files-from-the-git-history-that-are-not-currently-prese
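For reference, a sketch of how the removal could be scripted with `git filter-repo` (the CW305 path here is an assumption; check the analysis reports above for the real offenders):

```sh
# start from a fresh clone; git-filter-repo refuses to rewrite an
# existing working clone by default
git clone --no-local /path/to/chipwhisperer cw-slim
cd cw-slim

# keep only files that still exist at HEAD, minus the CW305 FPGA files,
# so every deleted file also loses its history (the approach from the
# Stack Overflow link above)
git ls-files | grep -v '^hardware/victims/cw305/' > /tmp/keep-paths.txt
git filter-repo --paths-from-file /tmp/keep-paths.txt
```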
For `chipwhisperer-jupyter`, it looks like there's quite a bit to save, but the traces for the simulated versions of the labs end up taking a lot of space. Without the trace files, we can get the repo down to <200MB.
That sounds pretty good! For the traces and similar files, we could move them to some external location (either another repo or even off GitHub). We'd like something stable, so GitHub might still make sense, but the traces could be downloaded "on demand" when you actually need them (and not by default).
Had thought of this a little before; we could have some small Python module that deals with it, e.g.:

```python
import chipwhisperer_traces as ct

traces = ct.sca101.etc
```
Would have to see if there is an easy module to do this for us, but the basic idea would be that when you first access a trace set, it actually downloads the files and caches them locally. Or you could force a download of everything (if, for example, you are running a training and want it all cached locally) with something like `ct.download()`.
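A minimal sketch of what that could look like; the module name, host URL, and trace file layout below are all made up for illustration, not an existing package:

```python
# chipwhisperer_traces sketch -- lazy download + local caching of traces.
# BASE_URL, CACHE_DIR, and the trace names are assumptions.
from pathlib import Path
from urllib.request import urlretrieve

import numpy as np

BASE_URL = "https://example.com/chipwhisperer-traces"  # hypothetical host
CACHE_DIR = Path.home() / ".chipwhisperer" / "traces"


def _fetch(relpath: str) -> Path:
    """Download a trace file on first access and cache it locally."""
    dest = CACHE_DIR / relpath
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(f"{BASE_URL}/{relpath}", dest)
    return dest


class _TraceSet:
    """Attribute access (e.g. sca101.lab_2_1) triggers download + load."""

    def __init__(self, prefix: str, names: list[str]):
        self._prefix = prefix
        self._names = names

    def __getattr__(self, name: str):
        if name not in self._names:
            raise AttributeError(name)
        return np.load(_fetch(f"{self._prefix}/{name}.npy"))


# hypothetical trace listing; in practice this could come from a manifest
sca101 = _TraceSet("sca101", ["lab_2_1", "lab_3_1"])


def download():
    """Force-fetch everything, e.g. before a training with no internet."""
    for ts in (sca101,):
        for name in ts._names:
            _fetch(f"{ts._prefix}/{name}.npy")
```

Usage would then be something like `traces = sca101.lab_2_1`: the first access hits the network, and subsequent ones just read the local cache.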
The main advantage of a more complicated download system is that it can be updated to be almost anything in the future: another URL, or even another system entirely (e.g., eventually using a real database or similar).
ChipWhisperer is a fairly old project at this point and, as such, the repo has accumulated a lot of files and a large history. This has a lot of negative effects; among other things, the directory layout has grown unwieldy (`hardware/victims/firmware/*` could be condensed into just `target_firmware`, for example).

It would therefore be beneficial if we could archive most of that history/those files and start fresh. The archive should be fairly simple; just make a new repo (maybe `chipwhisperer-historical`) and point a local version there.

For the new `chipwhisperer` repo, one option would be to start completely fresh; move all the desired files into a new repo and point that here. However, it would be nice if we could keep the history of all the non-NewAE contributions and just squash everything else to reduce space.
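One simple way to build that archive while preserving every branch and tag would be a mirror clone and push; a sketch, assuming the `chipwhisperer-historical` repo has already been created on GitHub:

```sh
# grab every ref (branches, tags, notes) from the current repo
git clone --mirror https://github.com/newaetech/chipwhisperer.git
cd chipwhisperer.git

# push the complete history into the new archive repo
git push --mirror https://github.com/newaetech/chipwhisperer-historical.git
```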