newaetech / chipwhisperer

ChipWhisperer - the complete open-source toolchain for side-channel power analysis and glitching attacks
http://chipwhisperer.com
Other

Repo Cleanup #458

Open alex-dewar opened 1 year ago

alex-dewar commented 1 year ago

ChipWhisperer is a fairly old project at this point and, as such, the repo has accumulated a lot of files and a large history. This has a lot of negative effects:

It would therefore be beneficial if we could archive most of that history and those files, then start fresh. The archive should be fairly simple: just create a new repo (maybe chipwhisperer-historical) and push a local copy there.
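Copying every branch and tag is enough for the archive. The sketch below follows the bare-clone plus git push --mirror recipe that GitHub documents for duplicating a repository. It assumes the destination repo already exists and is empty. The chipwhisperer-historical URL is hypothetical, since that name is only a suggestion, and archive_commands and run are illustrative helper names.

```python
import subprocess
from typing import List

# Source is this repo. The archive URL is hypothetical (only the name suggested
# above) and is assumed to already exist as an empty repo.
SOURCE = "https://github.com/newaetech/chipwhisperer.git"
ARCHIVE = "https://github.com/newaetech/chipwhisperer-historical.git"


def archive_commands(source: str, archive: str,
                     workdir: str = "chipwhisperer.git") -> List[List[str]]:
    """git commands that copy every branch and tag of `source` into `archive`."""
    return [
        # A bare clone copies the remote's branches straight to local branch heads, plus tags
        ["git", "clone", "--bare", source, workdir],
        # --mirror pushes every local ref, so the archive ends up with the same refs
        ["git", "-C", workdir, "push", "--mirror", archive],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run(archive_commands(SOURCE, ARCHIVE)) would do the copy. Building the command list separately makes it easy to print and review the commands before running anything against the real repos.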

For the new chipwhisperer repo, one option would be to start completely fresh; move all the desired files into a new repo and point that here. However, it would be nice if we could keep the history of all the non-NewAE contributions and just squash everything else to reduce space.

colinoflynn commented 1 year ago

There may be some not-horrible ways to do this in place, too. I tried running git filter-repo --analyze, which showed close to 400MB of deleted directories (in packed size). Combined with deleting old branches, that might go pretty far. It also looks like a lot of the FPGA implementation files (e.g., for the CW305) are included, which most people don't care about, so we could save some space by removing (or moving) them.

We could also consider moving the ChipWhisperer Python code to a separate repo. That's a long-discussed option, but it would need more consideration, as it may be an even bigger breaking change.

Attached analysis reports: blob-shas-and-paths.txt, directories-all-sizes.txt, path-deleted-sizes.txt, path-all-sizes.txt, extensions-deleted-sizes.txt, extensions-all-sizes.txt, directories-deleted-sizes.txt
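The report files listed above are what git filter-repo --analyze produces, and as far as I know it writes them to .git/filter-repo/analysis/. Below is a rough sketch for pulling the largest deleted paths and a packed-size total out of path-deleted-sizes.txt. It assumes each data row follows the report's own "Format:" line: unpacked size, packed size, date deleted, path. Entry, parse_report and summarize are hypothetical names. I've used the per-path report rather than the directory reports because summing the directory reports would likely count nested directories more than once.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    unpacked: int  # accumulated unpacked size in bytes
    packed: int    # accumulated packed size in bytes
    deleted: str   # date deleted ("<present>" in the *-all-sizes reports)
    path: str


def parse_report(text: str) -> List[Entry]:
    """Parse the data rows of a git filter-repo --analyze *-sizes.txt report."""
    entries: List[Entry] = []
    for line in text.splitlines():
        # maxsplit=3 keeps paths containing spaces in one piece
        parts = line.split(None, 3)
        # Skip the "=== ... ===" title, the "Format:" line and blank lines
        if len(parts) < 4 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        entries.append(Entry(int(parts[0]), int(parts[1]), parts[2], parts[3]))
    return entries


def summarize(entries: List[Entry], top: int = 10) -> Tuple[int, List[Entry]]:
    """Return the total packed size and the `top` largest entries by packed size."""
    biggest = sorted(entries, key=lambda e: e.packed, reverse=True)[:top]
    return sum(e.packed for e in entries), biggest
```

For example, reading .git/filter-repo/analysis/path-deleted-sizes.txt with errors="replace" and passing the text through parse_report and summarize gives a quick ranking. Because each path's size is accumulated over all its versions, the total is an estimate of the savings, not an exact figure.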

jpcrypt commented 1 year ago

Moving all the FPGA target stuff to its own repo makes sense and would give a substantial reduction.

alex-dewar commented 1 year ago

Seems like the archive import will be even easier than I thought. When creating a new repo, GitHub has an option to import an existing one. I'll have to see how this works, but hopefully it grabs all the branches and everything else currently there.

EDIT: Yeah, looks like importing preserves everything, including commits/branches/etc.

alex-dewar commented 1 year ago

git-filter-repo also seems to work very well. By deleting the CW305 files and removing the history of every deleted file, I was able to get the repo size down from roughly 1.7GB to ~400MB. It may also be worth squashing all of the hardware/cw305.py history down to a single commit; I'd guess that would save something like 100MB.
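Before/after numbers like 1.7GB vs. ~400MB are easiest to compare if everyone measures the same way. The sketch below assumes "repo size" means the object storage reported by git count-objects -v, which prints loose-object size ("size") and pack size ("size-pack") in KiB. parse_count_objects, stored_size_mib and repo_size_mib are hypothetical helper names.

```python
import subprocess
from typing import Dict


def parse_count_objects(output: str) -> Dict[str, int]:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def stored_size_mib(stats: Dict[str, int]) -> float:
    """Loose objects plus packs; git reports both 'size' and 'size-pack' in KiB."""
    return (stats.get("size", 0) + stats.get("size-pack", 0)) / 1024


def repo_size_mib(repo: str = ".") -> float:
    """Measure the object storage of the repo at `repo`, in MiB."""
    out = subprocess.run(["git", "-C", repo, "count-objects", "-v"],
                         capture_output=True, text=True, check=True).stdout
    return stored_size_mib(parse_count_objects(out))
```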

I assume that we can make similar gains on chipwhisperer-jupyter as well.

Useful link for this: https://stackoverflow.com/questions/63496368/git-how-to-remove-all-files-from-the-git-history-that-are-not-currently-prese
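The approach in the linked question boils down to three steps. First, list every path that appears anywhere in history. Then subtract the paths that exist now. Finally, pass the remainder to git filter-repo --invert-paths --paths-from-file. The sketch below assumes "exists now" means the tree at HEAD, so files that only exist on other branches would also be dropped. The helper names and the deleted-paths.txt filename are hypothetical.

```python
import subprocess
from typing import Iterable, List, Set


def git_lines(*args: str, repo: str = ".") -> List[str]:
    """Run a git command and return its non-empty output lines."""
    # core.quotePath=false stops git from octal-escaping non-ASCII path names
    result = subprocess.run(["git", "-c", "core.quotePath=false", "-C", repo, *args],
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]


def historical_paths(repo: str = ".") -> Set[str]:
    """Every path touched by any commit reachable from any ref."""
    # --no-renames lists both the old and the new name of a renamed file
    return set(git_lines("log", "--all", "--no-renames", "--pretty=format:",
                         "--name-only", repo=repo))


def current_paths(repo: str = ".", rev: str = "HEAD") -> Set[str]:
    """Paths present in the tree of `rev`."""
    return set(git_lines("ls-tree", "-r", "--name-only", rev, repo=repo))


def paths_to_drop(history: Iterable[str], current: Iterable[str]) -> List[str]:
    """Paths that existed at some point but are gone now, sorted for a stable file."""
    return sorted(set(history) - set(current))


def filter_repo_command(paths_file: str) -> List[str]:
    """git filter-repo invocation that strips the listed paths from all history."""
    return ["git", "filter-repo", "--invert-paths", "--paths-from-file", paths_file]
```

To use it, write paths_to_drop(historical_paths(), current_paths()) to deleted-paths.txt, one path per line, then run filter_repo_command("deleted-paths.txt"). Do this in a fresh clone, since git filter-repo refuses to rewrite anything else unless given --force. Paths containing quotes or newlines are still quoted by git and would need extra handling.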

alex-dewar commented 1 year ago

For chipwhisperer-jupyter, it looks like there's quite a bit to save, but the traces for the simulated versions of the labs take up a lot of space. Without the trace files, we can get the repo down to under 200MB.

colinoflynn commented 1 year ago

That sounds pretty good! For the traces and similar files, we could move them to some external location (either another repo or even off GitHub). We'd want something stable, so GitHub might still make sense, but they could be downloaded "on demand" if you actually need them (and not by default).

I had thought about this a little before: we could have a small Python module that deals with it, e.g.,

import chipwhisperer_traces as ct

traces = ct.sca101.etc

We'd have to see if there is an existing module that does this for us, but the basic idea is that the traces are actually downloaded the first time you access them, and then cached locally. Alternatively, you could force a download of everything (for example, if you are running a training and want it all cached locally) with something like ct.download().
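Here is a minimal sketch of how the download-on-first-access idea could look, using only the standard library. Everything in it is hypothetical: no chipwhisperer_traces module, hosting URL, manifest or file naming scheme exists yet. Attribute access returns the path to the cached file rather than loaded traces, because the trace file format isn't decided. The ct.sca101 style comes from a module-level __getattr__ (PEP 562, Python 3.7+).

```python
import os
import urllib.request
from pathlib import Path
from typing import Callable, Dict

# Hypothetical hosting location, trace-set names and cache directory.
BASE_URL = "https://example.com/chipwhisperer-traces/"
MANIFEST: Dict[str, str] = {"sca101": "sca101.zip", "sca201": "sca201.zip"}
CACHE_DIR = Path(os.path.expanduser("~")) / ".cache" / "chipwhisperer_traces"

Fetcher = Callable[[str, Path], None]


def fetch_url(url: str, dest: Path) -> None:
    """Default fetcher: plain HTTP(S) download."""
    urllib.request.urlretrieve(url, str(dest))


class TraceStore:
    """Maps attribute names to trace files, downloading each one on first access."""

    def __init__(self, base_url: str, manifest: Dict[str, str], cache_dir: Path,
                 fetch: Fetcher = fetch_url) -> None:
        self.base_url = base_url
        self.manifest = dict(manifest)
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch  # swappable backend: another URL scheme, a database, ...

    def path(self, name: str) -> Path:
        """Local path of trace set `name`, downloading it if not yet cached."""
        filename = self.manifest[name]  # KeyError for unknown names
        dest = self.cache_dir / filename
        if not dest.exists():
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            # Download under a temporary name so an interrupted download is
            # never mistaken for a complete cached file.
            partial = dest.with_name(dest.name + ".part")
            self.fetch(self.base_url + filename, partial)
            partial.replace(dest)
        return dest

    def download(self) -> None:
        """Force every trace set into the cache (e.g., before a training)."""
        for name in self.manifest:
            self.path(name)

    def __getattr__(self, name: str) -> Path:
        # Only called when normal attribute lookup fails, i.e. for trace names.
        if name not in self.__dict__.get("manifest", {}):
            raise AttributeError(name)
        return self.path(name)


# Module-level hooks so `ct.sca101` and `ct.download()` work after
# `import chipwhisperer_traces as ct`.
_store = TraceStore(BASE_URL, MANIFEST, CACHE_DIR)
download = _store.download


def __getattr__(name: str) -> Path:
    return getattr(_store, name)
```

The fetch parameter is there because of the flexibility point below. Only the fetcher would need to change to move the traces to a different URL or storage system. Existing third-party packages may already provide this kind of caching, which would be worth checking before writing our own.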

The main advantage of a (somewhat complicated) download system like this is that it can be changed to almost anything in the future: another URL, or even another system entirely (e.g., eventually a real database or similar).