billyc closed this 9 years ago
Thanks for writing this up. Tracking inputs and outputs is one of the biggest challenges in travel modeling, usually because we can't find enough storage space to let things sit for a while. I hadn't heard of gitfat. It does seem like the file size and culture shift would be the biggest hurdles.
i'm pretty sure the method above is going to work. let's try it. some details, so i don't forget later:
we'll have to test that removed files/folders are handled properly, not just additions. (i don't know rsync very well)
Rsync won't bother copying files that haven't changed, and git won't store duplicates of files that haven't changed, so overall this shouldn't add much size or time to the process.
To restore a run, you can fetch the repository itself and check out a particular commit, or we can add a feature to the dashboard to download a .zip file of the entire model inputs folder. oooh yeah!
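Both restore paths can be sketched with stock git commands. This is only a rough sketch: the function names `checkout_run` and `restore_inputs_zip` and all of their arguments are hypothetical, and it assumes the inputs live at the top level of the repository.

```shell
#!/bin/sh
# Hypothetical helpers for restoring a run's inputs from the archive repo.
# None of these names come from the project; they are for illustration only.

# Restore by fetching the repository and checking out one commit.
checkout_run() {
    repo_url=$1   # URL or local path of the inputs repository
    commit=$2     # commit hash (or tag) that identifies the run
    dest=$3       # folder to create with the restored inputs
    git clone --quiet "$repo_url" "$dest" &&
    git -C "$dest" checkout --quiet "$commit"
}

# Export the inputs as they were at one commit into a .zip file,
# which is roughly what a dashboard download button could serve.
restore_inputs_zip() {
    repo=$1       # local path of the inputs repository
    commit=$2     # commit hash (or tag) that identifies the run
    out=$3        # ABSOLUTE path of the .zip to write (git -C changes directory)
    git -C "$repo" archive --format=zip --output="$out" "$commit"
}
```

`git archive` writes only the tracked files at that commit, so the .zip would not include the `.git` history itself.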
reading rsync man page, it sounds like some combination of --delete-* and --exclude= should get us what we want, which is deletion of files that have disappeared, while retaining the .git folder.
rsync -vrultz --delete --exclude '.git*' [src] [dest]
seems to work
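For the record, here is a rough sketch of how the rsync step and the commit could fit together. It assumes the destination folder is already a git repository. `snapshot_inputs` and its arguments are made-up names, and `-v` is dropped from the flags above only to keep the output quiet.

```shell
#!/bin/sh
# Sketch: mirror a run's inputs into a git-tracked archive folder, then commit.
# snapshot_inputs and its arguments are hypothetical names for illustration.
snapshot_inputs() {
    src=$1    # model inputs folder for the run
    dest=$2   # archive folder that already contains a .git directory
    msg=$3    # commit message, e.g. a run identifier

    # The trailing slash on src copies its contents, not the folder itself.
    # --delete removes files from dest that no longer exist in src;
    # --exclude '.git*' protects dest/.git from that deletion. It also skips
    # any .gitignore or .gitattributes in src, which may or may not be wanted.
    rsync -rultz --delete --exclude '.git*' "$src"/ "$dest"/ || return 1

    # Stage additions, modifications, and deletions together.
    git -C "$dest" add -A || return 1

    # Commit only when something is staged, so an unchanged rerun adds nothing.
    if ! git -C "$dest" diff --cached --quiet; then
        git -C "$dest" commit --quiet -m "$msg"
    fi
}
```

A quick way to check the deletion case: run it once, remove a file and a subfolder from the source, run it again, and confirm both are gone from the destination while `.git` is still there.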
In addition to logging runs, we also need to archive the inputs so we can immediately restore and rerun any run at any time.
We need to think this through, but my initial thought is that Git is actually well-suited to this and might be better than a database design.
How Git might work:
How would a database approach work?