psrc / queue

Queue: the platform for managing long-running simulation models.
http://psrc.github.io/queue

Design input-tracking db/git archive/retrieval system #35

Closed billyc closed 9 years ago

billyc commented 9 years ago

In addition to logging runs, we also need to archive the inputs so we can immediately restore and rerun any run at any time.

We need to think this through; but my initial thought is that Git is actually well-suited to this and might be better than a database design.

How Git might work:

Ennazus commented 9 years ago

Thanks for writing this up. Tracking inputs and outputs is one of the biggest challenges in travel modeling, usually because we can't find enough storage space to let things sit for a while. I hadn't heard of gitfat. It does seem like file size and the culture shift would be the biggest hurdles.

billyc commented 9 years ago

i'm pretty sure the method above is going to work. let's try it. some details, so i don't forget later:

we'll have to test that removed files/folders are handled properly, not just additions. (i don't know rsync very well)

Rsync won't bother copying files that haven't changed, and git won't store duplicates of files that haven't changed, so overall this shouldn't add much size or time to the process.
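A minimal sketch of the git half of that claim, assuming a throwaway repository and made-up file names (`network.txt`, `landuse.txt`) rather than real model inputs: when a file is unchanged between two commits, both commits point at the same stored blob.

```shell
#!/bin/sh
# Sketch only: the repo and file names are hypothetical, not the real inputs folder.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "queue-sketch@example.invalid"
git config user.name "queue sketch"

# "Run 1" inputs.
echo "network v1" > network.txt
echo "landuse v1" > landuse.txt
git add -A
git commit -q -m "run 1 inputs"

# Only one input changes before "run 2".
echo "landuse v2" > landuse.txt
git add -A
git commit -q -m "run 2 inputs"

# Blob IDs of each file as recorded in each commit.
network_run1=$(git rev-parse HEAD~1:network.txt)
network_run2=$(git rev-parse HEAD:network.txt)
landuse_run1=$(git rev-parse HEAD~1:landuse.txt)
landuse_run2=$(git rev-parse HEAD:landuse.txt)

echo "network.txt: $network_run1 / $network_run2"
echo "landuse.txt: $landuse_run1 / $landuse_run2"
```

The two `network.txt` IDs should match, meaning the unchanged file is stored once; the `landuse.txt` IDs should differ. This says nothing about how well git copes with very large binary inputs, which is a separate question.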

To restore a run, you can fetch the repository itself and check out a particular commit, or we can add a feature to the dashboard to download a .zip file of the entire model inputs folder as it was for that run. oooh yeah!
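Both restore paths might look roughly like this. It is a sketch under assumptions: the repository, file name, and output paths are temporary placeholders, and the dashboard download is only approximated here by `git archive`, which can write a zip of any commit.

```shell
#!/bin/sh
# Sketch only: all paths and file names below are hypothetical.
set -eu

work=$(mktemp -d)
repo="$work/inputs-archive"
git init -q "$repo"
git -C "$repo" config user.email "queue-sketch@example.invalid"
git -C "$repo" config user.name "queue sketch"

# Two archived runs with different inputs.
echo "landuse v1" > "$repo/landuse.txt"
git -C "$repo" add -A
git -C "$repo" commit -q -m "run 1 inputs"
run1=$(git -C "$repo" rev-parse HEAD)

echo "landuse v2" > "$repo/landuse.txt"
git -C "$repo" add -A
git -C "$repo" commit -q -m "run 2 inputs"

# Option 1: fetch the repository and check out the commit for run 1.
restore="$work/restore"
git clone -q "$repo" "$restore"
git -C "$restore" checkout -q "$run1"
cat "$restore/landuse.txt"

# Option 2: export the inputs as of run 1 to a .zip file.
zip="$work/run1_inputs.zip"
git -C "$repo" archive --format=zip --output="$zip" "$run1"
ls -l "$zip"
```

Either way, the run log would need to record the commit ID alongside each run so the right snapshot can be found later.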

billyc commented 9 years ago

reading the rsync man page, it sounds like some combination of --delete-* and --exclude= should get us what we want, which is deletion of files that have disappeared from the source, while retaining the .git folder in the destination.

billyc commented 9 years ago

rsync -vrultz --delete --exclude '.git*' [src] [dest]

seems to work