netj / 3x

3X — a Workbench for eXecutable eXploratory eXperiments
http://netj.github.io/3x/

Basic reproducibility features #28

Open jewellsean opened 10 years ago

jewellsean commented 10 years ago

It would be incredibly useful to have some additional run information recorded to ensure results are reproducible. Sumatra does a good job of deciding which details to log.

In particular, storing the SHA-1 of input and config files would help identify modifications between the first execution and the first reproduction attempt. Another feature needed for reproducibility is storing the git HEAD (if the executable code lives in a git repo).
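
To make the request concrete, here is a minimal sketch of the kind of record I have in mind. It is not 3X or Sumatra code: the function names (`sha1_of_file`, `git_head`, `run_record`) and the shape of the record are made up for illustration, and it assumes the `git` command is on the PATH.

```python
import hashlib
import subprocess


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def git_head(repo_dir="."):
    """Return the commit id of HEAD, or None if it cannot be determined
    (not a git repo, no commits yet, or git is not installed)."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip()


def run_record(files, repo_dir="."):
    """Hypothetical per-run record: one SHA-1 per input/config file,
    plus the git HEAD of the code directory if there is one."""
    return {
        "sha1": {str(p): sha1_of_file(p) for p in files},
        "git_head": git_head(repo_dir),
    }
```

Comparing the record from the original run with one taken at reproduction time would show which inputs or config files changed, and whether the code was at a different commit. Note that `git rev-parse HEAD` says nothing about uncommitted modifications, so the commit id alone would not cover a dirty working tree.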

netj commented 10 years ago

Thanks for your suggestions @jewellsean!

If you store or symlink the code inside the 3X experiment repository (most likely in the program directory), 3X keeps a copy of it for every run. So it's already possible to retrieve the exact version of the code used for an individual run, although you need to go through additional steps to find the matching commit in your separate git repository. I agree that storing and displaying the git commit id would be handier, but it seems many people simply want to run code without committing (or, more precisely, without committing their modifications/tweaks), or they have their own version control policy. So we decided to make 3X keep all the copies, which works independently of the underlying version control system.

Identifying the best practices for reproducibility and letting 3X provide a good default that uses git to record as much as possible would still be very interesting and important. For managing and evolving the metadata/schema of an experiment (the inputs, outputs, and the program code), I'm thinking of tightly coupling 3X with git, but it's still an open question how exactly we should do it. Please comment if you have more specific ideas or examples of how you would want to use 3X in your workflow.