ucbrise / jarvis

Build, configure, and track workflows with Jarvis.
https://ucbrise.github.io/jarvis
Apache License 2.0

Jarvis Reproduce #15

Open rlnsanz opened 6 years ago

rlnsanz commented 6 years ago

Reproduce an experiment or trial by materializing it in a user-specified directory. Running a Python script from that directory then lets the data scientist re-run the experiment, or a single trial within it.

rlnsanz commented 6 years ago
malharpatel commented 6 years ago

[Four attached images: photos of the whiteboard notes]

dcrankshaw commented 6 years ago

Glancing through those notes on the board, I noticed it says that if we re-run an experiment it does not get versioned. What's the reasoning behind that? I think we should be very careful about deciding to not version things.

rlnsanz commented 6 years ago

What we meant is that if you reproduce an experiment it does not get versioned. Re-running an experiment always gets versioned.

The argument was this: suppose you check out some past experiment run to a target directory, move to that directory, and call python reproduce.py. If the reproduction is a "true" one, the results have already been versioned and don't need to be versioned again.

That said, it's unlikely that the reproduction will be a "true" one, since sources of randomness could alter some result. We may also have good reasons for tracking how many times an experiment has been reproduced.
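
To make the distinction concrete, here is a minimal sketch of one way to tell a "true" reproduction from a divergent one while still counting every attempt. It is not Jarvis code: `digest_outputs`, `record_reproduction`, the JSON log layout, and the idea of comparing SHA-256 digests of output files are all assumptions made for illustration. The log file is assumed to live outside the reproduced directory so it does not change the digest.

```python
import hashlib
import json
from pathlib import Path


def digest_outputs(run_dir: Path) -> str:
    """Hash every file under run_dir (relative path and bytes) into one SHA-256 digest."""
    h = hashlib.sha256()
    for path in sorted(p for p in run_dir.rglob("*") if p.is_file()):
        h.update(path.relative_to(run_dir).as_posix().encode())
        h.update(b"\0")  # separator between the file name and its contents
        h.update(path.read_bytes())
    return h.hexdigest()


def record_reproduction(log_path: Path, run_id: str,
                        recorded_digest: str, reproduced_dir: Path) -> dict:
    """Compare a reproduction with the recorded digest and log the attempt.

    Hypothetical helper: every attempt is counted, and exact matches are
    counted separately, so no new version is needed for a "true" reproduction.
    """
    log = json.loads(log_path.read_text()) if log_path.exists() else {}
    entry = log.setdefault(run_id, {"attempts": 0, "exact_matches": 0})
    entry["attempts"] += 1
    is_true = digest_outputs(reproduced_dir) == recorded_digest
    if is_true:
        entry["exact_matches"] += 1
    log_path.write_text(json.dumps(log, indent=2))
    return {"true_reproduction": is_true, **entry}
```

Under this sketch, a byte-identical reproduction only bumps the counters, while a divergent one is flagged and could then be versioned like any other run. Whether byte-level equality is the right notion of "true" is itself open for discussion.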

Still... what it means to version something well when you re-run an experiment vs. when you reproduce a past one is something we wanted to call to the group's attention, and something we could discuss in a meeting.