stat157 / questionnaire

Stat 157 Questionnaire Data Wrangling

What is the meaning/purpose of reproducibility for this project? #31

Open rerock opened 10 years ago

rerock commented 10 years ago

As stated in the description of this project, the objectives are to visualize data from our questionnaires, to better understand us (Cal Fall 2013 Stat 157 students), and to make our project process reproducible by others. However, I am not sure I understand what reproducibility means in this case. If we use the same method/code to examine the same sample group again (the same data), we should expect the same results, right? But if we are talking about reproducing the project process with a different sample group, then our results should be different, right? We started the project by looking through all our samples, and we chose our process/methods/code based on those samples. I am just not sure what the meaning/purpose of reproducibility is here.

carlshan commented 10 years ago

In this situation, I believe that reproducibility refers to the fact that other parties (such as different groups) can produce the exact same analysis and visualizations that you come up with because they have access to your code.

So you're exactly right; the results should be the same.

If this were a real scientific experiment, researchers from all over the world could read your analysis and verify it for themselves by running your code in the same virtual machine setup that you specified and reproducing your results.

aculich commented 10 years ago

@wliang88 great question! @carlshan great answer!

I would like everyone to regularly ask this kind of question about what we're doing. The answers aren't always obvious and, in fact, there is no agreed-upon definition of these terms across this emerging area.

The answer that Carl gave is a great one that gets to the heart of this assignment: if you gave someone the program you wrote, along with the same data and the same environment (the virtual machine), they should be able to reproduce your results exactly.
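
To make that concrete, here is a minimal sketch of what "same code + same data gives the same result" could look like as a check. Everything in it is an assumption for illustration: the `RESPONSES` values are made up (not the class questionnaire data), and `analyze` and `fingerprint` are hypothetical helper names, not functions from this repo. It hashes the output of a deterministic analysis so that two runs can be compared, and it also shows that a different sample group is expected to give a different result.

```python
import hashlib
import json
import statistics

# Hypothetical questionnaire responses; NOT the real class data.
RESPONSES = [7, 6, 8, 5, 7, 9, 6]


def analyze(responses):
    """Deterministic summary: the same input always yields the same output."""
    return {
        "n": len(responses),
        "mean": statistics.mean(responses),
        "median": statistics.median(responses),
    }


def fingerprint(result):
    """Hash a canonical JSON encoding of the result so runs can be compared."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same code, same data: the fingerprints should match.
first = fingerprint(analyze(RESPONSES))
second = fingerprint(analyze(list(RESPONSES)))
print("same data reproduces:", first == second)

# Same code, different sample group: a different result is expected,
# and that is not a failure of reproducibility.
other_group = [8, 8, 7, 6]
print("different data differs:", fingerprint(analyze(other_group)) != first)
```

Someone else rerunning the analysis could compare their fingerprint with the one you published. The sketch only covers the code and data; pinning the environment is what the virtual machine is for.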