Exclusions, PDF Reproducibility

palday commented 5 years ago

The PDF generated by kniting the Rmd file does not yield the same PDF (in terms of statistical results and exclusion summaries) as the mb1-paper.pdf in the paper directory, presumably corresponding to the manuscript currently under review.

As far as @kliegl and I can tell, this is mostly due to the full exclusion code not being present in the repository. For example, there is reference to a 05_exploratory_analysis.Rmd that performs additional exclusions. The existence of an 05_*.Rmd file also suggests that there they may also be a 04_*.Rmd file(s) not present.

I understand the reasons why this information may not be shared at this point, but I would propose that this current limitation should be made transparent to potential users.

For example, including the PDF generated from the public repository as well as the actual dataframe that was used in generating the paper would allow for checking reproducibility of results. Then constructive feedback about additional need for recoding could be provided. Currently, it is unclear whether these problems (such as in #5) are already taken care of.

Related to #5 but less of a coding error than a complete omission.

mcfrank commented 5 years ago

Thanks - there are probably some package versioning issues coming up.

I am not sure that this is an exclusions issue though. The note you refer to is an out-of-date reference to previous exploratory work that got ported into the main paper...

On Tue, Jun 4, 2019 at 5:52 AM Phillip Alday notifications@github.com wrote:

The PDF generated by kniting the Rmd file does not yield the same PDF (in terms of statistical results and exclusion summaries) as the mb1-paper.pdf in the paper directory, presumably corresponding to the manuscript currently under review.

As far as @kliegl https://github.com/kliegl and I can tell, this is mostly due to the full exclusion code not being present in the repository. For example, there is reference to a 05_exploratory_analysis.Rmd https://github.com/manybabies/mb1-analysis-public/blob/92f1a863e6322eb87e9fcb89944c1052e786d558/03_exclusion.Rmd#L28 that performs additional exclusions. The existence of an 05*.Rmd file also suggests that there they may also be a 04*.Rmd file(s) not present.

I understand the reasons why this information may not be shared at this point, but I would propose that this current limitation should be made transparent to potential users.

For example, including the PDF generated from the public repository as well as the actual dataframe that was used in generating the paper would allow for checking reproducibility of results. Then constructive feedback about additional need for recoding could be provided. Currently, it is unclear whether these problems (such as in #5 https://github.com/manybabies/mb1-analysis-public/issues/5) are already taken care of.

Related to #5 https://github.com/manybabies/mb1-analysis-public/issues/5 but less of a coding error than a complete omission.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/manybabies/mb1-analysis-public/issues/8?email_source=notifications&email_token=AAI25F6Z4T7H3HZKDBO4IP3PYZQPBA5CNFSM4HS4AU52YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GXQUTSA, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI25FZNWVP7FSXPC35MKETPYZQPBANCNFSM4HS4AU5Q .

kliegl commented 5 years ago

Attached a script that includes corrections mentioned by @palday and @dmbates. I also found a few issues, e.g., relating to the order of trials which destroyed the nested structure of the data. Of course, you may have found the problems already or I may have missed something. Feedback would be appreciated.

Obviously, a correct specification of the data structure is necessary for comparisons of parsimonious mixed models and Bayesian models. Manybabies_cleanup.Rmd.txt

mcfrank commented 5 years ago

Thanks again for these comments. I'm keeping track of my work addressing them:

[x] fix N/A coding
[x] respecify subid as unique
[x] recoding
[x] data error corrections
[x] randomization/counterbalancing

mcfrank commented 5 years ago

@kliegl @palday As I dive into this again, first, I should say that your work here is extraordinarily helpful. Thanks for your deep engagement with the data. I am ashamed that you caught so many issues deserving of cleanup.

Second, I am noticing that the PDF reproducibility issue appears to be that exclusions.Rmd was not available, but judging by repo history it was present. Maybe there was some path issue?

kliegl commented 5 years ago

@mcfrank @palday It's been a long time, but I do have a file exclusion.Rmd. However, the critical R chunk with the title {r child = "paper/exclusions.Rmd"} in the version I found was completely empty. Not sure how to reproduce this here. I also have a the file with the same name in the paper subdirectory. (I can send per email if you need it.). I am not sure it is worth it to retrace the history. Is it important for you to reproduce these problems?

Thank you for your acknowledgement.

mcfrank commented 5 years ago

@kliegl that chunk is meant to be empty - when the paper is compiled, it runs the child code (that file in the directory), which should reproduce the exclusions and generate the correct text.

The reason for this is that we need to have summary stats for the exclusions in the paper earlier than we actually report the exclusions. So they need to be done twice, once to create the summary at the top and once to create the actual exclusions text. I use the same code for this (in an extracted chunk) so as to avoid any divergence between the two counts.

mcfrank commented 5 years ago

I should say, there is no need for you to reconstruct everything. I think I have enough information.

mcfrank commented 5 years ago

ok, for posterity:

we will use unpaired items (e.g., all sixteen item codes)
I note dummy coding in the ms.
corrected the one-off data issues

manybabies / mb1-analysis-public

Exclusions, PDF Reproducibility #8