I'm working through the book sequentially and these are my comments on chapter 3. I've also made some direct edits to the text for more minor things and writing adjustments and issued a pull request.
unclear what this means: "they may not reflect deep psychological generalizations". Perhaps change to "may not generalise broadly to different people and different situations" and also drop the sidenote.
Figure 3.2 - legend is rather small
"Regardless, however, the original telescope was simply too small to have seen whatever was there" — and can we explicitly say what the consequences of that are? (for me the small telescopes analogy is not very intuitive, compared to e.g., Bayesian accounts)
"When this literature is taken together, the chance of a significant finding in a replication study" <- only well-powered replication studies?
"With relatively few exceptions, the studies chosen for replication used short, computerized tasks that mostly would fall into the categories of social and cognitive psychology" <- that's true for RPP but I think we're also referring here to the Camerer studies which were economics and social sciences more broadly?
"They tell us almost nothing about whether the construct that the effect was meant to operationalize is in fact real!" <- Almost nothing seems a bit too harsh!
"Carney et al. (2010) reported a striking study of this phenomenon" <- not sure what was striking about it beyond the hype?
"This result is likely not definitive, however." <- seems to weak to me. Perhaps atleast: "However the validity of this result has been called into question" or I'd go for "However, there's now good reason to think the original effect was spurious". We know quite clearly that it was the result of p-hacking thanks to Carney's admission - speaking of which, I think we should make that clearer - we say that Carney thinks power posing doesn't exist but don't mention the p-hacking.
"Several commentators used Cuddy’s name as a stand-in for low-quality psychological results, likely because of her prominence and perhaps because of her gender and age as well." - for sure there was an excessive focus on the individual in this case from some random internet people; but I don't recall seeing any evidence of bad behaviour from those engaged in the substantive critique (Simmons, Gelman etc). Have I missed something? If we agree, can we make that a bit clearer, because I at the moment this reads as a deterrent to engaging in critique. We could also highlight the bravery of Carney's post and the positive reception it got.
"Communicate personally before communicating publicly." - I disagree. I don't think there should be any onus or expectation on critiquers to contact the individuals involved, there are two many power dynamics involved and creates barriers to critique. I'm happy with 'don't make it about the person' — and it follows from that that it is not necessary to contact the person. Just focus on the science.
"The idea is that progress in psychology consists of a two-step process by which candidate ideas are “screened” for publication by virtue of small, noisy experiments and then “confirmed” by large-scale replications" - not following this. Screened for publication by virtual of... doesn't seem to make sense? More importantly, we refrain from criticising this explore small, replicate big type argument, but it seems problematic to me — if you're initial studies are small, low rigour, they have low informational value either for or against a hypothesis — what's the point in doing them?
"For example, if your experiment relies on the association between doctor and nurse concepts, you would expect this experiment to fail in antiquated English-language connotations where, for example, nurse meant something more like nanny" - this seems like a strange example, maybe nurse did = nanny a long time ago in England (it doesn't now) but we're not proposing to go back into the past and replicate things!
in the depth box we start talking about heterogeneity, but I don't recall us introducing or defining this concept anywhere else
"And analytic flexibility (or “undisclosed analytic flexibility”), the clunky term we mostly favor, describes the actual practice of trying many different things and then pretending you didn’t. Critically, undisclosed analytic flexibility describes a state of affairs not a (questionable) intent, so that’s why we like it a bit better." <- the second part here, that there's not necessarily questionable intent, is inconsistent with the first part: "pretending you didn't". Perhaps a more neutral statement of the problem is "data-dependent decision making" ? (this is the term we use in the prereg chapter. We also mostly use the term "Researcher degrees of freedom" in that chapter, and perhaps we should make our terminology more consistent?). Also some people do p-hack intentionally and I don't think we should shy away from (by saying we like a more neutral way of phrasing the problem).
title of accident report box is "When I'm 64?" - perhaps something a bit more descriptive and enticing like "Analytic flexibility reveals a fountain of eternal youth" or something?
I think we could say a bit more concretely why analytic flexibility increases the likelihood of false positives; at the moment we just assert that it does. We can refer to the prereg chapter where we elaborate on it more.
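We could even demonstrate it with a toy simulation along these lines (my made-up setup, purely illustrative): both groups come from the same distribution, but the "flexible" analyst tests two correlated outcomes plus their average and reports whichever reaches p < .05.

```python
# Illustrative sketch (made-up parameters): how undisclosed analytic flexibility
# inflates false positives. Both groups are drawn from the SAME distribution
# (the null is true), but the flexible analyst tries several analyses and
# reports whichever one reaches p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group = 5000, 20
false_pos_strict, false_pos_flexible = 0, 0

for _ in range(n_sims):
    # Two correlated outcome measures per participant, no true group difference.
    a = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], n_per_group)
    b = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], n_per_group)

    p_outcome1 = stats.ttest_ind(a[:, 0], b[:, 0]).pvalue
    p_outcome2 = stats.ttest_ind(a[:, 1], b[:, 1]).pvalue
    p_average  = stats.ttest_ind(a.mean(1), b.mean(1)).pvalue

    false_pos_strict += p_outcome1 < .05                                # one pre-specified test
    false_pos_flexible += min(p_outcome1, p_outcome2, p_average) < .05  # best of three

print(f"Strict analysis:   {false_pos_strict / n_sims:.2%} false positives")
print(f"Flexible analysis: {false_pos_flexible / n_sims:.2%} false positives")  # noticeably above 5%
```

In runs like this the flexible analysis flags a "significant" difference noticeably more often than the nominal 5%, which is the concrete mechanism behind the inflated false-positive rate; we can then cross-reference the fuller treatment in the prereg chapter.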
"They must ground out in specific observations" - is this a baseball analogy? I don't get it!
"But a good theory can concentrate our expectations on a much smaller set of causal relationships, allowing us to make strong predictions about what factors should and shouldn’t matter to experimental outcomes." - I liked the explanation in this para, but a concrete example would really crystallise it I think
"That doesn’t necessarily mean you have to do replications all the time – that’s only critical if you think you don’t have a very replicable literature and want to check!" <- but earlier we argue that a key purpose of replication is to improve precision, so maybe a small adjustment needed to this
"There are many concerns that go into whether to replicate – including not only whether you are trying to gather evidence about a particular phenomenon, but also whether you are trying to master techniques and paradigms related to it" <- not sure what this means, is it referring to doing replications as a training exercise?
"But there’s no evidence that things have gotten worse. If anything, we are optimistic about the changes in practices that have happened in the last ten years. So in that sense, we are not sure that a crisis narrative is warranted." <- I'm less optimistic - if there's no evidence (no one's really checking replicability over time) then what are the grounds for optimism? I'm also fine with people calling it a crisis (I think everyone can decided for themselves whether to use the word crisis or not - its a pretty subjective term).
"In that sense, we tend to side with those who have named the “replication crisis” a “credibility revolution” <- I don't think these are mutually exclusive and happy with people using either/both.
In summary of the above points, and as a constructive suggestion: perhaps we can say that meta-science has revealed serious problems with reproducibility and replicability (some have called this a replication crisis) and that this has catalyzed a range of new ideas and initiatives, like preregistration, data-sharing policies, Registered Reports, etc. (some have called this a credibility revolution). It's still not clear how successful these reforms have been. Our book is making a small contribution towards improving science.