wviechtb / metafor

A meta-analysis package for R
https://www.metafor-project.org
226 stars 52 forks source link

Include sample size in escalc/rma.mv outputs #62

Open befriendabacterium opened 2 years ago

befriendabacterium commented 2 years ago

Classification:

Feature Request

Summary

Option to include sample size in the outputs of escalc/rma.mv (cannot see an option to do this right now - if there is please could you direct me to it and close the issue?). Thanks!

wviechtb commented 2 years ago

Could you please be more specific what exactly you would like to see in the output? escalc() and rma.mv() are really fundamentally different functions, so it's not clear to me what you want to see in the context of these two functions.

befriendabacterium commented 2 years ago

Hi Wolfgang.

Thanks for the quick response. So currently metafor::escalc outputs yi and vi, but not the treatment/control group sample sizes used to calculate the effect size. This would be handy to include when append=F, for plotting etc later. For rma.mv, it would be also nice if there was an option to conserve this information on underlying sample sizes in the model object, because when using metafor::funnel for example, it will throw up an error:

Error in funnel.rma(model, yaxis = "ni", main = "after", cex = 0.5) : No sample size information stored in model object.

So it's more about conserving the information used to calculate effect sizes/build models throughout the pipeline.

P.S. I must admit I'm not 100% sure what sample size/ni is best to use for plotting (e.g. funnel plots, scatter okit with sample size represted as point size) when you have an effect size calculated from a treatment and control group. Treatment, control, or some summary (e.g. mean) of the two (because sample size between treatment and control may differ). So if there's a common summary of the two that is used, that would be great to include too.

Thanks Matt

wviechtb commented 2 years ago

Actually, escalc() does store sample size information, as an attribute of the yi element. See:

dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
dat$yi

Functions like rma.uni() and rma.mv() automatically look for this info and extract it:

res <- rma.mv(yi, vi, data=dat)
res$ni

Then functions like funnel() can also make use of this info:

funnel(res, yaxis="ni")
befriendabacterium commented 2 years ago

Ah yes, of course, this rings a bell now, thanks! In binding them to dataframes in my pipeline I forgot because the 'ni' attribute was lost...would it be possible to output the 'ni' attribute as an extra column in the yi/vi dataframe, as a future feature request perhaps? Could keep it as an attribute of yi, but as an extra column it would be handy for custom plotting?

wviechtb commented 2 years ago

Not sure what kind of binding operations you are doing, but there are methods for escalc objects that handle the ni attribute properly. For example:

dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
dat$yi
dat[1:5,]$yi
rbind(dat[1,], dat[2:5,])$yi

Adding ni as an extra column wouldn't make something like funnel(res, yaxis="ni") automatically work, since it looks for the ni attribute, not some variable in the data frame. If you want to make other plots that involve some kind of sample size information, then I don't think a change to escalc() is needed. Simply include such a variable in the data frame. In fact, the sample sizes would already be part of the data frame in the first place in order to compute the effect sizes and sampling variances, so why should escalc() add them back again? Maybe you could demonstrate a particular workflow with a small reproducible example to illustrate how you think this could be useful.

befriendabacterium commented 2 years ago

Yeah, I know it's quite a slight feature change request, but in my mind, it would be nice to have the yi/vi/ni outputted as one dataframe (ni can remain an attribute of 'yi' of course), to facilitate custom (outside metafor) downstream plotting/analyses. Although the control and treatment group sample sizes are in your original dataframe, and the 'ni' sample sizes are just sum of your control and treatment group sample sizes, I think it'd be helpful to have this added. Example:

library(metafor)
dat <- dat.bcg
dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat)
#add the ni column manually
dat <- cbind(dat, ni=attr(dat$yi, "ni"))

#this
plot(yi~year, cex=log10(ni), data=dat)
#is easier/neater than
plot(yi~year, cex=log10(attr(dat$yi, "ni")), data=dat)

So if escalc could do line 6 (add the ni column) for you, I think it just makes things a bit easier/cleaner for those wanting to use 'ni' outside of metafor functions. It could just be an option in escalc.

wviechtb commented 2 years ago

I have given this some further thought, but won't be pursuing this further. I definitely wouldn't want to make this behavior the default, since there are by now hundreds of documents that would require updating showing the changed output. It's also a bit late now for changing the default output of escalc(). And again, the sample size info is already in the data frame. If one wants the total sample size in the data frame, it takes a single line of code to add it. On the other hand, allowing for an additional output variable by escalc() takes more work (for example, the var.names argument would now also require a third value, but the old behavior would need to be preserved). There are just too many implications for such a change.