jdkent opened this issue 3 years ago
Looks good. I'd also point you to this reference, section 4.1
... this work carefully tunes the proportion of studies possessing an effect.
Other random comments:
Inevitably, many different parameters will be varied, and it will be impossible to evaluate everything over a full-factorial exploration of all possible settings. Hence, for each parameter to be varied, be sure to identify its 'default' value, i.e. the value that parameter will take on while the other parameters are being varied.
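As a concrete (purely hypothetical) way to encode that convention, each parameter could carry a default plus a list of sweep values, with every simulation condition varying one parameter at a time while the rest stay at their defaults:

```python
# Hypothetical parameter names and values, only to illustrate the one-at-a-time design.
defaults = {"n_studies": 30, "n_participants": 20, "n_foci": 8, "fwhm": 10}
variations = {
    "n_studies": [10, 30, 100],
    "n_participants": [10, 20, 40],
    "n_foci": [4, 8, 16],
    "fwhm": [8, 10, 12],
}
# Each condition varies exactly one parameter; all others are held at their defaults.
conditions = [
    {**defaults, name: value}
    for name, values in variations.items()
    for value in values
]
```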
Null data generation process, spatial distribution options:
choose voxels uniformly from a (gray matter) mask (essentially copy the empirical generation method).
pull randomly from a probabilistic map (Eickhoff 2016) -- For both, need to choose a distribution and a mean number of foci per study (a minimal sketch of both options follows this list)
create a more advanced model (possibly a Gaussian mixture model) to account for the spatial distribution of coordinates? -- See Samartsidis et al. (2017), §4.1.
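A minimal sketch of the first two spatial sampling options; file paths, foci counts, and the seed are placeholders, not choices from this plan:

```python
import numpy as np
import nibabel as nib
from nibabel.affines import apply_affine

rng = np.random.default_rng(seed=0)

def sample_foci_from_mask(mask_img, n_foci, rng):
    """Option 1: draw voxels uniformly from a binary (gray matter) mask."""
    ijk = np.argwhere(mask_img.get_fdata() > 0)        # candidate voxel indices
    picks = ijk[rng.choice(len(ijk), size=n_foci, replace=True)]
    return apply_affine(mask_img.affine, picks)         # voxel indices -> mm coordinates

def sample_foci_from_prob_map(prob_img, n_foci, rng):
    """Option 2: draw voxels with probability proportional to a probabilistic map."""
    prob = prob_img.get_fdata()
    ijk = np.argwhere(prob > 0)
    weights = prob[prob > 0] / prob[prob > 0].sum()
    picks = ijk[rng.choice(len(ijk), size=n_foci, replace=True, p=weights)]
    return apply_affine(prob_img.affine, picks)

mask_img = nib.load("gray_matter_mask.nii.gz")          # placeholder path
foci_mm = sample_foci_from_mask(mask_img, n_foci=10, rng=rng)
```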
choose number of participants to simulate -- This is perhaps the most important single parameter... will need a range... do we have any empirical evidence on the number of studies typically used?
choose number of foci to simulate -- See above; the number must be random, and we have to decide on the distribution for this randomness (one possible choice is sketched below).
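For the foci-count distribution, one option (an assumption, not a decision from this thread) is a Poisson, or a negative binomial if over-dispersed counts are wanted:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mean_foci = 8          # placeholder mean number of foci per study
n_studies = 30         # placeholder number of studies

# Poisson counts with the chosen mean...
foci_per_study = rng.poisson(lam=mean_foci, size=n_studies)
# ...or a negative binomial with the same mean but extra variance:
foci_per_study = rng.negative_binomial(n=4, p=4 / (4 + mean_foci), size=n_studies)
```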
choose number of study contrasts to include in simulated meta-analysis -- Really tricky issue... most methods ignore the issue of 2 or more contrasts within a given paper, and this paper suggests that it really doesn't matter... maybe skip this issue? (i.e. just assume each study contributes a single contrast).
compare empirical and analytic estimation on null data to test false positive rates -- As noted below, need to compute various FPR measures... average FPR per voxel at some uncorrected threshold (if possible at all), FWE voxelwise, FWE clusterwise.
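A sketch of the per-voxel and family-wise FPR measures, assuming the null simulations have been reduced to an array `null_p_maps` of shape (n_sims, n_voxels) of voxel-wise p-values (the name and threshold are placeholders):

```python
import numpy as np

def false_positive_rates(null_p_maps, alpha=0.001):
    sig = null_p_maps < alpha
    per_voxel_fpr = sig.mean()               # average FPR per voxel; should be ~alpha
    familywise_fpr = sig.any(axis=1).mean()  # proportion of simulations with >=1 false positive
    return per_voxel_fpr, familywise_fpr
```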
how should we determine kernel size / how many kernel sizes should we test over? -- Another whole ball of wax... ignore it, or pick some small set (e.g. 3 kernel sizes; a sketch follows).
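A sketch of what a small kernel-size sweep could look like with NiMARE's ALE; the module paths follow recent NiMARE releases and may differ across versions, and `coord_dset` is assumed to be an existing coordinate `Dataset`:

```python
# Assumes a nimare.dataset.Dataset of coordinates named `coord_dset` already exists.
from nimare.meta.cbma.ale import ALE
from nimare.meta.kernel import ALEKernel

fwhm_values = [8, 10, 12]   # placeholder small set of kernel FWHMs (mm)
ale_results = {}
for fwhm in fwhm_values:
    estimator = ALE(kernel_transformer=ALEKernel(fwhm=fwhm))
    ale_results[fwhm] = estimator.fit(coord_dset)
```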
(Agree with your listed methods to use)
What thresholds should be applied to the results? -- See inference comment above.
convert the images to coordinate datasets (a conversion sketch follows this list):
should the output be thresholded at multiple levels (0.01, 0.001)?
should there be FDR/FWER corrections applied to the output? -- I would pick a set of inferential methods (e.g. uncorrected 0.01, 0.001, voxel FWER 0.05, cluster FWER 0.05) and then run them on all null and real data evaluations.
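One way to do the image-to-coordinate conversion is nilearn's cluster/peak table; the z thresholds below are placeholders (roughly one-sided p < .01 and p < .001), not the plan's final choices:

```python
from nilearn.reporting import get_clusters_table

def stat_map_to_foci(stat_img, stat_threshold):
    """Extract peak coordinates (mm) from a statistical image at a given threshold."""
    table = get_clusters_table(stat_img, stat_threshold=stat_threshold)
    return table[["X", "Y", "Z"]].to_numpy()

# `z_img` is assumed to be a z-statistic image from one of the IBMA estimators.
foci_by_threshold = {z: stat_map_to_foci(z_img, z) for z in (2.3, 3.1)}
```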
-- Agree on the choices of metrics
Should we "hold out" some of the data to see if the results (which estimator with what parameters is most like IBMA) generalize to new data? -- One of the biggest problems is that ALE is not a generative model... so given an ALE map, how would you assert that an out-of-sample IBMA or CBMA sample is similar? -- Regarding negatives, no big deal: evaluate similarity twice, once on the whole image and once only where the truth is positive (see the sketch below).
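A sketch of that two-pass similarity evaluation, with placeholder names: the maps are flattened voxel vectors, `truth_mask` is a boolean array marking ground-truth-positive voxels, and Pearson correlation stands in for whatever similarity metric is chosen.

```python
import numpy as np

def two_pass_similarity(cbma_map, ibma_map, truth_mask):
    whole = np.corrcoef(cbma_map, ibma_map)[0, 1]                      # whole image
    positive = np.corrcoef(cbma_map[truth_mask], ibma_map[truth_mask])[0, 1]  # truth-positive voxels only
    return whole, positive
```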
@jdkent created a repository (https://github.com/neurostuff/simulate-cbma) for the analyses. I'm not sure where it stands in relation to the analysis plan in this issue, but it looks like there's a lot there.
Summary
Demonstrate the usability/stability of NiMARE's CBMA estimators and provide soft recommendations for users. This work builds off:
Additional details
Next steps
Steps:
Fishers
Stouffers
Hedges
SampleSizeBasedLikelihood
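These are image-based estimators from nimare.meta.ibma; a minimal sketch of fitting them, assuming an image-based Dataset named `img_dset` (a placeholder) has already been assembled:

```python
from nimare.meta.ibma import Fishers, Stouffers, Hedges, SampleSizeBasedLikelihood

estimators = {
    "Fishers": Fishers(),
    "Stouffers": Stouffers(),
    "Hedges": Hedges(),
    "SampleSizeBasedLikelihood": SampleSizeBasedLikelihood(),
}
# Fit each estimator on the same image-based dataset for comparison with the CBMA results.
ibma_results = {name: est.fit(img_dset) for name, est in estimators.items()}
```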