tylermorganwall / skpr

Generates and evaluates D, I, A, Alias, E, T, G, and custom optimal designs. Supports generation and evaluation of mixture and split/split-split/N-split plot designs. Includes parametric and Monte Carlo power evaluation functions. Provides a framework to evaluate power using functions provided in other packages or written by the user.
https://tylermorganwall.github.io/skpr/
GNU General Public License v3.0
115 stars 15 forks source link

Optimality query #59

Closed tsbaguley closed 3 years ago

tsbaguley commented 4 years ago

I'm using get_optimality to try and find the D-efficiency of a fraction from a large factorial universe. It is returning "Inf" and I'm not sure how to interpret that. I've increased repeats but that hasn't shifted anything.

GeorgeMKhoury commented 4 years ago

Interesting. Can you provide code that produces this error, so we can investigate?

tsbaguley commented 4 years ago

This is the code I was using. It is a made-up example but typical of what someone might want to do in a factorial survey/vignette experiment. The aim is to show how to select a useable number of trials to estimate main effects and 2way interactions efficiently. The first example runs but I think D is implausible (close to 100) and the second example generated D = Inf. We did have some issues with server apparently so its possible that's causing a problem. I'm going to try re-running now with more repeats.

large_universe_example = expand.grid(a = as.factor(c("male", "female", "nonbinary")), b = as.factor(c("student", "professor")), c = as.factor(c("single", "married", "divorced")), d = as.factor(c("18-25", "26-30", "31-35", "36-40", "41-45", "46-50", "51-55", "56-60", "61-65", "65+")), e = as.factor(c("unemployed", "employed", "retired")), f = as.factor(c("banter", "bullying", "harassment")), g = as.factor(c("morning", "afternoon", "evening", "night")), h = as.factor(c("High", "Med", "Low")) )

head(large_universe_example) tail(large_universe_example) dim(large_universe_example)

design_lue <- gen_design(candidateset = large_universe_example, model = ~., trials = 1200, parallel = FALSE, repeats = 20) design_lue

get_attribute(design_lue, "alias.matrix") get_attribute(design_lue, "correlation.matrix") get_optimality(design_lue)

optFederovC(design_lue, nTrials=1200)

design_lue2 <- gen_design(candidateset = large_universe_example, model = ~(a+b+c+d+e+f+g+h)^2, trials = 1200, parallel = FALSE, repeats=20) design_lue2

get_attribute(design_lue2, "alias.matrix") get_attribute(design_lue2, "correlation.matrix") get_optimality(design_lue2)

GeorgeMKhoury commented 3 years ago

I can replicate. For the first model, I think the high D efficiency is not too surprising - you have 1200 runs to estimate a 24 term model. We shouldn't be getting Inf in the second case, though, I'll look more into that.

tsbaguley commented 3 years ago

Yes - that makes sense. Thank you!

GeorgeMKhoury commented 3 years ago

@tylermorganwall I just initiated a pull request to fix the infinity, just involved moving exp() from DOptimalityLog to the caller.

We might have another problem, though. Continuing with this example:

> eval_design(design_lue2)
Error in eval_design(design_lue2) : 
  Wrong number of anticipated coefficients
GeorgeMKhoury commented 3 years ago

Sorry, I should have clarified, I'm getting that error because design_lue2 is a nonsensical design - it only has a single design point repeated 1200 times. It's strange because if I make a 400 run design for the same model, it works ok.

tsbaguley commented 3 years ago

That makes sense - with small runs I can pass the design to AlgDesign and get the same D efficiency but something weird happens in the couple of large runs I've tried. It could matter because these sorts of designs do come in factorial survey experiments.

GeorgeMKhoury commented 3 years ago

I think I fixed the issue now in my pull request. The issue is that genOptimalDesign() was getting Inf when it calculated the D-optimality of the initial (random) design, and then the search algorithm couldn't proceed.

tsbaguley commented 3 years ago

Thank you - I wanted to include an example in a manuscript I'm preparing for submission. When I next time to work on the MS I'll work up a short example and see if it works. I wanted to use skpr because it is more user-friendly than other alternatives.

tylermorganwall commented 3 years ago

This was indeed fixed by @GeorgeMKhoury's commit, although this particular design will take a while to compute since it's such a massive model.

tsbaguley commented 3 years ago

Yes - I'll probably scale down my example in the ms I'm preparing.