unfoldtoolbox / UnfoldSim.jl

Simulate EEG / ERP data with overlap, non-linear effects, multiple regression
MIT License
7 stars 4 forks source link

Simulate ground truth and calculate MSE #105

Open ReneSkukies opened 2 months ago

ReneSkukies commented 2 months ago

Ground Truth

To test the validity of a method, it's important to simulate data and test the results against the ground truth. Currently, the ground truth can be obtained via UnfoldSim.simulate_responses (in the current version, v.0.3.2, not exported). However, the way to do this is a bit clunky since simulate_responses takes Simulation as an input, although only the design is needed:

simulation = UnfoldSim.Simulation(
        SingleSubjectDesign(;
        conditions = Dict(
            :condition => ["car", "face"],
            :continuous => range(0, 5, length = 10),
        )),
        components, # Not needed/ Redundant since it's already an input to simulate_responses
        UniformOnset(; width = 30, offset = 5), # Not needed
        NoNoise() # Not needed
    )

ground_truth = UnfoldSim.simulate_responses(MersenneTwister(1),components, simulation)

Additionally, simulate_responses results in a simple Matrix, with one column per event. For coherence with Unfold.effects it might be nice to turn this into a DataFrame. However, the resulting Matrix is needed internally, so we might need a new function entirely...

MSE

Further, as a possible measure of "goodness of fit", it might be nice to have a function that calculates the mean-squared-error between a ground truth and effects from an Unfold model. This might be also rather fit into UnfoldStats however.

Anyway, an issue with that is that ground truth gives me the full ground truth with all true effects for each component. But what if I "just" control certain covariates with Unfold.effects (i.e. set them to the mean) and only want to look at the effect of one. Then (I think) I would have to:

  1. Change the components in a way that the other covariates don't influence them (but how do I set them to the mean?)
  2. Be careful that effects has the same marginalization as the original design (e.g. [0:1:5] or so)

From @jschepers

Maybe it's worth looking at how the effects function is implemented in Unfold and then do something similar for UnfoldSim.
We have to check whether it makes more sense to manipulate the modelmatrix/design matrix after creating it or two manipulate the event table accordingly.
behinger commented 2 months ago

hi! Thanks for the detailed thoughts.

Groundtruth

The issue is that we defined our simulate_component including simulation, so that you can e.g. make component dependent on onsets. To work around this, we could add a simple wrapper `simulate_responses(rng,design::AbstractDesign,components::AbstractVector{<:AbstractComponents}) = Simulation(design,components,NoOnset(),NoNoise())

Your proposal to move the Matrix to a DataFrame is more complex. We could try to use the function

function Unfold.result_to_table(
    eff::Vector{<:AbstractArray},
    events::Vector{<:DataFrame},
    times::Vector,
    eventnames::Vector,
)

maybe even something like Unfold.result_to_table([ground_truth],[generate_design(design)],[1:size(ground_truth,2)],["simulated"]) works directly; I havent tried it - note that all the brackets are necessary as in Unfold.jl we split by event-type.

Interaction with the sequence panel has to be checked for sure... but maybe a start


MSE

I don't think I understand enough of what you want to do and what the problem is. Could you write out in pseudo-math what you want? or describe it in more details?

My naive point: Why not use mean((ground_truth .- predict(::UnfoldModel).^2), why do you need effects at all?

ReneSkukies commented 2 months ago

Ground truth

The simple wrapper seems like an elegant solution. However, I encountered an additional problem: Simulate responses only gives you a Matrix of size(0:length_of_longest_compoent, n_of_design_factorial); the problem being that this means your ground truth ERP only goes from zero to the end of the longest component, which typically is inconsistent with the estimates that you want to compare it to (e.g. \tau = (-0.2, 1.0)). I worked around this by padding the ground truth to be the same length as the estimates:

using ArrayPadding

τ = [-0.1, 1]
sfreq = 100

if τ[1] < 0
    pad_before = Int64(abs(τ[1] * sfreq))
else
    pad_before = 0
end

pad_after = Int64((τ[2] * sfreq) - size(ground_truth, 1)) + 1 # I had to add +1 here to get same length, not sure why

pad(ground_truth, 0, (pad_before, 0), (pad_after, 0))

For convenience, I used ArrayPadding.jl, but there might be a more direct way that doesn't require and additional dependency.

MSE

For your naive point: In my understanding, this doesn't work due to two points

  1. Afaik the predict(::UnfoldModel) returns a predicted ("y_hat = X*b"), i.e. a continuous stream. However, the ground truth is represented as an ERP
  2. As stated above simulate response_resonse() gives you a Matrix of size(0:length_of_longest_compoent, n_of_design_factorial); meaning a GT calculation like this:
components = [p1, n1, p3]

sim = UnfoldSim.Simulation(
    SingleSubjectDesign(;
        conditions = Dict(
            :condition => ["car", "face"],
            :continuous => range(0, 5, length = 10),
        :cont2 => range(0, 4, length = 4),
        )),
        components,
        NoOnset(),
        NoNoise()
    )
gt = UnfoldSim.simulate_responses(MersenneTwister(1),components, sim)

results in a matrix of size(45, 80). Thus, I think there should be a way to control which conditions are actually calculated, since you might not always want to look at all possible combinations.

behinger commented 2 months ago
behinger commented 2 months ago
struct GroundTruthDesign
    design::AbstractDesign
    effectsdict::Dict
end

"""
copied from Effects.jl
"""
function expand_grid(design)
           colnames = tuple(keys(design)...)
           rowtab = NamedTuple{colnames}.(product(values(design)...))

           return DataFrame(vec(rowtab))
       end

UnfoldSim.size(t::GroundTruthDesign) = size(generate_events(t))

typical_value(v::Vector{<:Number}) = [mean(v)]
typical_value(v) = unique(v)

function UnfoldSim.generate_events(t::GroundTruthDesign)
    effects_dict = Dict{Any,Any}(t.effects_dict)
    current_design = generate_events(t.des)
    to_be_added = setdiff(names(current_design),keys(effects_dict))
    for tba in to_be_added
           effects_dict[tba] = typical_value(current_design[:,tba])
    end
    return expand_grid(effects_grid)
end

effects_dict = Dict{Symbol,Union{<:Number,<:String}}(:conditionA=>[0,1])
SingleSubjectDesign(...) |> x-> GroundTruthDesign(x,effects_dict)