jensroes / nonadjacent-sequence-learning

Learning nonadjacent sequences

Results #8

Open jensroes opened 1 year ago

jensroes commented 1 year ago

I've still got to update the coefficients table and the pooled analysis but I think it's worth talking about the results and next steps. The analysis is living here at the moment: https://rpubs.com/jensroes/979417

I've decided that it might be better to keep the cumulative number of target looks analysis but, for every trial, only count fixations > 100 msecs as successful anticipations of the target. The threshold is debatable, but it rules out some observations that were too short to be considered successful and makes the results easier to interpret.
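
Roughly, the preprocessing I have in mind looks like this (just a sketch; the column names subj, trial, fix_duration and on_target are placeholders, not the ones in the repo):

library(dplyr)

# Count a fixation on the target as a successful anticipation only if it lasted > 100 ms,
# then accumulate these hits across trials within each participant.
looks <- looks %>%
  group_by(subj) %>%
  arrange(trial, .by_group = TRUE) %>%
  mutate(hit = as.integer(on_target & fix_duration > 100),
         cum_hits = cumsum(hit)) %>%
  ungroup()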

The main result is that we found learning for adjacent dependencies but not for nonadjacent dependencies when dependencies were presented in different experimental blocks. Learning effects seem to emerge very early. When adjacent and nonadjacent dependencies are presented within the same block, the learning effect is reduced in size. In other words, even though participants did not successfully anticipate nonadjacent dependencies, the presence of nonadjacent dependencies impacts the participants' ability to learn adjacent dependencies such that the learning effect takes longer to manifest (i.e. it is smaller after the same number of trials).

@Lai-Sang, what are we doing next :)

Mark-Torrance commented 1 year ago

This looks pretty clear. I like the binary did / did not fixate measure of learning on each occurrence.

A minor issue, but adding just a quadratic effect means that we allow the estimated cumulative number of hits to decrease as well as increase. I'm prepared to bet that a reviewer will pick up on this. It's not "wrong" but it's confusing. The only two solutions that I can think of are to stick with just a linear effect (for a-priori reasons) - though I note how much fit is improved by adding $x^2$ - or to add $x^3$.

jensroes commented 1 year ago

I was worried about that before. The problem isn't the quadratic component but the linear component. For reasons that have to do with how brms parametrises the models, I can't put a lower bound on individual coefficients; otherwise I could say that the linear component must be positive. The quadratic component is quite likely to be negative, which would mean that learning is asymptoting (the participant is getting tired).

There are two things I know I could try: model the growth with a Weibull function or log the predictor. See https://www.magesblog.com/post/2015-11-03-loss-developments-via-growth-curves-and/ and https://www.magesblog.com/post/2015-10-27-non-linear-growth-curves-with-stan/

I'm sure there are other ways to model cumulative functions that cannot decrease. I'll have a look.
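
Something like this, for instance (a minimal sketch of a Weibull growth curve in brms non-linear syntax along the lines of the magesblog posts; hits/trial/subj, the family and the priors are placeholders, not the actual model):

library(brms)

# Cumulative hits grow towards an asymptote ult; theta and omega control
# when and how sharply the curve rises. With positive parameters the curve
# cannot decrease over trials.
fit_weibull <- brm(
  bf(hits ~ ult * (1 - exp(-(trial / theta)^omega)),
     ult + theta + omega ~ 1 + (1 | subj),
     nl = TRUE),
  data = dat, family = gaussian(),
  prior = c(prior(normal(20, 10), nlpar = "ult", lb = 0),
            prior(normal(20, 10), nlpar = "theta", lb = 0),
            prior(normal(1, 1), nlpar = "omega", lb = 0)),
  cores = 4
)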

Mark-Torrance commented 1 year ago

The problem isn't the quadratic component but the linear component.

I see what you mean. Just plugging in the time parameters from the first experiment and standardising time gives a nice asymptotic curve. So BRMS must be under-weighting the linear component somehow?

# Plugging in the coefficients from the first experiment and standardising time
t <- scale(1:40)
b <- 103.73    # linear term
b1 <- -33.32   # quadratic term
y <- b * t + b1 * t^2
plot(t, y, type = "l")  # gives a nice asymptotic curve

I also don't understand how you get positive acceleration at the start of the curve. That shouldn't be possible with a linear + quadratic function, right?

jensroes commented 1 year ago

I don't follow :)

I also don't understand how you get positive acceleration at the start of the curve. That shouldn't be possible with a linear + quadratic function, right?

Why is that?

Mark-Torrance commented 1 year ago

Ok, I didn't say what I meant. What I meant was that I don't think you can have both acceleration and deceleration in the same curve from $bx + b_1x^2$. You can have one of these but not both? To get an S-shaped curve you need $x^3$, don't you? I might very possibly be wrong.

jensroes commented 1 year ago

Still don't understand why you can't have acceleration and deceleration? Oh, one thing you may have missed is that the data are modelled with a log link; a linear function with a log link is already quadratic, and thus a quadratic function is cubic. I've also fitted cubic and quartic functions which show some improvement, but the curves don't really seem to make much sense.
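
To illustrate with made-up numbers (hypothetical coefficients, chosen only to show the shape, not the fitted values): under a log link the same linear + quadratic predictor can first accelerate and then decelerate on the response scale.

t <- scale(1:40)          # standardised trial index
b <- 1.5                  # hypothetical linear term (log scale)
b1 <- -0.4                # hypothetical quadratic term (log scale)
eta <- b * t + b1 * t^2   # linear predictor: bends in one direction only
mu <- exp(eta)            # response scale: accelerates early, decelerates late
plot(t, mu, type = "l")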

Mark-Torrance commented 1 year ago

Still don't understand why you can't have acceleration and deceleration?

Because a quadratic (plus linear) function just gives a curve in one direction. But this...

Oh one think you may have missed is that the data are modelled with a log link

explains it. I'm not sure how important this is - particularly if the models take ages to run - but we could have readers confused and/or arguing that if the priors allow a decrease in y then they are wrong.

jensroes commented 1 year ago

I agree that the priors should rule out negative values for the linear component but brms doesn't allow that. I'll try other functions (non-linear models).

Generally about priors: if you have a sensible amount of data and the priors are relatively weak, the data will overcome the prior. That negative values for the linear component are allowed doesn't make the model wrong; it is just an implausible assumption. I do think it's worth getting rid of this though, because it is annoying me too.

The models don't take ages (~ 18 hrs), I'm just running a lot of them. The only model that does take long (a week) is the pooled analysis across all experiments.

Mark-Torrance commented 1 year ago

This was a sidetrack, though. @Lai-Sang do we have a paper here?

jensroes commented 1 year ago

@Lai-Sang have you had a chance to look at this? I'll be in on Fridays this term if you want to chat in person.

jensroes commented 1 year ago

@Lai-Sang and @Mark-Torrance

I've updated the results: https://rpubs.com/jensroes/979417

@Mark-Torrance there is something I don't understand and maybe you can help me. Table 2.3 (I don't think we need to include it in the paper) shows main effects for nonadjacent dependencies (compared to the baseline). In the figures underneath there is no trace of such a main effect (certainly not in experiment 2, maybe in experiment 1). I'm not quite sure what to do with this. Interpreting model coefficients for growth-curve models is not the right strategy here, I think, but I'd like to understand this nonetheless.

Table 2.4 still needs updating but it's less important, I think. It shows no learning difference between Experiments 1 and 2 but a difference between Experiments 1 and 2 compared to 3.

Mark-Torrance commented 1 year ago

I see what you mean. The issue is really just with Experiment 2. I can see the main effects in 1 and 3.

What are the posterior distributions (ignoring time) for baseline, adjacent and non-adjacent?

jensroes commented 1 year ago

What are the posterior distributions (ignoring time) for baseline, adjacent and non-adjacent?

What do you mean?

Mark-Torrance commented 1 year ago

Good question. If adjacent and non-adjacent are treatment coded then I just mean posterior densities for intercept, intercept + adjacent, and intercept + non-adjacent. I can never get my head around sum coding (without seeing the contrasts).

jensroes commented 1 year ago

Ah, I see. Sum coding a categorical predictor is the same as when you centre (standardise) a continuous predictor. Figure 2.2 should be what you mean (https://rpubs.com/jensroes/979417) but I'll reproduce it manually to be sure.
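
Just to spell out the analogy with a toy example (nothing to do with the actual data): with the ±.5 sum coding the contrast columns average to zero, like a centred continuous predictor, which is why the intercept is the grand mean.

dependency <- factor(rep(c("adjacent", "nonadjacent", "baseline"), each = 10),
                     levels = c("adjacent", "nonadjacent", "baseline"))
contrasts(dependency) <- contr.sum(3) / 2   # +/- .5 sum coding
colnames(contrasts(dependency)) <- c("adjacent", "nonadjacent")
colMeans(model.matrix(~ dependency))        # contrast columns average to 0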

Mark-Torrance commented 1 year ago

Ah, sorry, didn't scroll down far enough. Something isn't right somewhere, then?

Two questions:

  1. What actually is baseline? For sequence a, b, c, for adjacent dependencies it's a, but for non-adjacent it's a and c?
  2. When you sum code, the intercept is the grand mean, correct? In which case how do you get baseline? (I should just look at your code, really.)

jensroes commented 1 year ago

Just recreated the plot for the main effects of dependency manually. They look exactly the same as in the rpubs document.

When you sum code, the intercept is the grand mean, correct? In which case how do you get baseline? (I should just look at your code, really.)

Yes the intercept is the grand mean.

Not sure if it helps but this is how I'd get the posteriors for each condition without time with the relevant contrasts:

# Sum contrasts look like this
#             adjacent nonadjacent
# adjacent         0.5         0.0
# nonadjacent      0.0         0.5
# baseline        -0.5        -0.5

library(brms)
library(tidyverse)

# Extract the population-level draws from each experiment's model
ps1 <- posterior_samples(m1, pars = "^b_")
ps2 <- posterior_samples(m2, pars = "^b_")
ps3 <- posterior_samples(m3, pars = "^b_")

# Combine the draws and turn the coefficients into cell means via the contrasts above
bind_rows(ps1, ps2, ps3, .id = "exp") %>%
  as_tibble() %>%
  select(-contains("poly")) %>%   # drop the time (polynomial) terms
  transmute(exp = exp,
            adjacent = b_Intercept + b_dependencyadjacent * .5,
            nonadjacent = b_Intercept + b_dependencynonadjacent * .5,
            baseline = b_Intercept + b_dependencyadjacent * -.5 + b_dependencynonadjacent * -.5)

So I could just calculate the differences between the conditions (adjacent, nonadjacent, baseline) manually and then we just ignore the coefficients returned by the model, which clearly don't show what we think they show.

What actually is baseline? For sequence a, b, c, for adjacent dependencies it's a, but for non-adjacent it's a and c?

I think this point isn't related to the problem. The problem is that the estimates for the model coefficients do not match the visualisations although both are generated by the same model. I don't think the model is the problem, I think the problem is me :)

The baseline item from adjacent sequences is c, because the dependency is a - b. For nonadjacent sequences it is b, because the dependency is a - c. In the code this is

dependency = case_when(trans == 1 & dependency == "adjacent" ~ "adjacent",
                       trans == 2 & dependency == "nonadjacent" ~ "nonadjacent",
                       trans == 1 & dependency == "nonadjacent" ~ "baseline",
                       trans == 2 & dependency == "adjacent" ~ "baseline"),

jensroes commented 1 year ago

So I could just calculate the differences between the conditions (adjacent, nonadjacent, baseline) manually and then we just ignore the coefficients returned by the model, which clearly don't show what we think they show.

I just did that. See table 2.4: https://rpubs.com/jensroes/979417
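
For reference, the differences come straight from the cell means above; something along these lines (a sketch, assuming the transmute pipe is stored in an object I'm calling cell_means here; the actual code may differ):

cell_means %>%
  mutate(adjacent_vs_baseline = adjacent - baseline,
         nonadjacent_vs_baseline = nonadjacent - baseline) %>%
  group_by(exp) %>%
  summarise(across(ends_with("_vs_baseline"),
                   list(est = median,
                        lower = ~ quantile(.x, .025),
                        upper = ~ quantile(.x, .975))))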

Mark-Torrance commented 1 year ago

# Sum contrasts look like this
#             adjacent nonadjacent
# adjacent         0.5         0.0
# nonadjacent      0.0         0.5
# baseline        -0.5        -0.5

Good. I'm understanding / remembering now.

Mark-Torrance commented 1 year ago

Can you explain the difference between Figure 2.2 and the new Table 2.4 - these are both taken from the posteriors of the same model?

jensroes commented 1 year ago

Can you explain the difference between Figure 2.2 and the new Table 2.4 - these are both taken from the posteriors of the same model?

Both are from the posterior of the same model. The figure is showing the estimated cell means and the Table shows the differences between each dependency type and baseline (see text :P).

jensroes commented 1 year ago

I found the answer!

The answer is we shouldn't look at the estimates for dependency type in the model coefficients at all. I used orthogonal polynomials which centre and scale the time variable. For example, for a quadratic function we get these two polynomials:

> poly(1:10, 2)
                1           2
 [1,] -0.49543369  0.52223297
 [2,] -0.38533732  0.17407766
 [3,] -0.27524094 -0.08703883
 [4,] -0.16514456 -0.26111648
 [5,] -0.05504819 -0.34815531
 [6,]  0.05504819 -0.34815531
 [7,]  0.16514456 -0.26111648
 [8,]  0.27524094 -0.08703883
 [9,]  0.38533732  0.17407766
[10,]  0.49543369  0.52223297

This means the effects for dependency type in Table 2.3 are not the difference after accounting for time, but the dependency effects somewhere in the middle of the experiment.
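
A quick way to see this in the console:

colSums(poly(1:10, 2))           # orthogonal polynomials are centred: both columns sum to ~0
head(poly(1:10, 2, raw = TRUE))  # raw polynomials would instead anchor the effects at trial = 0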

So what we need to do is not report Table 2.3 at all, because it doesn't answer anything (we know from Table 2.2 that there is no time by dependency type interaction), and I will replace the coefficients Table 2.5 with a table that only shows coefficients that do not involve time, i.e. main effects and interactions after accounting for time. If we need the by-time interactions we can still add them, but I think we get what we need to know about them from the model comparisons.

jensroes commented 1 year ago

I updated the results to streamline it a bit more: https://rpubs.com/jensroes/979417