aridyckovsky opened this issue 3 years ago
@psokolhessner I've added the ideas from above for regressions, the first of which can be found here: https://github.com/sokolhessnerlab/itrackvalr/blob/main/notebooks/behavioral_data_preprocessing.md#predict-is_hit-using-signal_time
The models as written produce fit warnings, one of which is common to both: `Some predictor variables are on very different scales: consider rescaling`
The `glmer` model also outputs this: `optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 3 lme4 warnings`
Ah, that warning (`Some predictor variables are on very different scales`) would be because signal time has values in the thousands, as compared to the intercept (a value of 1). Though such pedestrian numeric scale differences shouldn't matter, they do. We'll need to rescale `signal_time` for both regressions. I'd consider rescaling by 3,600, turning the values from units of seconds into units of hours (and fractions thereof). Then everything lives on a similar scale.
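A minimal sketch of that rescaling, assuming a pre-scaling data frame named `combined_hits_df` (hypothetical here) with `signal_time` in seconds:

```r
library(dplyr)

# Divide seconds by 3,600 so signal times in the thousands become hours,
# landing on roughly the same scale as the intercept (assumed column names)
combined_hits_df <- combined_hits_df %>%
  mutate(signal_time_hours = signal_time / 3600)
```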
The `glmer` output is slightly more opaque. What are the additional `3 lme4 warnings`?
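One way to surface them, assuming the fit is stored in an object (say `model1`, per the suggestion below), is to read the messages lme4 keeps on the fitted object:

```r
# merMod fits store convergence/check messages in @optinfo; printing them
# shows what the "3 lme4 warnings" actually say (assumes a stored fit)
model1@optinfo$conv$lme4$messages
```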
When running models, you want to store their output too. So the calls to `lmer` and `glmer` should be something like `model1 = lmer(...)`.
What to name the models... we may be working with them quite a bit, so keeping names clear but also not too long would be good. Here, I'd consider a name featuring some text that indicates this is regression output, e.g. `model` or `fit`, along with descriptives that capture regression features and/or sequence and variants. I've used names like `model1`, `model2a`, `model2b`, etc. before, as well as `model_RT_SignalTime_MFX` (the latter captures that it's model output; it's on RT, using signal time, and is mixed effects [fixed and random]).
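A hedged sketch of how that might look for the two models in the linked notebook, using the formulas and the `scaled_combined_hits_df` data frame from the summaries pasted below:

```r
library(lme4)
library(lmerTest)  # lmerTest's lmer() adds Satterthwaite t-tests to summaries

# Store each fit under a descriptive name so it can be summarized and reused
model_Hit_SignalTime_MFX <- glmer(
  is_hit ~ 1 + signal_time + (1 | id),
  data = scaled_combined_hits_df,
  family = binomial(link = "logit")
)

model_RT_SignalTime_MFX <- lmer(
  reaction_time ~ 1 + signal_time + (1 | id),
  data = na.omit(scaled_combined_hits_df)
)

summary(model_Hit_SignalTime_MFX)
summary(model_RT_SignalTime_MFX)
```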
Left suggestions for how to do this in the RMD file with this commit: https://github.com/sokolhessnerlab/itrackvalr/commit/e2efb35a0d0eb4e36e3346365fec96df3e1d364c
@psokolhessner thanks for all of this. I updated the `renv.lock` to include `lmerTest`, so that will be accessible throughout the repo. I also adjusted the signal and reaction time scales to the [0, 1] interval, and the models ran without warning. Pasting the summary responses here:
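(For reference, a minimal sketch of one way to do that min-max rescaling; the helper name is hypothetical, and the actual preprocessing lives in the linked notebook:)

```r
library(dplyr)

# Map a vector onto [0, 1] (assumed approach)
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

scaled_combined_hits_df <- combined_hits_df %>%
  mutate(
    signal_time   = rescale01(signal_time),
    reaction_time = rescale01(reaction_time)
  )
```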
**Predicting `is_hit` from `signal_time`**
```
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: is_hit ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df
## 
##      AIC      BIC   logLik deviance df.resid 
##   2307.5   2323.9  -1150.7   2301.5     1797 
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -2.494411658 -0.784968172 -0.477289551  0.902517615  2.366707656 
## 
## Random effects:
##  Groups Name        Variance    Std.Dev.   
##  id     (Intercept) 0.698673966 0.835867194
## Number of obs: 1800, groups:  id, 50
## 
## Fixed effects:
##                 Estimate  Std. Error  z value   Pr(>|z|)    
## (Intercept)  0.230697540 0.157237994  1.46719    0.14233    
## signal_time -0.859634004 0.182285149 -4.71588 2.4067e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.572
```
**Predicting `reaction_time` from `signal_time`**
```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: reaction_time ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df %>% na.omit()
## 
## REML criterion at convergence: 2463
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -1.856746002 -0.522942508 -0.203867491  0.199189511  5.955977930 
## 
## Random effects:
##  Groups   Name        Variance    Std.Dev.   
##  id       (Intercept) 0.253031509 0.503022374
##  Residual             1.067215435 1.033061196
## Number of obs: 821, groups:  id, 50
## 
## Fixed effects:
##                Estimate  Std. Error            df  t value   Pr(>|t|)    
## (Intercept) 1.424493424 0.100223741 105.550683060 14.21313 < 2.22e-16 ***
## signal_time 0.774275825 0.128541525 793.026666766  6.02355 2.6097e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.585
```
Models to run (see the sketch after this list):

1. Predict the probability of a hit by signal time, with signal time random effects: `is_hit ~ 1 + signal_time + (1 + signal_time | id)`
2. Predict the reaction time for a hit by signal time, with signal time random effects: `reaction_time ~ 1 + signal_time + (1 + signal_time | id)`
3. Predict the probability of a false alarm by response time: `is_false_alarm ~ 1 + resp_time + (1 | id)`
4. Predict the probability of a false alarm by response time, with response time random effects: `is_false_alarm ~ 1 + resp_time + (1 + resp_time | id)`
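A hedged sketch of those four calls, reusing the naming scheme from above (and assuming `is_false_alarm` has already been added to the data frame; see the note below):

```r
library(lme4)
library(lmerTest)

# Random-slope and false-alarm variants; the model names are illustrative
model_Hit_SignalTime_RFX <- glmer(
  is_hit ~ 1 + signal_time + (1 + signal_time | id),
  data = scaled_combined_hits_df, family = binomial
)

model_RT_SignalTime_RFX <- lmer(
  reaction_time ~ 1 + signal_time + (1 + signal_time | id),
  data = na.omit(scaled_combined_hits_df)
)

model_FA_RespTime_MFX <- glmer(
  is_false_alarm ~ 1 + resp_time + (1 | id),
  data = scaled_combined_hits_df, family = binomial
)

model_FA_RespTime_RFX <- glmer(
  is_false_alarm ~ 1 + resp_time + (1 + resp_time | id),
  data = scaled_combined_hits_df, family = binomial
)
```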
Note: `is_false_alarm` is 0 when `is_hit == 1`, or 1 otherwise.
Checklist:

- [ ] `is_false_alarm` column based on `is_hit` and other `resp_time` without a signal
- [ ] `predict()`, `ranef()`, `fixef()` and `coef()`
A tweak to the definition of `is_false_alarm` to more clearly define what goes into "otherwise": `is_false_alarm` contains entries for all responses/button-presses, and is 0 when `is_hit == 1`, or 1 otherwise.

Alternatively, the variable can be identified as `is_falsealarm_vs_hit` (much like `is_hit` can be more clearly defined as `is_hit_vs_miss`).
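Under that tweaked definition, the column construction might look something like this (a sketch; the data frame name is reused from above):

```r
library(dplyr)

# For all response/button-press rows: 0 if the response was a hit,
# 1 otherwise (i.e., a false alarm)
scaled_combined_hits_df <- scaled_combined_hits_df %>%
  mutate(is_false_alarm = if_else(is_hit == 1, 0, 1))
```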
I think it's clearer to identify boolean variables as `is_hit` and `is_false_alarm`, keeping them as completely logical values. By introducing a "vs" into the labeling, we must then rely on our interpretations to understand the underlying logical value's meaning.

For example, the strictly boolean variable `is_hit` is very easy to understand from a data-reading perspective: if `is_hit` is 1 (true), then it's a hit; if `is_hit` is 0 (false), then it's not a hit. However, `is_hit_vs_miss` is not straightforward at the boolean level -- it requires annotation separate from the data to interpret correctly, i.e., is it a hit when true but a miss when false, or a hit when false but a miss when true? In this case, we are better served by adding an `is_miss` variable to maintain clear boolean variables with no room for interpretation error.
Per conversation, will transition to using `is_hit_given_signal` and `is_hit_given_response`, etc. Plus: `is_false_alarm_given_response`.
The models we've discussed are now part of the main pipeline via the `analyze_behavioral_data` sub-pipeline. The output of the analysis notebook can be found here.
Fantastic. Really nice, clear evidence - with increasing time in the task, people...
(_Note the careful phrasing of no. 3; if we wanted to say "are more likely to false alarm when no signal is present," that would require a fourth regression, or pair of regressions, on `is_false_alarm_given_nosignal` using trial number or step time instead of `resp_time` or `signal_time`._)
Interesting to note how robust all of these effects are - fully RFX models identify the exact same effects, implying that most participants experience the effects of the passage of time on hits, reaction time, and false alarms in the same or very similar ways. Visualization or characterization of the individual-level estimates (either given by `coef()`, or the sum of `fixef()` and `ranef()`) would also likely establish that. Of course, if they experienced these exactly the same way, then the RFX estimates would be 0, and that's not the case; so there is some variability, just not much, and it's dwarfed by the FFX term (the group-level overall shared effect) in magnitude.
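For example, with lme4 the per-participant estimates can be pulled straight from a stored fit (using one of the illustrative model names from above):

```r
# Combined per-participant estimates (fixed + random), one row per id
coef(model_Hit_SignalTime_RFX)$id

# Equivalently, add the group-level slope to each participant's deviation
fixef(model_Hit_SignalTime_RFX)["signal_time"] +
  ranef(model_Hit_SignalTime_RFX)$id[["signal_time"]]
```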
The remaining items from the checklist above (https://github.com/sokolhessnerlab/itrackvalr/issues/25#issuecomment-843581152 - mainly plotting model output, and mean p(hit) by half) will nicely wrap this up. Thank you @aridyckovsky this is looking great!!
TODO:

- [ ] `is_hit` logical column (0, 1)

Potential regression ideas: