Appendix: model descriptions

jensroes commented 3 days ago

@RConijn can I ask you to do another thing?

I think the model description in the appendix (Appendix B) could possibly be clearer. I've made some changes already since initial submission. Could you have a look?

Attached are the rmd file and the latest pdf version.

models.zip manuscript.pdf

RConijn commented 2 days ago

I've updated the text for clarity & fixed typos: models.zip

Some things I wasn't sure of / to check:

In other words, we used statistical models to map between keystroke data and the theoretically assumed cognitive process that underlies the generation of interkey interval data -->(changed to) --> In other words, we used statistical models to map between keystroke data (in particular interkey intervals) and the theoretically assumed cognitive process that underlies hesitations in written production.
I'm a bit confused by the notation in equation B3: $\sigma_{e_\text{location[i]}}^2$. Does it make sense to change this into something like: $\sigma_{e,i}^2$ where $\sigma_{e,i}^2 = \beta_\text{2,location[i]}$ - similar as in how $\mu_i$ is represented? Not sure what the standard notation is for these kinds of things, but having double subscripts confuses me.
instead of assuming that different linguistic edges shift the distribution over average interkey intervals towards larger values --> larger linguistic edges?
This $\delta$ parameter was constrained to be positive and added to the distribution of fluent key transitions $\beta$. --> So here it does seem additive, which is what we wanted to avoid right (re. Mark's egg-restaurant example?)
"Note that the $\beta$ parameter is represented in both log-Gaussian distributions in equation \ref{eq:bimodcon}." --> Full sentence should probably be moved to point (2) or paragraph below ~ and relates to my previous comment.

jensroes commented 2 days ago

Thanks. So frustrating that there were still typos you had to fix :)

In other words, we used statistical models to map between keystroke data and the theoretically assumed cognitive process that underlies the generation of interkey interval data -->(changed to) --> In other words, we used statistical models to map between keystroke data (in particular interkey intervals) and the theoretically assumed cognitive process that underlies hesitations in written production.

Hm, I will think about this. It's not just about hesitations but all key intervals I think.

I'm a bit confused by the notation in equation B3: $\sigma_{e_\text{location[i]}}^2$. Does it make sense to change this into something like: $\sigma_{e,i}^2$ where $\sigma_{e,i}^2 = \beta_\text{2,location[i]}$ - similar as in how $\mu_i$ is represented? Not sure what the standard notation is for these kinds of things, but having double subscripts confuses me.

I think that would be more confusing, but you are right that location shouldn't be sub-subscripted. I'll fix that.

This $\delta$ parameter was constrained to be positive and added to the distribution of fluent key transitions $\beta$. --> So here it does seem additive, which is what we wanted to avoid right (re. Mark's egg-restaurant example?)

I think I need to add this to the response letter. There is a difference between the processes we assume and how we parametrise the model. The addition of delta just fixes the label-switching issue in mixture models; it doesn't assume that the planning processes are additive, because planning always happens, not just at hesitations.
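To illustrate the label-switching point, here is a minimal Python sketch (not the project's model code). It assumes a two-component log-normal mixture; the function names, parameter values, and interkey intervals are all made up for illustration. Swapping the component labels leaves the likelihood unchanged, whereas writing the second location as `beta + delta` with `delta > 0` fixes which component is which.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal with log-scale location mu and scale sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(ikis, theta, mu_a, mu_b, sigma_a, sigma_b):
    """Log-likelihood of a two-component log-normal mixture.

    theta weights component a; 1 - theta weights component b.
    """
    return sum(
        math.log(theta * lognormal_pdf(x, mu_a, sigma_a)
                 + (1.0 - theta) * lognormal_pdf(x, mu_b, sigma_b))
        for x in ikis
    )

def constrained_loglik(ikis, theta, beta, delta, sigma_hes, sigma_flu):
    """Same mixture, but the hesitant location is beta + delta, delta > 0."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return mixture_loglik(ikis, theta, beta + delta, beta, sigma_hes, sigma_flu)

# Hypothetical interkey intervals in ms, for illustration only.
ikis = [120.0, 140.0, 155.0, 160.0, 180.0, 450.0, 700.0, 1200.0]

# Unconstrained: swapping the labels (and theta <-> 1 - theta)
# describes the same mixture, so the likelihood cannot tell them apart.
ll_original = mixture_loglik(ikis, 0.3, 6.5, 5.0, 0.6, 0.25)
ll_swapped = mixture_loglik(ikis, 0.7, 5.0, 6.5, 0.25, 0.6)

# Constrained: the component with location beta + delta is always the slower one.
ll_constrained = constrained_loglik(ikis, 0.3, 5.0, 1.5, 0.6, 0.25)

print(ll_original, ll_swapped, ll_constrained)
```

The constraint is purely about identifying the two components; it says nothing about how the underlying processes combine.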

RConijn commented 2 days ago

I think I need to add this to the response letter. There is a difference between the processes we assume and how we parametrise the model. The addition of delta just fixes the label-switching issue in mixture models; it doesn't assume that the planning processes are additive, because planning always happens, not just at hesitations.

Hmm. I agree that the response letter would be good, but we need to make sure that other readers understand it as well (not sure how though, as I'm getting more confused here too :P). Maybe have a look at this sentence too: "In other words, β represents the average typing speed for fluent transitions between keys". Given that β is the same in both the fluent and the hesitation distributions, it still seems additive to me in this way. Let me phrase it differently: why does β need to be the same in both?

jensroes commented 2 days ago

I'll clarify this in the methods section and appendix. It's kinda simple: there is some average typing speed that varies across writers, right? To work out what a hesitation is, you need to know how fast someone can type. Like, 300 ms is kinda hesitant when you usually type at 150 ms, but it's not hesitant when 300 ms is how fast your fingers usually move. Beta just gives a baseline. Adding delta allows us to determine how big the slowdown is when hesitations occur.

However, delta does not mean processes are additive, because planning happens across all intervals, fluent and hesitant. A hesitation just means that some information wasn't available in time because it was more difficult to retrieve.
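A rough Python sketch of the 150 ms vs 300 ms example, under assumptions: a two-component log-normal mixture with the fluent location at `beta` and the hesitant location at `beta + delta`. The helper name `hesitation_probability` and every parameter value are invented for illustration and are not estimates from the paper.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal with log-scale location mu and scale sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def hesitation_probability(iki, beta, delta, theta, sigma_flu, sigma_hes):
    """Probability that an interval belongs to the hesitant component.

    beta is the writer's fluent baseline (log scale), delta > 0 shifts the
    hesitant component, and theta is the hesitation mixing weight.
    """
    hesitant = theta * lognormal_pdf(iki, beta + delta, sigma_hes)
    fluent = (1.0 - theta) * lognormal_pdf(iki, beta, sigma_flu)
    return hesitant / (hesitant + fluent)

# Hypothetical values shared by both writers; only the baseline differs.
shared = dict(delta=1.0, theta=0.2, sigma_flu=0.25, sigma_hes=0.6)

# The same 300 ms interval for a writer who usually types at 150 ms ...
p_fast_writer = hesitation_probability(300.0, beta=math.log(150.0), **shared)
# ... and for a writer whose usual interval is 300 ms.
p_slow_writer = hesitation_probability(300.0, beta=math.log(300.0), **shared)

print(p_fast_writer, p_slow_writer)
```

With these made-up values the 300 ms interval is most likely a hesitation for the faster writer and most likely fluent for the slower one, which is all the shared baseline is meant to express.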

RConijn commented 2 days ago

Yes, certainly makes sense. I think this is the first step in why people start to think of it as additive: you have some hesitation on top of just the basic typing speed. But then, there are 2 distributions, and if you add that for fluent intervals the planning fitted within the average typing time, I think this perfectly explains it. Anyway, hope this helps in how to clarify it.

jensroes commented 2 days ago

Yes. The problem is that people tend to think about it from a serial perspective, meaning that pauses and their duration reflect something about cognitive levels of processing. Even if they don't believe that the serial view is a thing, this is exactly how they think about data and how they model it (as in their models assume a serial view).

Mark-Torrance commented 2 days ago

adding delta allows us to determine how big the slowdown is when hesitations occur.

"Slowdown" is confusing here, I think, and we shouldn't use it. The right way to think about the model is just in terms of β1 and β2, where β2 is the central tendency for the second distribution. β2 is by definition greater (longer) than β1, because the second distribution is of IKIs where processing could not be completed in the time taken for previous execution. So δ must, for theoretical reasons, be positive. This is captured in how the model is constrained.

I'm definitely not suggesting that we try to redefine the models using β2 instead of δ - even if that's possible. We just need to be really clear that δ just captures the difference between the central tendencies of the two distributions. It has nothing to do with the extra time that it takes to plan content on top of time for keystroke. Keystroke execution time does not contribute to IKIs in the second distribution (and therefore doesn't contribute to β2).

Communicating this might be tricky. But communicating it we must :)

jensroes commented 2 days ago

I'll go over this again for the model descriptions. I do believe that that's what we're communicating, but there is a tendency, coming from serial thinking plus people's experience with linear models, to understand the addition of parameters as additive processes, and mixture models as ways to distinguish different processes involved in writing, which of course only makes sense in the serial view. Btw @Mark-Torrance this is also what one of the NotebookLM podcasts thought. I don't think this is something we suggest anywhere, but this is what readers seem to be looking for, so we need to make this clear for the model descriptions as well.

Mark-Torrance commented 2 days ago

I think we want to avoid over-explaining as well, though. I know that this is difficult. But we don't want reviewers to now think we are making a radical new claim about parallel processing. To be honest I'm not sure how all three of them managed to give us positive reviews without understanding this key part of the paper. But given that they have we need to manage their egos a bit :)

jensroes commented 2 days ago

How about this (for the Appendix with the model descriptions):

The distribution of hesitant transitions is defined relative to the distribution of fluent key transitions. This was achieved by adding the $\delta$ parameter, which was constrained to be positive, to the distribution of fluent key transitions $\beta$. Note that the $\beta$ parameter is represented in both distributions in equation \ref{eq:bimodcon}. In other words, this parameterisation captures the fact that hesitations must be understood as relative to (and by definition larger than) the duration of fluent interkey intervals [@wengelin2001disfluencies]. Importantly, the addition of $\delta$ to $\beta$ in the distribution of hesitant transitions does not mean that cognitive processes are assumed to be additive (as explained in Appendix \ref{interkey-interval-predictions-under-a-parallel-theory-of-text-production}) because, fundamentally, planning-related processes run in parallel to output.^[Also, on a more practical level, this parameterisation of the model helps to address potential divergence problems related to label switching, a notorious problem for mixture models [see e.g. @stephens2000dealing].]

These two distributions are associated with the mixing weight $\theta$, which must be larger than 0 and smaller than 1. $\theta$ is parameterised to represent the weighting of the distribution in the first line, hence representing the hesitation probability. The mixing weight of the distribution of short interkey intervals is the complement, $1-\theta$, as the weights of both distributions must sum to 1.

In line with the literature discussed in the introduction, we allow the hesitation probability $\theta$ and the distribution of hesitant transitions to vary by transition location. We introduced the subscript location[i] to state that hesitation probability and hesitation duration depend on the transition location associated with the $i^\text{th}$ interkey interval. As fluent typing speed and hesitation frequency are subject to individual differences and writing style (and skills), we also assumed, respectively, that some participants are faster or slower than others and that some participants are more and others are less likely to hesitate [@waes2019].
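For orientation, the parameterisation described in these paragraphs could be written roughly as below. This is only a sketch and not necessarily identical to equation \ref{eq:bimodcon}: the symbol $y_i$ for the $i^\text{th}$ interkey interval and the variance symbols $\sigma_{e'}^2$ and $\sigma_{e}^2$ are placeholders assumed here, and the participant-level terms are left out for brevity.

```latex
\begin{aligned}
y_i \sim{} & \theta_{\text{location}[i]} \cdot \text{LogNormal}\left(\beta + \delta_{\text{location}[i]},\ \sigma_{e'}^2\right) \\
 {}+{} & \left(1 - \theta_{\text{location}[i]}\right) \cdot \text{LogNormal}\left(\beta,\ \sigma_{e}^2\right), \\
 & \delta_{\text{location}[i]} > 0, \qquad 0 < \theta_{\text{location}[i]} < 1
\end{aligned}
```

The first line is the distribution of hesitant transitions, weighted by the hesitation probability; the second line is the distribution of fluent transitions, and $\beta$ appears in both.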

RConijn commented 2 days ago

Two small changes:

In other words, this parameterisation captures the fact that hesitations must be understood as relative to the duration of fast or slow interkey intervals of fluent interkey intervals [@wengelin2001disfluencies].

--> In other words, this parameterisation captures the fact that hesitations must be understood as relative to (and by definition larger than) the duration of fast or slow interkey intervals of fluent interkey intervals [@wengelin2001disfluencies].

we assume that the hesitation probability θ and the distribution of hesitant transitions was allowed to vary by transition location.

--> we allow the hesitation probability θ and the distribution of hesitant transitions to vary by transition location.

jensroes / prowrite-mixture-models

Appendix: model descriptions #17