nalmadi / fix8

Fix8 (Fixate) is an Open-Source GUI Tool for Working with Eye Tracking Data in Reading Tasks.
Mozilla Public License 2.0

Better synthetic data #158

Closed nalmadi closed 3 months ago

nalmadi commented 3 months ago

If I understand it correctly, the method for generating word-skipping data relies on a user-defined probability that skips words at random at a fixed rate. Because skipping rates are primarily influenced by word length, it may be more beneficial to take word length into account when generating skips. If skipping is driven entirely by a single random rate applied equally to all words, the resulting patterns are unlikely to be realistic (e.g., more skips on long words and fewer skips on short words than is typical). Brysbaert & Vitu (1998) provide helpful data on typical skipping probabilities across different word lengths.
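
A minimal sketch of what length-dependent skipping could look like, as opposed to a single uniform rate. The function names, the exponential decay shape, and the `max_skip` and `decay` values are illustrative assumptions only; they are not Fix8's API and are not figures taken from Brysbaert & Vitu (1998).

```python
import math
import random
import string


def skip_probability(word, max_skip=0.8, decay=0.25):
    """Hypothetical skip probability that decreases with word length.

    max_skip and decay are placeholder values, NOT fitted to published data.
    """
    # Ignore surrounding punctuation so "word," counts as 4 letters.
    length = max(1, len(word.strip(string.punctuation)))
    return max_skip * math.exp(-decay * (length - 1))


def choose_skipped_words(words, scale=1.0, rng=None):
    """Return one boolean per word: True if that word is skipped.

    scale lets the user raise or lower the overall skipping rate.
    """
    rng = rng or random.Random()
    return [rng.random() < min(1.0, scale * skip_probability(w)) for w in words]


if __name__ == "__main__":
    text = "the method for generating data about word skipping".split()
    for word, skipped in zip(text, choose_skipped_words(text, rng=random.Random(0))):
        print(f"{word:12s} p={skip_probability(word):.2f} skipped={skipped}")
```

Replacing the placeholder curve with a lookup table of per-length probabilities from the literature would keep the same interface while matching observed data more closely.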

The same thing is true for generating regression probabilities.

nalmadi commented 3 months ago

Implemented a better synthetic skipping generator that replicates the data from Brysbaert & Vitu (1998). The resulting skip probability, which the user can still control, looks like this: image

I have no idea how to do the same for regressions; I can't find a paper with a distribution or a figure that I can use.