uchicago-computation-workshop / ma_proposal_workshop_a1

Extension: Perea, Acha & Carreiras (2009)- Eye movements when reading text messaging (txt msgng) #8

Open policyglot opened 5 years ago

policyglot commented 5 years ago

I will be building on psycholinguistic analysis from the Quarterly Journal of Experimental Psychology: https://www.ncbi.nlm.nih.gov/pubmed/19370488

policyglot commented 5 years ago

Perea, Acha and Carreiras (2009) are psycholinguists interested in how ‘deviant’ orthography in SMS affects reading time, relative to messages that follow standard rules of spelling and grammar. Their research question is whether orthographic and phonological variations of spellings in Spanish exert heavier reading costs than standard Spanish for young adults who are expert users of SMS language. They assume that ‘linguistic characteristics of words have an impact not only on the duration of fixations but also on which words are fixated’, and hence choose eye tracking as their method, emphasizing the associated differences in local and global measures.

They begin their paper with a detailed introduction on the emergence of SMS, and then summarize a large number of studies on deviations from standard orthography in more general settings. This early section does not theorize extensively, and the link between their ideas and the actual experimental data is quite unmediated relative to other psycholinguistic research. To differentiate between the two key sources of misspelling, they create sentences where the deviations are based on orthography and phonology respectively. Each one receives a ‘standard’ control, leading to a total of four conditions. The ‘corpus’ of SMS spellings is available online at www.diccionariosms.com. This was used to generate a set of 72 experimental sentences, which were then presented to 26 university students on computer screens. The authors aimed to control as much as possible for word frequency effects.

The type of abbreviation used had a highly significant effect (Fs > 6.27, ps < .001) on global and local measures, including total sentence reading time, the total number of fixations across the sentence, and the total number of backward fixations. Critically, the reading cost proved to be higher in the phonological (rather than the orthographic) script. They conceded through post-hoc analysis that the number of letters elided in the spelling should have been controlled for.
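To make the three global measures concrete, here is a minimal sketch of how they could be computed from one trial's fixation sequence. It is not the authors' analysis pipeline: the `Fixation` record, the pixel-based `x` coordinate, and the rule that a "backward fixation" is any fixation landing to the left of the previous one on a single-line, left-to-right sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    """One fixation in a trial (hypothetical record layout)."""
    x: float            # horizontal gaze position, assumed to be in pixels
    duration_ms: float  # fixation duration in milliseconds


def global_measures(fixations):
    """Compute sentence-level measures for one single-line trial.

    Assumption: a backward fixation is one whose x position is to the
    left of the immediately preceding fixation.
    """
    total_time = sum(f.duration_ms for f in fixations)
    n_backward = sum(
        1 for prev, cur in zip(fixations, fixations[1:]) if cur.x < prev.x
    )
    return {
        "total_reading_time_ms": total_time,
        "n_fixations": len(fixations),
        "n_backward_fixations": n_backward,
    }


# Made-up trial data, purely to show the call.
trial = [
    Fixation(x=100, duration_ms=200),
    Fixation(x=180, duration_ms=250),
    Fixation(x=120, duration_ms=150),  # lands left of the previous fixation
    Fixation(x=260, duration_ms=220),
]
measures = global_measures(trial)
```

Real eye-tracking exports usually need extra steps (blink removal, mapping fixations to word regions) that this sketch leaves out.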

I found this research compelling in its simplicity as well as its relevance. SMS communications are growing daily, and offer a digital window into previously intractable areas of social science research. However, my concern is that several such studies focus on European languages, overlooking the rapid emergence of new users globally. Africa is particularly notable here since many of its languages are written in the Roman script. This allows for application of many of the same tools and algorithms. Unlike most European languages, the Bantu languages are highly agglutinative. A number of grammatical features thus become integrated into the verb as a single word. In such contexts, deviations in spelling can change not only the reference (as with nouns) but also the overall meaning of the sentence.

I would therefore propose an extension of Perea, Acha and Carreiras' research to the Nyanja/Chichewa language, which is a member of the Bantu family and spoken widely in Malawi, Zambia and Mozambique. Previous work by Munro and Manning (2010) provides examples drawn from SMS data in Malawi. Here, I would group phonological and orthographic changes together. A new second category would include grammatical changes due to differences in spelling. This gives a 2×2 experimental design, with one fully correct sentence, one with incorrect spelling for a noun, one with incorrect morphology for a verb, and one with both wrong. The same eye-tracking methods as in the original paper can then examine whether a deviation in grammatical or syntactic meaning exerts a higher reading cost than deviations in semantic meaning. This can pave the way for similar analyses of SMS reading costs, especially for languages with less standardized orthography than English and Spanish.
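As a rough sketch of the proposed 2×2 design, the four conditions can be enumerated from the two factors and then rotated across sentences so that each participant list sees every sentence once, in one condition only. The factor names, the Latin-square rotation, and the item and list counts below are illustrative assumptions; the proposal itself does not fix any of them.

```python
from itertools import product

# Hypothetical factor names; the proposal only specifies that noun
# spelling and verb morphology are each either correct or incorrect.
FACTORS = {
    "noun_spelling": ("standard", "deviant"),
    "verb_morphology": ("standard", "deviant"),
}


def build_conditions(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def latin_square_assignment(n_items, n_lists, conditions):
    """Rotate conditions across items, one row of assignments per list.

    Entry [lst][item] is the condition in which participants on list
    `lst` read sentence `item`.
    """
    k = len(conditions)
    return [
        [conditions[(item + lst) % k] for item in range(n_items)]
        for lst in range(n_lists)
    ]


conditions = build_conditions(FACTORS)
# 8 items and 4 lists are placeholder numbers for illustration only.
lists = latin_square_assignment(n_items=8, n_lists=4, conditions=conditions)
```

With four lists and four conditions, every sentence appears in each condition exactly once across lists, which keeps sentence-specific effects from being confounded with condition.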

Robert Munro and Christopher D. Manning. 2010. Subword variation in text message classification. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 510-518.