kparrish92 / l3_cognate_study_pilot

Detailed word list now available #2

Open kparrish92 opened 1 month ago

kparrish92 commented 1 month ago

Hi @juanjgarridop!

I've just made the detailed word list available. It has the 96 real words (we currently have 48 pseudowords, which piloted well, but we can of course increase that number). The pseudowords are here: https://github.com/kparrish92/l3_cognate_study_pilot/blob/main/data/stimuli/pw_list.csv

And here is a link to the real words:

https://github.com/kparrish92/l3_cognate_study_pilot/blob/main/data/stimuli/stim_detailed.csv
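For a quick look at both files without opening them on GitHub, something like this works. A minimal sketch: the raw-file URLs follow from the links above, but the column layout of the CSVs may differ from what `head()` suggests here.

```python
# Pull both stimulus lists straight from the repo and summarise them.
import pandas as pd

BASE = "https://raw.githubusercontent.com/kparrish92/l3_cognate_study_pilot/main/data/stimuli"

real_words = pd.read_csv(f"{BASE}/stim_detailed.csv")
pseudowords = pd.read_csv(f"{BASE}/pw_list.csv")

print(real_words.shape, pseudowords.shape)  # quick sanity check on item counts
print(real_words.head())
```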

To do: find a replacement three-way cognate for item 24 (or just have one fewer item).

juanjgarridop commented 1 month ago

Hi @kparrish92

Thank you for putting this together. I looked at the list, and it looks good. When I first glanced at the list a couple of days ago, I didn't notice that the pseudowords were on it too. I think that is why some words looked unfamiliar to me lol my bad!

I was wondering if words like "chat, test, golf" in Spanish should be excluded because they are loan words from English, but the plots of the random effects do not show a big difference between these words and the others, so I think they should be fine to keep.
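If it helps, a check along those lines could look roughly like this. This is only a sketch: the file name and the columns `participant`, `word`, and `correct` are placeholders, and it fits a linear random-intercept model as a simple stand-in for whatever model the pilot analysis actually used.

```python
# Fit a by-item random-intercept model on the pilot accuracy data and compare the
# intercepts for the loanword items ("chat", "test", "golf") against the other items.
import pandas as pd
import statsmodels.formula.api as smf

pilot = pd.read_csv("pilot_data.csv")  # placeholder trial-level file

model = smf.mixedlm("correct ~ 1", pilot, groups=pilot["word"]).fit()
ranefs = pd.Series({item: re.iloc[0] for item, re in model.random_effects.items()})

loans = ["chat", "test", "golf"]
print(ranefs[loans])                   # do the loanwords stand out?
print(ranefs.drop(loans).describe())   # distribution for the remaining items
```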

I did notice the taboo word you mentioned. We should probably replace it. Is this an easy fix? I don't know how many identical cognates there are in the PHOR-in-One dataframe.

The words have a wide range of frequency values, so we should definitely test for any frequency effects across word groups and within the cognate group itself.
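A sketch of that frequency check, assuming placeholder column names (`group`, `freq_zipf`) rather than whatever stim_detailed.csv actually contains:

```python
# Compare lexical frequency across the word groups with an omnibus test,
# plus per-group descriptives for the write-up.
import pandas as pd
from scipy import stats

stim = pd.read_csv("data/stimuli/stim_detailed.csv")

freq_by_group = [g["freq_zipf"].dropna() for _, g in stim.groupby("group")]
print(stats.kruskal(*freq_by_group))                   # omnibus test across groups
print(stim.groupby("group")["freq_zipf"].describe())   # per-group descriptives
```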

In my BLC publication, reviewers asked me to include a one-to-one ratio between real words and pseudowords. I think I included 100 pseudowords, 50 cognates, and 50 noncognates in each task. Different studies have used different ratios. In Carrasco-Ortiz et al. (2019), only 16.7% of trials were nonwords. Frances et al. (2021) used a one-to-one ratio. In our task, one third of trials are pseudowords. I personally do not have a problem with this, but reviewers might. We can keep it this way and fight back if needed, or we can use a one-to-one ratio and nobody will bother us about this. Where did you get the pseudowords from?

I am very excited about this project! Let me know what is next!

kparrish92 commented 1 month ago

Thanks for looking at this!

Taboo word

I looked into this and it was easy to replace! The word "moral" is the same in all three languages, has almost identical lexical frequency, and is the same length as the original item. I made the adjustment to the list and verified that the lists still show no differences on the basis of frequency, just as before the replacement.
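For reference, the kind of check I mean looks roughly like this. A sketch only; the column names (`word`, `group`, `freq_zipf`, `length`) are placeholders:

```python
# Inspect the swapped-in item and confirm the lists are still comparable on frequency.
import pandas as pd
from scipy import stats

stim = pd.read_csv("data/stimuli/stim_detailed.csv")
print(stim.loc[stim["word"] == "moral", ["word", "group", "freq_zipf", "length"]])

cognates = stim.loc[stim["group"] == "cognate", "freq_zipf"].dropna()
others = stim.loc[stim["group"] != "cognate", "freq_zipf"].dropna()
print(stats.ttest_ind(cognates, others))
```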

Pseudowords

I used a newer tool called UniPseudo (New et al., 2024), a freely available online pseudoword generator (http://www.lexique.org/shiny/unipseudo/). It uses either a bigram or trigram algorithm to generate pseudowords of a particular length given a word list and a language.
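To give a sense of how the bigram approach works, here is a toy illustration (not UniPseudo's actual implementation): tally which letter follows which in the input word list, then sample new strings of the requested length and discard anything that is already in the list.

```python
# Toy bigram pseudoword generator: learn letter-to-letter transitions from a word
# list and sample new strings of a fixed length that are not in the input lexicon.
import random
from collections import defaultdict

def build_bigrams(words):
    transitions = defaultdict(list)
    for w in words:
        w = f"^{w}$"                      # ^ and $ mark word boundaries
        for a, b in zip(w, w[1:]):
            transitions[a].append(b)
    return transitions

def generate(transitions, length, lexicon, max_tries=1000):
    for _ in range(max_tries):
        out, ch = [], "^"
        while len(out) < length:
            ch = random.choice(transitions[ch])
            if ch == "$":                 # hit a word boundary too early
                break
            out.append(ch)
        candidate = "".join(out)
        if len(candidate) == length and candidate not in lexicon:
            return candidate
    return None

seed_words = ["moral", "hotel", "golf", "test", "chat", "animal", "hospital"]
print(generate(build_bigrams(seed_words), length=5, lexicon=set(seed_words)))
```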

I agree it's a good idea to make the word-to-nonword ratio one-to-one, so I've done so! The same links above now point to the updated lists. The new counts are:

96 real words and 96 pseudowords (192 items total)

If you think this is good to go now, we should be ready to start data collection for the experimental task. We also need to decide whether to give participants the LexTALE and a background questionnaire. I think the LexTALE could be skipped, and we could treat the number of correct answers as a proxy for proficiency (which I did with the pilot data, and there is an effect). A background questionnaire would be needed, though, in my opinion. I've used the Bilingual Language Profile before: https://sites.la.utexas.edu/bilingual/files/2012/01/BLP-ENGLISH-SPANISH.pdf. How do you feel about this? The goal from the L3 end is to determine participants' age of onset, order of acquisition, and whether they speak additional languages. We could alternatively create our own questionnaire.
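As a rough sketch of what the proficiency proxy would amount to (the file and column names are placeholders, not the actual pilot data layout):

```python
# Per-participant proportion of correct responses, to be used as a proficiency covariate.
import pandas as pd

trials = pd.read_csv("pilot_data.csv")  # placeholder trial-level file

accuracy_proxy = (
    trials.groupby("participant")["correct"]
          .mean()
          .rename("prop_correct")
)
print(accuracy_proxy.describe())
```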

Thanks again! I am also really excited for this project.

juanjgarridop commented 1 month ago

Thanks for fixing the taboo word!

Thanks for increasing the number of pseudowords too. I did not know about that tool. It is great! I will use it from now on.

Regarding proficiency, the number of correct answers can be a good proxy, since this is a lexical decision task using words with different levels of frequency. Do you know of any studies that have used correct answers as a proxy for proficiency? I have not seen any before, but if you know of some we can cite, that would be great. The LexTALE has always worked well for me, and it is quick, so when possible I like to include it. If there are previous studies that measured proficiency through correct answers, we can skip it; if not, we might want to include it just to have a reliable measure of proficiency. And if we do include it, would we administer only the LexTALE in Spanish, or versions in both non-native languages?
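For what it's worth, if we do include it, scoring is quick. The standard English LexTALE score (Lemhöfer & Broersma, 2012) is the percentage correct averaged over the 40 word trials and the 20 nonword trials; the Spanish version is scored differently, so treat this only as a sketch for the English test:

```python
# Averaged %-correct scoring of the English LexTALE (40 words, 20 nonwords).
def lextale_score(words_correct: int, nonwords_correct: int) -> float:
    return (words_correct / 40 * 100 + nonwords_correct / 20 * 100) / 2

print(lextale_score(34, 16))  # 85% on words, 80% on nonwords -> 82.5
```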

Regarding the background questionnaire, I am OK with using the BLP. I have used it before.

Thanks for moving this forward. I have been disconnected from work over the past few weeks. Trying to get back to everything now.