lookit / lookit-api

Codebase for Lookit v2 and Experimenter v2. Includes an API. Docs: http://lookit.readthedocs.io/
https://lookit.mit.edu/
MIT License

Revise language options for child registration #491

Open kimberscott opened 4 years ago

kimberscott commented 4 years ago

TL;DR: Solidify a list of language options on child registration to reflect languages participants generally use.

This issue covers just coming up with the list, NOT implementing the changes in the Child model & eligibility determination to accommodate it.

Narrative

As a participant, I would like to be able to report the languages my child is learning to speak, but they are not necessarily listed clearly on the sign-up form.

As a researcher, I would like to be able to set inclusion criteria based on the number of languages a child speaks, even if one or more of the languages is relatively rare.

Acceptance Criteria

kimberscott commented 4 years ago

Transferring this task over from a Slack conversation to make it a bit easier to track. Partial conversation history:

Kiley: Hi everyone, we noticed that Cantonese is not on the list of languages in the demographic form. A large percentage of the West Coast East Asian population speaks Cantonese. Is it possible to add this language? We also noticed a lack of Indian dialects that our parents regularly report speaking - I can send our language form if you guys are up for adding more?

@kimberscott : That sounds like a great idea to clarify the language form. When we set it up I think @rico used the first 2^N most commonly spoken languages (either as L1 or total) but the names don’t necessarily map onto common usage - e.g. Cantonese is apparently a variety of Yue, but I’m not sure whether American Cantonese speakers, for instance, generally know that.

@kimberscott: Any chance you’d be up for comparing the language lists and recommending (a) any changes to the labels - e.g., “Yue -> Cantonese (or other Yue)” or just “Yue -> Cantonese” (b) any languages that are genuinely not on there even at the wrong “level” ? Otherwise if you can make an issue on GitHub about this that’d be great - this is definitely worth clarifying but (correct me if I’m wrong @rico) not entirely trivial.

Francis: Hello! Just a quick update to this thread: this turns out to be a huge task so it's taking longer than I anticipated because I have to cross check many languages!

It occurred to me that a more elegant solution to this problem might be to add an 'other' option for language exposure so that parents can enter a language manually. Using the example of parents not knowing Cantonese is a variety of Yue, it would make more sense to give parents the option to manually enter Cantonese and then have a researcher categorize it at the back end, rather than have duplicate languages. Also, having an exhaustive list of languages will make the page incredibly messy. What do you think @Kim? Is this a viable feature request?

@kimberscott : It’s actually very much worth the initial investment of time to have checkboxes that reflect the expected set of responses, rather than relying on ongoing manual re-categorization. (To get into the weeds a bit - if we add “other” with the intention of that covering common languages like Cantonese, in addition to the additional field which won’t automatically work with eligibility criteria and eventually translations, we’d also need to set up an interface for researchers (which?) to edit child data and an arrangement with someone to keep on top of that.) I think an “other” option might make sense in addition, but not in place of improving the options.

This perspective brought to you in part by the free-response ‘race’ field on the Lookit prototype, which allowed me the fun of hand-categorizing free responses :) (We got a lot of responses like “American” and “Muslim” that were impossible to categorize, plus the expected 20 ways to spell caucasian.)

I do think we want to avoid having duplicate languages, though, so rather than adding Cantonese (for example) I’d suggest replacing the current Yue option (if you think that’s appropriate) with something clearer.
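To make the "relabel rather than duplicate" idea concrete, here is a minimal Python sketch. It assumes each option is stored as a short code with a separate display label, so that changing "Yue" to "Cantonese (or other Yue)" only touches the label. The codes, labels, and function names below are hypothetical illustrations, not the actual Lookit list or Child model. It also shows how a checkbox-based list could feed a simple inclusion criterion on the number of languages, with "Other" counted as one language.

```python
# Hypothetical (code, label) pairs; the real Lookit list and codes may differ.
LANGUAGE_CHOICES = [
    ("eng", "English"),
    ("yue", "Cantonese (or other Yue)"),  # relabeled; the stored code stays "yue"
    ("cmn", "Mandarin"),
    ("other", "Other"),
]

VALID_CODES = {code for code, _label in LANGUAGE_CHOICES}


def find_duplicate_codes(choices):
    """Return the set of codes that appear more than once in the list."""
    seen, duplicates = set(), set()
    for code, _label in choices:
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return duplicates


def meets_language_count(child_codes, min_languages=1, max_languages=None):
    """Check an inclusion criterion on the number of distinct known languages."""
    count = len(set(child_codes) & VALID_CODES)  # ignore unrecognized codes
    if count < min_languages:
        return False
    return max_languages is None or count <= max_languages
```

For example, `meets_language_count(["eng", "yue"], min_languages=2)` would accept a child reported as learning both English and Cantonese, without anyone having to hand-categorize a free-text response.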

Francis: Thank you for the feedback! Right, I hadn't thought about the automatic categorization and eligibility part. I think one thing I should check with you (and other developers of Lookit!) is whether the intent of the child profile was to include languages that are less common, or just the most common languages for eligibility purposes. I am hesitant to propose any additions to the language list based on our Canadian census (seems slightly Canada-centric to do so, haha!) since we may be introducing many languages that are spoken by a very small population (e.g. Indigenous languages), thus expanding the list to an overwhelming length. On the other hand, I also notice that on the existing list there are quite a few uncommon languages already, so maybe it would be worthwhile to propose additions of other languages? I don't want to step on anyone's toes here so please let me know what you and the Lookit team would prefer!

@kimberscott: That’s a great question about the purpose of the child profile language question. It’s essentially data storage we added ahead of the actual use (e.g. we don’t actually have anyone filtering on kids speaking particular language combinations yet) and so the goals aren’t as well-defined as they might ideally be. There are several potential purposes:

Given that Lookit studies are currently in English only, I’m on board with adding more explicit options for languages that are commonly spoken in conjunction with English - e.g., in Canada. Upon writing those out, it does seem it’d probably be worth also having an “other” option even if we don’t do anything with it right away.

We should also add the first N most used sign languages.

But I would like to do it in a way that doesn’t make the language section too overwhelming. Options include eliminating some of the least commonly spoken languages on that list and focusing more on languages that we see more often, or doing some of this in the UI (e.g., show the most commonly spoken languages along with a “show more” button). We got the initial list from here (and used the first 64) if that’s useful for comparing proposed languages in terms of total number of speakers.

I think this is turning into several distinct pieces - (a) clarifying languages that are present but listed under maybe-less-used names, (b) reviewing the list and adding options that are common enough in your participant population to include - including sign languages, (c) adding an “other” option and making sure the UI doesn’t become overwhelming.