Open tatianamac opened 4 years ago
I think I can help a little with this, from my experiences facilitating the translations for React and Gatsby :)
First off—how you want to support different languages depends on how you see the scope of selfdefined. Is it meant to primarily focus on definitions of English words, other languages being pure translations of the English articles? Or would it work more like Wikipedia (except y'know, less run by white cis men) where the content in each language is mostly independent but they can cross-reference each other?
Given selfdefined's goal "to reflect the diverse perspectives of the modern world", I feel like the latter approach is better. There's no such thing as a neutral translation, and even words with near identical definitions can have different connotations of ableism based on etymology or usage. I think it's better to trust native and fluent speakers to understand the contexts of their own languages.
But this is also more work: you mentioned reviewers for languages you can't speak. How do you choose these reviewers? Does each language need a dedicated reviewer? Given the domain of selfdefined, presumably one would hope that reviewers have an understanding of sexism, ableism, white supremacy, and other forms of the kyriarchy. But at the same time, we as English speakers need to watch out for our own biases in the way that we structure these terms and be open to them being structured/defined another way by other languages and cultures.
One thing to consider is to keep this instance in English but make the site a template that's easy to set up, so that speakers of other languages can set up their own instance and arrange the site as they see fit. That way, each language can be owned and maintained by people from that language's community based on interest, without the overhead of having to manage everything.
There are a lot of different ways that you can do this, based on the number of (awesome) directions that this project can go. Everything has its tradeoffs, and I hope I was able to elaborate on some of them and none of this was too obvious!
As for the technical considerations: again, a lot of it depends on what structure you're going to end up using (everything translated from English or each language able to have individual entries). I peeked at the 11ty docs and it doesn't seem to have anything specific to localization just yet, but I can look a little more into it :)
@tesseralis Thanks for this input. Lots to think about in there.
Personally (also speaking as a non-native English speaker) I think it makes sense to allow words that are not defined in English yet. As Nat pointed out, definitions for the same word might vary, and there will be words that only make sense in one language but not the other.
Re technical considerations, a rough idea jumping two steps ahead: I maintain a multilanguage 11ty site (code is unfortunately not open source yet, but might be soon), in which I use folders named after the locales (e.g. fr, nl, en). I use a directory data file in these folders to set a locale, e.g. en.json looks like this:
{
"locale": "en"
}
We could use this approach to access specific collections, I guess. Setting the locale this way also makes it possible to add the lang attribute to the HTML element (among other things).
<html lang="{{ locale }}">
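To make the "access specific collections" idea concrete, here is a minimal sketch of how per-locale collections could be registered in an 11ty config, assuming every page inherits a `locale` value from its folder's directory data file as described above. The `LOCALES` list, the `definitions_<locale>` collection names, and the `configureLocales` function are hypothetical names chosen for illustration; `addCollection` and `collectionApi.getAll()` are the Eleventy configuration calls this relies on.

```javascript
// Hypothetical list of locale folders; adjust to the folders that actually exist.
const LOCALES = ["en", "fr", "nl"];

// Keep only the items whose data cascade sets the given locale
// (for example via en/en.json containing { "locale": "en" }).
function itemsForLocale(items, locale) {
  return items.filter((item) => item.data && item.data.locale === locale);
}

// In .eleventy.js this function would be exported:
//   module.exports = configureLocales;
function configureLocales(eleventyConfig) {
  for (const locale of LOCALES) {
    // Registers one collection per locale, e.g. collections.definitions_fr
    eleventyConfig.addCollection(`definitions_${locale}`, (collectionApi) =>
      itemsForLocale(collectionApi.getAll(), locale)
    );
  }
}
```

A template in the fr folder could then loop over `collections.definitions_fr` without seeing entries from other languages.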
Two steps back: Regarding the folder structure two solutions come to mind:
en.md, nl.md, and so forth in them.
Either way has its merits, either way seems solvable with 11ty.
@tesseralis Thank you for this thoughtful answer! I opened a separate "technical only" issue (#164) so that someone can start the work. I did my best there to address the points you brought up. I'd welcome any additional thoughts you have there on how to structure this.
As for all the notions, your assumptions were correct that I want to:
@ovlb I agree with the notion that there are folders for each language, but not each word. As each word isn't 1:1, I'd prefer to find another way to interlink words, as I think that the word slugs should be in their language (for example, ableism in French is capacitisme). If we tried to maintain one overarching word folder, it would undoubtedly inherit the English bias there. This way also allows for each word to be independent of English.
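One possible way to interlink words without a shared English-keyed folder is to let each entry point at related entries in other languages, then make those links symmetric at build time. The sketch below is only an illustration: the `related` field, the `entryKey` format, and the sample entries are assumptions, not an agreed schema.

```javascript
// Hypothetical entries: each one lives under its own language and slug,
// and may list related entries in other languages.
const entries = [
  { locale: "en", slug: "ableism", related: [{ locale: "fr", slug: "capacitisme" }] },
  { locale: "fr", slug: "capacitisme", related: [] },
  { locale: "fr", slug: "laïcité", related: [] }, // no English counterpart needed
];

// Build a stable key such as "fr/capacitisme".
function entryKey(locale, slug) {
  return `${locale}/${slug}`;
}

// Return a Map from entry key to the Set of keys it is linked with.
// A link declared on one side becomes visible on both sides.
function buildCrossLinks(allEntries) {
  const links = new Map(
    allEntries.map((entry) => [entryKey(entry.locale, entry.slug), new Set()])
  );
  for (const entry of allEntries) {
    const from = entryKey(entry.locale, entry.slug);
    for (const ref of entry.related || []) {
      const to = entryKey(ref.locale, ref.slug);
      if (!links.has(to)) continue; // skip references to entries that do not exist yet
      links.get(from).add(to);
      links.get(to).add(from);
    }
  }
  return links;
}
```

With this shape no language acts as the hub: an entry with an empty `related` list stands on its own, and a French entry can link to a Dutch one without passing through English.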
I like and support all the other locale ideas, in particular for the html encoding.
Once you both provide (if you would like) feedback to the other task (#164), and we are able to create a language instance template, I think we can consider this task closed? Why did I open such a vague issue? 😂😭
@tatianamac I have a lot of background in linguistics, information organization and retrieval, and machine translation (and the bias that comes along with all those things) and would love to help out on this and related issues.
@lorarjohns Would love your help! You're welcome to our Slack community if you'd like! Otherwise feel free to add any thoughts/insights here for this task!
Hi, I came here looking to help with bots, but it makes sense to clarify questions like this — plus get an API spec'd and underway — before putting real focus into bots.
Some relevant background: my partner is a fiction author & activist who grew up in Malaysia, lived in the US for > a decade, and we've lived in central France since 2006. I grew up in NY; I've worked for a UK company for the past decade; now I'm in a Dutch one (both English-speaking, but not really the same English).
We talked through some of the possibilities. IANAL(inguist), and these are primarily our personal observations, not based on any research or stats.
A few possible flows & contexts
Action: Post a link to an entry for a Malay-language word, explaining its baggage & problems, written in English (...en_MY? or default to as-generic-as-possible "en"?). Alt: Link to an entry for the same Malay-language word, explained in Malay (or other language), for people in the discussion whose English isn't as fluent.
Action: Link to an entry for "laïcité" (fr_FR), explained in en_US.
Action: Provide a link explaining to someone who speaks en_(!UK) how "pants" is used in the UK, and potential problems when using it. Alt: entries for "en" could include differences between locales in the discussion by default, to help people who are interacting in English across locales.
If you're in Memphis, you won't want to see e.g. a Tamil word that's used in en_MY. Words like "indigenous" or "native" have different histories and baggage in the US, Malaysia, Singapore, South Africa, UK, etc. If you're in Sri Lanka, you will probably not need a long explanation of very US-specific baggage. A word like "Tamil" has different history/context in India, Sri Lanka, Malaysia, or the US, for sure.
Alt: people who are working remotely for UK companies, or who immigrated to the UK, browsing words that can be problematic specifically in UK English. (Are there phrases that my English colleagues are using in private chat that I should avoid? ...possibly tell them not to use with me? Probably)
Though: there could be a case for linking word definitions that are "en" but in "en_UK" vs. "en_SG" etc. (if they're separate entries) — e.g. I'm going to be speaking English to a large, diverse group; I want to know if the major terms I'm using may be misinterpreted in dangerous ways.
(This one still feels blurry; it'd be useful to find examples of English words that have many meanings/contexts around the world, where it might be a problem to have one entry that explains them all.)
If this is useful, let me know! I'm happy to write more about how these cases might guide data and/or API design, but I'm not sure where that planning is now.
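As a rough illustration of how the flows above might shape the data, the sketch below separates the locale a word belongs to from the locale its explanation is written in, and falls back from a regional locale (en_MY) to the generic language (en), then to any other region of that language. All field names (`termLocale`, `explainedIn`), the sample records, and the fallback order are assumptions for discussion, not a proposed spec.

```javascript
// Hypothetical records: the word's own locale is kept apart from the
// locale of the explanation text.
const entries = [
  { term: "laïcité", termLocale: "fr_FR", explainedIn: "en_US", body: "(explanation in US English)" },
  { term: "laïcité", termLocale: "fr_FR", explainedIn: "fr", body: "(explanation in French)" },
  { term: "pants", termLocale: "en_UK", explainedIn: "en", body: "(explanation in generic English)" },
];

// "en_MY" -> ["en_MY", "en"]; "en" -> ["en"]
function fallbackChain(locale) {
  const language = locale.split("_")[0];
  return language === locale ? [locale] : [locale, language];
}

// Find the best explanation of `term` for a reader in `readerLocale`:
// 1. exact regional match, 2. generic language, 3. any region of that language.
function findExplanation(allEntries, term, readerLocale) {
  const candidates = allEntries.filter((entry) => entry.term === term);
  for (const wanted of fallbackChain(readerLocale)) {
    const match = candidates.find((entry) => entry.explainedIn === wanted);
    if (match) return match;
  }
  const language = readerLocale.split("_")[0];
  const sameLanguage = candidates.find(
    (entry) => entry.explainedIn.split("_")[0] === language
  );
  return sameLanguage || null;
}
```

For example, `findExplanation(entries, "laïcité", "en_MY")` would return the en_US explanation because no en_MY or generic en version exists, while a reader with an ms_MY locale would get `null` and could be shown a "not yet translated" notice instead.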
Challenge
As we've started to translate definitions into languages other than English, I'd like to make sure we're considering how to show them, both from a data management and a design perspective.
Technical Considerations
Human Considerations
Next Steps