At our 2018-02-19 project meeting, we noted that the project was not transliterating Russian into Latin letters (sometimes called romanization) consistently, or according to any professional standard. I suggested that we use the Library of Congress (LOC) system without diacritics and promised to find a chart that we could use. That chart is located at https://www.loc.gov/catdir/cpso/romanization/russian.pdf. A few notes:
LOC romanization includes diacritic marks, but we don’t want diacritics in filenames. LOC romanization without diacritics simply leaves them out; everything else stays the same. Omitting the diacritics is legitimate LOC practice, that is, LOC without diacritics is a Real Thing.
The soft sign is romanized as a prime (not an apostrophe) and the hard sign as a double prime (not a quotation mark). We can use the prime (′) and double prime (″) when we’re writing prose, but they should not be used in filenames, and neither should the apostrophe or the quotation mark. I propose that we simply ignore soft and hard signs when creating filenames.
This means, for example, that the filename interv'yu-amerikanskomu-telekanalu-nbc.txt in the raw/putin directory should instead be called interviu-amerikanskomu-telekanalu-nbc.txt. Note that the apostrophe (representing a soft sign) has been dropped, and Russian ю has been romanized as iu, rather than yu.
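To make the filename rule concrete, here is a minimal Python sketch. It assumes the LOC table values for Russian with diacritics and tie bars simply removed (so ё and э become e, й becomes i, ц becomes ts, ю and я become iu and ia) and drops ъ and ь entirely. The function name romanize_for_filename and the convention of lowercasing and joining words with hyphens are hypothetical, inferred from the example filename above; check the mapping against the LOC chart before relying on it.

```python
# Hypothetical helper: LOC romanization of Russian with diacritics omitted,
# and with the hard sign (ъ) and soft sign (ь) dropped, for use in filenames.
LOC_NO_DIACRITICS = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d",
    "е": "e", "ё": "e", "ж": "zh", "з": "z", "и": "i",
    "й": "i", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ъ": "", "ы": "y", "ь": "",
    "э": "e", "ю": "iu", "я": "ia",
}


def romanize_for_filename(text: str) -> str:
    """Lowercase the text, romanize Cyrillic letters, and join words with hyphens.

    Characters not in the table (Latin letters, digits, punctuation) pass
    through unchanged; this sketch does not strip other punctuation.
    """
    words = text.lower().split()
    romanized = ("".join(LOC_NO_DIACRITICS.get(ch, ch) for ch in word) for word in words)
    return "-".join(romanized)


print(romanize_for_filename("Интервью американскому телеканалу NBC") + ".txt")
```

Run on the title behind the example above, this yields interviu-amerikanskomu-telekanalu-nbc.txt: the soft sign disappears and ю becomes iu.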
LOC romanization is used mostly by professional Slavists in the social sciences, while linguists prefer the so-called scholarly romanization, which, among other things, uses diacritics. Knowing how LOC romanization works is a useful skill for anyone who may use Russian professionally, so the experience will transfer beyond this project.