sigmorphon / 2020

SIGMORPHON 2020 Shared Task: Grapheme-to-Phoneme, Unsupervised Induction of Morphology, and Typologically Diverse Morphological Inflection

Russian Bible #1

Open Nofenigma opened 4 years ago

Nofenigma commented 4 years ago

Sorry for the very late comment, but still: it should be noted that the language of the Russian Bible is quite specific, as it is a rather old translation: https://en.wikipedia.org/wiki/Russian_Synodal_Bible

Yes, it is still valid and quite understandable, but I am quite sure that its language is not specific only because of its genre. It also deviates considerably from the modern language. It is therefore not very accurate to leave this out of the data description, as the results will probably be biased towards a more archaic form of the language.

Of course, this is not a problem for the task per se, but it might be an obstacle, given the dev and test data you have.

A short example of how the language of the translation you chose deviates from modern Russian (Exodus 1:21): И так как повивальные бабки боялись Бога , то Он устроял домы их . ("And because the midwives feared God, he gave them families of their own.")

Here, the verb устроял is obsolete: modern Russian uses not устроять but устраивать (устраивал for the same form), which follows a very productive paradigm type. In addition, the form домы is unusual: the standard plural is дома, and домы will not even be found in the paradigm for дом (house).
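The point about домы can be made concrete as a lookup: if each modern lemma is mapped to the set of forms in its paradigm, archaic forms fall outside that set. The sketch below is a minimal illustration, assuming UniMorph-style (lemma, form, features) triples. The paradigm excerpt and the helper names `build_paradigms` and `forms_outside_paradigm` are hypothetical; they are not taken from the shared-task data or tooling.

```python
from collections import defaultdict

# Hypothetical, hand-written excerpt of the modern paradigm of дом "house",
# in UniMorph-style (lemma, form, features) triples; not copied from any dataset.
MODERN_TRIPLES = [
    ("дом", "дом", "N;NOM;SG"),
    ("дом", "дома", "N;GEN;SG"),
    ("дом", "дому", "N;DAT;SG"),
    ("дом", "дом", "N;ACC;SG"),
    ("дом", "домом", "N;INS;SG"),
    ("дом", "дома", "N;NOM;PL"),
    ("дом", "домов", "N;GEN;PL"),
    ("дом", "домам", "N;DAT;PL"),
    ("дом", "дома", "N;ACC;PL"),
    ("дом", "домами", "N;INS;PL"),
]


def build_paradigms(triples):
    """Map each lemma to the set of surface forms its paradigm contains."""
    paradigms = defaultdict(set)
    for lemma, form, _features in triples:
        paradigms[lemma].add(form)
    return paradigms


def forms_outside_paradigm(lemma, observed_forms, paradigms):
    """Return the observed forms of `lemma` that the modern paradigm lacks.

    An unknown lemma has an empty paradigm, so all of its forms are returned.
    """
    known = paradigms.get(lemma, set())
    return sorted(f for f in observed_forms if f not in known)


paradigms = build_paradigms(MODERN_TRIPLES)
# Forms attributed to дом in a hypothetical corpus; домы is the Synodal plural.
print(forms_outside_paradigm("дом", ["домы", "дома", "дом"], paradigms))
# prints ['домы']
```

A check of this kind only flags forms missing from whatever reference paradigm is supplied, so its usefulness depends entirely on how complete that modern lexicon is.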

Cf. the so-called New Russian Translation by IBS (https://bible.by/nrt/): За то, что повитухи боялись Бога, Он даровал им в награду семьи и детей. ("Because the midwives feared God, He gave them families and children as a reward.")

Or another recent, good-quality translation (though by representatives of the Seventh-day Adventist Church, see ): и так как повитухи благоговели перед Богом, Он и им даровал счастье в их собственных детях. ("And since the midwives revered God, He granted them, too, happiness in their own children.")

In both cases we find даровал (from даровать), an entirely ordinary verb.

Newer translations have their own problems of translation or interpretation, but as far as linguistic form is concerned, they seem to be a better source of modern Russian in any case.