plk / biblatex

biblatex is a sophisticated bibliography system for LaTeX users. It has considerably more features than traditional bibtex and supports UTF-8

Serbian localization: how to handle variants and settings? #937

Open: randrej opened this issue 4 years ago

randrej commented 4 years ago

I want to contribute a localization for the Serbian language, but first I need to ask about some things.

The Serbian language uses both the Latin and the Cyrillic script on an equal footing. Babel handles this as two different 'languages': serbian is the Latin variant and serbianc is the Cyrillic variant. If I provide two .lbx files locally, serbian.lbx and serbianc.lbx, this works just fine. Is this the proper way to do it? The Cyrillic variant differs from the Latin one only in its localization strings; should serbianc.lbx inherit from serbian.lbx and just change the strings?

There is another variable in the Serbian language: the dialect, or "pronunciation". The two most widely used dialects are Ekavian (used in most of Serbia) and Ijekavian (used in Montenegro, Bosnia and Herzegovina, and parts of Serbia). They differ in the spelling (and pronunciation) of some words containing "e/je/ije", for example:

English: visited, Germany, report
Serbian Ekavian: posećen, Nemačka, izveštaj
Serbian Ijekavian: posjećen, Njemačka, izvještaj

In babel-serbian this is handled by the ijekav language attribute, so babel can be loaded like this:

\usepackage[english, serbian.ijekav]{babel}
% or
\usepackage[english, serbian]{babel}
\languageattribute{serbian}{ijekav}

How can I provide such a setting for biblatex? Is there a default way to provide such settings?

There are also a couple of variations of the date format I'd like to provide, since they are used interchangeably and both may appear in bibliographies.

I wrote the datetime2-serbian package with the help of Python, Jinja2 and my Serbian Cyrillic ↔ Latin transliteration tool (srtools), and I'll do the same for biblatex: write the Latin localization strings (in a YAML file) and then generate the Cyrillic strings from those. For datetime2-serbian I also generated the LICR/ASCII files from the UTF-8 encoded files using another utility I wrote (utf8_to_licr). Is there a way to provide both UTF-8 and ASCII strings in biblatex?

moewew commented 4 years ago

biblatex currently simply uses babel identifiers. So if Latin-script Serbian is serbian and Cyrillic is serbianc in babel-speak, then serbian.lbx and serbianc.lbx are the way to go.
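For illustration, a minimal preamble sketch, assuming serbian.lbx and serbianc.lbx are on TeX's search path (e.g. in the document directory) and the document is compiled with pdfLaTeX. biblatex loads the .lbx file named after the active babel language, and with the autolang=other option it also switches to the language given in an entry's langid field. The file name references.bib is hypothetical.

```latex
% Preamble sketch: biblatex finds serbian.lbx / serbianc.lbx purely
% because babel names the languages serbian and serbianc.
\documentclass{article}
\usepackage[T2A,T1]{fontenc}          % T2A provides Cyrillic glyphs under pdfLaTeX
\usepackage[serbianc,serbian]{babel}  % last option = main language (Latin script)
\usepackage{csquotes}
\usepackage[autolang=other]{biblatex} % per-entry switching via the langid field
\addbibresource{references.bib}       % hypothetical bibliography file

\begin{document}
\nocite{*}
% Entries with langid = {serbianc} are typeset with the strings from
% serbianc.lbx; all other entries use serbian.lbx (the main language).
\printbibliography
\end{document}
```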

If the language extras between the two are the same, you might let serbianc inherit the extras from serbian if you like, but it would also be possible to simply copy and paste the necessary code.
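A sketch of what the Cyrillic file could look like under that approach: it inherits the extras from serbian.lbx with \InheritBibliographyExtras but declares its own strings, since inheriting the strings too would fall back to Latin-script text for any key not overridden. The two strings shown are illustrative placeholders, not a vetted translation, and a real file has to cover every string key.

```latex
% serbianc.lbx: sketch of the Cyrillic variant.
% Extras (date formats, punctuation tweaks, ...) are shared with the
% Latin-script file; only the strings are declared here.
\ProvidesFile{serbianc.lbx}

\InheritBibliographyExtras{serbian}

\DeclareBibliographyStrings{%
  % Each value is {long form}{short form}, in UTF-8 Cyrillic.
  % Placeholder strings only; the full set of keys is required.
  urlseen   = {{посећено}{посећено}},
  countryde = {{Немачка}{DE}},
}

\endinput
```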

Currently, there is no system to query things like additional language variant options such as ijekav. If babel exposes the settings we could try and test for it, but we would also have to do something similar for polyglossia. So things could get messy. Not sure what the best course of action here would be.

Is there a way to provide both UTF-8 and ASCII strings in biblatex?

For languages with only the occasional non-ASCII chars our current modus operandi is to ask for .lbx files to use ASCII only and encode non-ASCII chars as macros (e.g. https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/lbx/german.lbx) for maximum interoperability. But this is not viable for languages that use non-ASCII-based scripts, so there we ask for UTF-8 (https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/lbx/greek.lbx).

So I guess for serbian.lbx we'd use ASCII and for serbianc.lbx UTF-8. If you feel strongly about that or if there are technical issues we can reconsider.
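To make the convention concrete, here is how the same two (placeholder) strings could be written in each file: LICR macros in serbian.lbx, so the file itself is pure ASCII, and literal UTF-8 in serbianc.lbx. The LICR forms are standard LaTeX accent commands: \'c for ć, \v{c} for č, \v{s} for š and \v{z} for ž; đ is usually written \dj, which is only available in font encodings such as T1.

```latex
% serbian.lbx (excerpt): ASCII only, non-ASCII letters as LICR macros
\DeclareBibliographyStrings{%
  urlseen   = {{pose\'ceno}{pose\'ceno}},  % posećeno
  countryde = {{Nema\v{c}ka}{DE}},          % Nemačka
}

% serbianc.lbx (excerpt): the same strings as literal UTF-8 Cyrillic
\DeclareBibliographyStrings{%
  urlseen   = {{посећено}{посећено}},
  countryde = {{Немачка}{DE}},
}
```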

moewew commented 4 years ago

Also: thank you for considering contributing!

A related discussion with links to a few resources and earlier language projects can be found in https://github.com/plk/biblatex/issues/867.

randrej commented 4 years ago

I'll provide serbian.lbx as ASCII/LICR and serbianc.lbx as UTF-8, and I will use Ekavian only.

If you add a way to detect settings like ijekav, I'd be happy to implement localization variants for them.

I just want to be clear: is there any precedent for adding custom settings to biblatex at all, even ones with no connection to babel? Something like the \DTMdefchoicekey and \DTMdefboolkey commands that datetime2 provides. Or may I just define some keys like so? I could then use \ifbool to differentiate between these variants.
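As an ad-hoc illustration of the \ifbool idea only (biblatex itself has no interface for such options): etoolbox, which biblatex loads, provides \providebool, \booltrue and \ifbool, so an .lbx file could branch on a boolean the user sets in the preamble. The boolean name serbianijekav is hypothetical, and the sketch assumes the .lbx file is read after the user's preamble switch, which should be verified before relying on it.

```latex
% In the document preamble (hypothetical user-facing switch):
%   \providebool{serbianijekav}
%   \booltrue{serbianijekav}

% In serbian.lbx: define the boolean if the user has not, so the
% default stays Ekavian, then declare the matching string set.
\providebool{serbianijekav}
\ifbool{serbianijekav}
  {\DeclareBibliographyStrings{%
     urlseen   = {{posje\'ceno}{posje\'ceno}},  % Ijekavian: posjećeno
     countryde = {{Njema\v{c}ka}{DE}},          % Njemačka
   }}
  {\DeclareBibliographyStrings{%
     urlseen   = {{pose\'ceno}{pose\'ceno}},    % Ekavian: posećeno
     countryde = {{Nema\v{c}ka}{DE}},           % Nemačka
   }}
```

Using \providebool in both places avoids an "already defined" error regardless of which is read first.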

moewew commented 4 years ago

Currently there is neither a system to add additional options to .lbx files nor is there a way we can detect additional babel and polyglossia options. There have been several occasions where a system to add additional options would have been useful (https://github.com/plk/biblatex/issues/899, https://github.com/plk/biblatex/issues/891, https://github.com/plk/biblatex/issues/555, https://github.com/plk/biblatex/pull/552), so we may have to think about it. Unfortunately, I have no idea what the interface would look like and how we would implement this. I guess this is something we should look at, but I can't promise that I'll be able to see to it before Christmas (and of course I don't know if I can come up with something useful at that point).

At the moment the standard way to set up language variations would be to provide an additional .lbx file (say serbian-ijekav.lbx). A user would then switch to that version with

\DeclareLanguageMapping{serbian}{serbian-ijekav}

But this system is only viable if there is a small number (ideally one) of options, otherwise the number of required files explodes.
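A sketch of such a variant file, assuming serbian.lbx exists: the Ijekavian file inherits the extras and the full string set from the Ekavian file and overrides only the words whose spelling differs (placeholder strings based on the examples at the top of the thread). The preamble part shows the \DeclareLanguageMapping switch described above.

```latex
% serbian-ijekav.lbx: Ijekavian variant layered on serbian.lbx
\ProvidesFile{serbian-ijekav.lbx}

\InheritBibliographyExtras{serbian}           % reuse date formats etc.

\DeclareBibliographyStrings{%
  inherit   = {serbian},                      % start from the Ekavian strings
  urlseen   = {{posje\'ceno}{posje\'ceno}},   % posjećeno (Ekavian: posećeno)
  countryde = {{Njema\v{c}ka}{DE}},           % Njemačka (Ekavian: Nemačka)
}

\endinput

% In the document preamble: babel still says "serbian",
% but biblatex reads serbian-ijekav.lbx instead of serbian.lbx.
%   \usepackage[serbian]{babel}
%   \usepackage{csquotes}
%   \usepackage{biblatex}
%   \DeclareLanguageMapping{serbian}{serbian-ijekav}
```

Each additional independent option would double the number of such files, which is the scaling problem mentioned above.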

The language option business is tracked at #938 and #939 so it doesn't get lost in the discussion here. (In my experience language implementation discussions can get quite long.)

randrej commented 4 years ago

I've sent a PR for the basic Serbian localization #940.