Use standardized language identifiers for lbx files

pauloney commented 10 years ago

Is there a "template" one can use to make the translations to be used in language.lbx? Or should that be done on top of one of the existing files?

I would like to create the files for Romanian, Vietnamese, Chinese and Japanese and I do have people in the office who are capable of making the translations and have experience with bibliographies, but NONE of them are programmers.

Also: Is there a guide on how to add support for a new language? Even though it is easy to understand what goes on inside \DeclareBibliographyStrings{ }, I would like to know when it is preferable to use tex-encoding as opposed to utf8, for example.

Other questions are:

1- Can one add support for a language that is not supported by Babel?

2- When does one use \adddot and when does one use \adddotspace ?

3- Why is country support (within language.lbx) limited to Germany, EU, US, France and GB ?

4- Are you using a framework to do this? In general it is easier to manage them in a single spreadsheet with the translations to each language in each column and a script that reads the column and writes the LBX files! The translators can then easily compare to "nearby" languages and easily make other translations.

Is work by others on this kind of issue welcomed ?

Thanks for the great package! Paulo Ney

pauloney commented 10 years ago

Philip, I have started on the strings now and made a release of a new set with a few small changes. The new file is at:

https://drive.google.com/file/d/0B3mOBzjP3W1naklKdU1wdzBYRVk/edit?usp=sharing

This one has a release note of:

Fixed all occurrences of \addot. Languages affected: Portuguese and Dutch.
Fixed abbreviation "et al." and "et seq." in Catalan for uniformity.
Fixed abbreviation "op. cit." and "loc. cit." in Danish for uniformity.
Re-parse some comments to the db. Languages affected: Catalan, Finnish and Swedish.

which is included in the zip.

Could you tell me what your preferred way to deal with it is? If you prefer that I make small updates at a time, I could hold off and go work on something else. If you do not mind a large update of several language files at once, I could continue to work while you check this initial set of files.

If you could provide me with a way for a lbx file to include another file, I could do this via a fork-pull-request which probably would be a lot easier for you to handle.

Paulo Ney

pauloney commented 10 years ago

@plk Can you point me to this test suite you mention two days ago?

There is a test suite in the git repository but it doesn't verify identical PDF output, just whether there were any errors. I'm not sure how many of the examples actually test language files though. I will probably find time to look next week.

pauloney commented 10 years ago

The more I dig my teeth into the string translation the more I get convinced that we need a test-sequence for the LBX files. One that will show the output to a translator, so he can see the results of his work.

Take this example - we somehow ended up with some one hundred :'s in the translation to Icelandic, which seems like a high number, especially for a language which is fairly close to English. I ran a few examples and either the : is suppressed by other punctuation or it shows up in rather odd places where it does not belong, as remarked to me by an Icelandic-speaking mathematician. It sure looks like an attempt to write a style inside the LBX file - which is not supposed to be the case.

The translator (Baldur Kristinsson) @baldur is around and can probably help us clear up some of the issues ...

Paulo Ney

baldur commented 10 years ago

@pauloney I think you got the wrong Baldur ;) it's a common name in Iceland. (Baldur Kristinsson) @bk looks more correct

pauloney commented 10 years ago

Sorry!

Paulo Ney

pauloney commented 10 years ago

I did an auto-completion on his name on GitHub and got the wrong one!

Paulo Ney

plk commented 10 years ago

If you check out the git repo, you could try installing the version you want to test into your normal TeX tree so that it's picked up during the tests and then run "build/build.sh test" in the main git directory. It is probably rather specific to my machine though. It's a bash script. I intend to add a set of reference PDFs to this and use a Perl module to do PDF compare on them for regression testing soon.

bk commented 10 years ago

You are most probably right about the punctuation issues. At the time, I was mostly focused on getting things working with an Icelandic variant of Chicago-style citations (using the biblatex-chicago supplementary package), and did not really test other citation styles. I'll try to take a look later this week.

pauloney commented 10 years ago

Thanks for checking into this. I have a feeling that if we remove all the : everything will continue to be all right - but no one better than you to check that and see if it works!

Paulo Ney

plk commented 10 years ago

I have now created a test suite which does exact PDF comparison. It's the "testfull.pl" perl script in the build directory. However, it relies on a reference set of PDFs (also in git now) which of course depend on the version of pdflatex you are using. I noticed that after upgrading pdftex, the tests fail because they really test for identical PDFs, so any change in spacing makes them fail. It also potentially fails on pdftex between OSes (Mac vs Windows etc) possibly due to font differences but this could also be due to pdftex version differences. So, basically, if you have a known good reference set of all the biblatex example files (which are in git but generated on my OSX 10.9 with TL 2013), it works. However, you could make some test files and use the same script, adapted, to test lbx output. You just need the Perl CAM::PDF module which does a nice job of comparing PDFs.

pauloney commented 10 years ago

This is indeed nice! I'll make good use of it.

Would you be able to suggest/create an example file that will test the lbx files ?

Paulo Ney

plk commented 10 years ago

Greetings. Sorry for being so quiet about this. I am just about to release 2.9. I'd like to get a sense of where this is now and what the next steps are as it looks like you've done some great stuff here. I had to write a mapping for biber of babel/polyglossia identifiers to standard locales to support locale-specific sorting properly and so this theme is quite current ...

pauloney commented 10 years ago

Hi Philip! It is hard to get a footing since the thread talks about a lot of stuff ... :)

This is what I think the next steps are:

1- We need to make sure the new set of LBX files (the ones I deposited on the Google Drive, above) generate exactly the same PDF files. Once that is confirmed we can replace the old LBX set (written by hand) with the new ones output from the database. The problem here is that I cleaned up a lot of stuff in the LBX files by hand, so it would be nice to be sure that nothing is broken in the new files, and no new bug has been introduced.

I can do the comparison myself, I just need to learn how to run the test-suite you built.

2- Then I want to extend your test-suite to something that will test every corner of an LBX file, something similar to the HTML-CSS Torture Test.

3- Then I want to make the release of a few new languages.

Let me know how you want to proceed.

PN

pauloney commented 10 years ago

Philip, it would help a lot if the LBX files of the new release were already named by "locale" instead of "language" ... as in:

american.lbx   --> en_US.lbx
australian.lbx --> en_AU.lbx
austrian.lbx   --> de_AT.lbx
brazilian.lbx  --> pt_BR.lbx
british.lbx    --> en_GB.lbx
canadian.lbx   --> en_CA.lbx

PN

pauloney commented 10 years ago

Philip, I am trying to run the examples and the test script that you wrote, but I am getting pdflatex errors when running the examples 03-, 21-, 22- and 92-.

I made an exact copy of the test files (pdf and all) and when I run "testfull.pl" I get:

paulo@acer:~/BiBLaTeX/biblatex-master/build$ ./testfull.pl 
Checking '01-introduction.pdf'
ok 1 - Page 1
ok 2 - Page 2
Checking '02-annotations.pdf'
Could not create CAM::PDF instance with :  at ./testfull.pl line 21.
# Tests were run but no plan was declared and done_testing() was not seen.
paulo@acer:~/BiBLaTeX/biblatex-master/build$

even though the directories are completely the same ...

PN

plk commented 10 years ago

Ok, firstly, let me think about naming .lbx files by locale. The test stuff requires that you first run the build/build.sh test command to create all PDFs from the current branch. This tests to make sure you can generate every test PDF without error. Then testfull.pl tests to see if the generated PDFs differ from the reference set. The problem is that they sometimes differ only by insignificant whitespace - I usually have to check any variations with a graphical PDF comparison tool. It's a bit of a pain.

pauloney commented 10 years ago

Philip, first let me make my strong arguments for the change:

1- We are not dealing with "languages" per se, but really with "locales", and in programming, especially with difficult stuff like this, it is extremely important that the representation of the object be very close to the real thing. Unix and localization experts have done this for years, and what they have developed for locales is very solid and robust programming that we can draw on.

2- It would make it much easier for Biblatex to talk/interact and cross-pollinate with other projects, for example CSL:

https://github.com/citation-style-language/locales

3- It would set (and introduce) the standards to be followed by other TeX projects, and put behind us this mess of files named:

portuges.lbx
portuguese.lbx
brazil.lbx
brazilian.lbx

some of them written completely wrong (portuges.lbx) because of impositions of the DOS file system!

The correct way to do this would be to have a file for the language:

en.lbx

and then one for each locale that ensues:

en_US   en_AU   en_GB  en_CA   en_NZ

This would allow us to study the differences from one locale to the next, and the similarities, to be able to make better translations.

PN

plk commented 10 years ago

I completely agree with you. Can you suggest a mapping from the current .lbx names to proper locale names? I will then look at implementing a mapping within biblatex so that we can then look at integrating your .lbx generation system.

pauloney commented 10 years ago

Philip,

The idea is that we will have 2 types of files in the directory, some labelled after the language and some after the locales, so in the case of Portuguese that would be 3 files:

pt.lang
pt-PT.lbx
pt-BR.lbx

Most of the text would go into the "pt.lang" file and then the minor local differences of Portugal and Brazil would go into the pt-PT and pt-BR files. The "lang" files would be large and the "lbx" files small in this naming scheme. You can probably name the language files LBX too ... What does LBX stand for, anyway? Latex Biblatex eXchange?

The scheme for the "locale" file to load the "language" file could continue to use the inheritance system currently used in the lbx files:

 \InheritBibliographyExtras{portuguese}
 \InheritBibliographyStrings{portuguese}

but it would be much nicer if it were done internally, so that every "xx-XX.lbx" file would first load the corresponding "xx.lang" file.

In this scheme, if you re-name the files in the following way, everything is supposed to work as before:

english.lbx     -->   en.lang
UKenglish.lbx   -->   en-UK.lbx
USenglish.lbx   -->   en-US.lbx
american.lbx    -->   en-US.lbx
australian.lbx  -->   en-AU.lbx
british.lbx     -->   en-GB.lbx
canadian.lbx    -->   en-CA.lbx
newzealand.lbx  -->   en-NZ.lbx
german.lbx      -->   de.lang
austrian.lbx    -->   deprecate this file
naustrian.lbx   -->   de-AT.lbx
ngerman.lbx     -->   de-DE.lbx
brazil.lbx      -->   pt-BR.lbx
portuges.lbx    -->   pt-PT.lbx
brazilian.lbx   -->   *** deprecate these two files and I'll send a new one
portuguese.lbx  -->   *** named "pt.lang" that will be a merge of both.
norwegian.lbx   -->   no.lang
norsk.lbx       -->   no-NO.lbx

These are all really language files:

catalan.lbx     -->   ca.lang
croatian.lbx    -->   hr.lang
czech.lbx       -->   cs.lang
danish.lbx      -->   da.lang
dutch.lbx       -->   nl.lang
finnish.lbx     -->   fi.lang
french.lbx      -->   fr.lang
greek.lbx       -->   el.lang
icelandic.lbx   -->   is.lang
italian.lbx     -->   it.lang
nynorsk.lbx     -->   nn.lang
polish.lbx      -->   pl.lang
russian.lbx     -->   ru.lang
slovene.lbx     -->   sl.lang
spanish.lbx     -->   es.lang
swedish.lbx     -->   sv.lang

and then we should create just empty (or quasi-empty) LBX files that would call these last 16 language files. They are:

ca-AD.lbx
hr-HR.lbx
cs-CZ.lbx
da-DK.lbx
nl-NL.lbx
fi-FI.lbx
fr-FR.lbx
el-GR.lbx
is-IS.lbx
it-IT.lbx
nn-NO.lbx
pl-PL.lbx
ru-RU.lbx
sl-SI.lbx
es-ES.lbx
sv-SE.lbx

Paulo Ney

pauloney commented 10 years ago

Philip,

Here is the PT file:

https://drive.google.com/file/d/0B3mOBzjP3W1nN1B0WktpZEhObUE/edit?usp=sharing

Paulo Ney

pauloney commented 10 years ago

One last thing that is important - the parsing of locale files

Most locale files (all the ones in here so far) are of the form:

 xy-ZW

where "xy" stands for the ISO-639-1 two letter code for the language and ZW stands for the ISO 3166-1 two letter code for the country/region.

BUT (big BUT here) the form of locale names is more general. First, there are languages that have NOT been classified by ISO-639-1 and do NOT have a two-letter code, only a three-letter code from ISO-639-2. So the first token can be two or three characters long.

The other problem is languages that can be written in several different scripts, for example Azeri, which is usually written in Arabic script in Iran, in Cyrillic in Dagestan and in Latin characters in Azerbaijan. The three resulting locales would then be named:

az-Arab-AZ az-Cyrl-AZ az-Latn-AZ

or things like "nan-Hant-TW" for Min Nan Chinese as spoken in Taiwan using traditional Han characters, "zh-Hans-SG" for Simplified Chinese as spoken in Singapore, etc ...

The same happens to the token for the languages as well. Most of them here so far are two-letter:

en.lang fr.lang pt.lang

but for languages that could be written in more than one script, they should be identified as:

yi-Hebr.lang yi-Latn.lang

for the Yiddish language written in Hebrew and Latin.

Paulo Ney

plk commented 10 years ago

.lbx was invented by the original biblatex author - the "bx" is "biblatex" and the "l" is "language" I think, just as .cbx is "citations" and .bbx is "bibliography". So we could use .lbx for everything and be safe as it can be "locale/language". I am quite familiar with the ISO format stuff as I once wrote a parser for such lang IDs. In fact, internally biber uses these codes because they are used by the Unicode::Collate module to sort things properly by locale so I already have a mapping in biber from babel/polyglossia IDs to locale IDs like this.

plk commented 10 years ago

We can easily map to new .lbx files using \DeclareLanguageMapping but we should call all language/locale files .lbx. For example en.lbx and en-US.lbx. I think we should make the inheritance explicit with \InheritBibliographyExtras and \InheritBibliographyStrings though - it allows users to read the user-space code more easily.
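
To make the explicit-inheritance idea concrete, here is a minimal sketch of what a locale file might look like under this scheme. It assumes a language-level file de.lbx and a locale file de-AT.lbx, names taken from the renaming proposed in this thread rather than from a released biblatex. The \ProvidesFile date and description are placeholders, and the single overridden string (Jänner instead of Januar) only illustrates the kind of regional difference such a file would carry.

```latex
% de-AT.lbx -- sketch of a locale file under the proposed naming scheme.
% Assumption: a language-level file de.lbx exists alongside it.
\ProvidesFile{de-AT.lbx}[2014/06/24 sketch: German (Austria)] % placeholder date/description

% Inherit the extras (date formats, punctuation tracking, ...) from de.lbx.
\InheritBibliographyExtras{de}

% Inherit all strings from de.lbx, then override only what differs here.
\DeclareBibliographyStrings{%
  inherit = {de},
  january = {{J\"anner}{J\"an\adddot}},% {long form}{abbreviated form}
}

\endinput
```

Because the inheritance is spelled out in the file itself, a translator can see at a glance which strings belong to the locale and which come from the language file.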

plk commented 10 years ago

What about migrating this to Perl+SQLite? This way the database can be included in the biblatex distribution and we can have a build script to generate all .lbx files. I assume that the SQL queries are generic enough to port to SQLite so that we can have a simple file store DB?

pauloney commented 10 years ago

Nice to hear that I am preaching to the believer and that you already use the ISO stuff in biber!

Now that I understand what lbx stands for, I see that it is just natural for both types "language" and "locale" file to use the same extension... and having inheritance defined inside the file will help users understand it better and write their own files.

There are 3 components here: the DB, which is in MySQL right now but will convert easily; the Perl that writes the lbx files; and an interface that allows a user to pick 2, 3, 4 languages, display the terms in a pane and enter whatever he wants ... and possibly even save it as an lbx file later.

I wrote the first interface in PHP and I am re-doing it in Rails right now. The DB is larger than Biblatex - it is also able to write files for CSL, Babel and Polyglossia.

It makes sense to offer the DB live, so people can enter/fix a translation in Malayalam in India and we can catch it for a new lbx file. You may want to attach it to the Biblatex Project, but it is substantially larger than the project. I have worked very hard on the DB and it has the standard stuff that we need for Biblatex in literally hundreds of languages - from there you can estimate the size.

I can extract only the Biblatex terms and make it a separate tool - smaller in size, but I can also release the whole thing in a new open source project... What are your thoughts on it ?

Paulo Ney

plk commented 10 years ago

Ah, ok, I see now. So, if I can generate a full set of .lbx files at will and if it does more than biblatex anyway, it makes more sense to make it a separate tool.

pauloney commented 10 years ago

The whole idea is to have a cross pollination among the projects so translations that are used by one can be easily imported into another, and someone interested in making a translation for one project will get a few others going with the same effort.

As soon as you make the renaming of the files we should run the test-suite to make sure we are getting exactly the same set of PDF files and nothing has been lost. After that is confirmed, I'll load the slovene.lbx in the DB and produce the first set straight from there... and then we should check again. Then I'll produce a larger set which will probably have to wait for Babel/Polyglossia support.

When the slovene.lbx file showed up mid-Feb I was happy to see that about 80% of that file was already in the DB.

Paulo Ney

plk commented 10 years ago

Actually, you can generate them now - all I have to do is to add some \DeclareLanguageMapping lines to biblatex.def to switch to test this.

pauloney commented 10 years ago

Philip,

I see you are on a roll today! Can we finish this one as well ? If you do your stuff to relabel... I can do the rest - which is probably going to be along the lines:

1- plk does the renaming restructure.

2- test newly generated PDF files to make sure nothing has changed.

3- paulo outputs the set of db-generated lbx files

4- new round of tests to make sure nothing has changed

5- release ...

Paulo Ney

plk commented 10 years ago

All I have to do is add a few definitions to biblatex.def ... where do I get the new set of files?

pauloney commented 10 years ago

The files are here:

https://drive.google.com/file/d/0B3mOBzjP3W1naklKdU1wdzBYRVk/edit?usp=sharing

but you should not bother with it, because they will need to incorporate your changes in order to be the final product.

My only problem is that I do not know what these changes are and I'll have to incorporate them in the lbx-writer, but after your move, I'll copy down the changes and it will be easy!

I outlined the changes in the message from "24 days ago" (in this thread) that starts with:

Philip,

The idea is that we will have 2 types of files in the directory, some labelled after the language and some after the locales, so in the case of Portuguese that would be 3 files....

in there I list the new names of all the files.

Paulo Ney

plk commented 10 years ago

Don't I just need to implement language mappings to find the right files? This is what I would put in biblatex.def which would map the "old" language names to the new file names, for backwards compat. Users could then use the "new" file names or the old. If they use babel/polyglossia, we'll need these mappings until we can persuade those packages to use better names.

\DeclareLanguageMapping{acadian}{fr-CA}
\DeclareLanguageMapping{american}{en-US}
\DeclareLanguageMapping{australian}{en-AU}
\DeclareLanguageMapping{afrikaans}{af-ZA}
\DeclareLanguageMapping{albanian}{sq-AL}
\DeclareLanguageMapping{amharic}{am-ET}
\DeclareLanguageMapping{arabic}{ar-001}
\DeclareLanguageMapping{armenian}{hy-AM}
\DeclareLanguageMapping{asturian}{ast-ES}
\DeclareLanguageMapping{austrian}{de-AT}
\DeclareLanguageMapping{bahasa}{id-ID}
\DeclareLanguageMapping{bahasai}{id-ID}
\DeclareLanguageMapping{bahasam}{id-ID}
\DeclareLanguageMapping{basque}{eu-ES}
\DeclareLanguageMapping{bengali}{bn-BD}
\DeclareLanguageMapping{bgreek}{el-GR}
\DeclareLanguageMapping{brazil}{pt-BR}
\DeclareLanguageMapping{brazilian}{pt-BR}
\DeclareLanguageMapping{breton}{br-FR}
\DeclareLanguageMapping{british}{en-GB}
\DeclareLanguageMapping{bulgarian}{bg-BG}
\DeclareLanguageMapping{canadian}{en-CA}
\DeclareLanguageMapping{canadien}{fr-CA}
\DeclareLanguageMapping{catalan}{ca-AD}
\DeclareLanguageMapping{coptic}{cop}
\DeclareLanguageMapping{croatian}{hr-HR}
\DeclareLanguageMapping{czech}{cs-CZ}
\DeclareLanguageMapping{danish}{da-DK}
\DeclareLanguageMapping{divehi}{dv-MV}
\DeclareLanguageMapping{dutch}{nl-NL}
\DeclareLanguageMapping{english}{en-US}
\DeclareLanguageMapping{esperanto}{eo-001}
\DeclareLanguageMapping{estonian}{et-EE}
\DeclareLanguageMapping{ethiopia}{am-ET}
\DeclareLanguageMapping{farsi}{fa-IR}
\DeclareLanguageMapping{finnish}{fi-FI}
\DeclareLanguageMapping{francais}{fr-FR}
\DeclareLanguageMapping{french}{fr-FR}
\DeclareLanguageMapping{frenchle}{fr-FR}
\DeclareLanguageMapping{friulan}{fur-IT}
\DeclareLanguageMapping{galician}{gl-ES}
\DeclareLanguageMapping{german}{de-DE}
\DeclareLanguageMapping{germanb}{de-DE}
\DeclareLanguageMapping{greek}{el-GR}
\DeclareLanguageMapping{hebrew}{he-IL}
\DeclareLanguageMapping{hindi}{hi-IN}
\DeclareLanguageMapping{ibygreek}{el-CY}
\DeclareLanguageMapping{icelandic}{is-IS}
\DeclareLanguageMapping{indon}{id-ID}
\DeclareLanguageMapping{indonesia}{id-ID}
\DeclareLanguageMapping{interlingua}{ia-FR}
\DeclareLanguageMapping{irish}{ga-IE}
\DeclareLanguageMapping{italian}{it-IT}
\DeclareLanguageMapping{japanese}{ja-JP}
\DeclareLanguageMapping{kannada}{kn-IN}
\DeclareLanguageMapping{lao}{lo-LA}
\DeclareLanguageMapping{latin}{sr-Latn}
\DeclareLanguageMapping{latvian}{lv-LV}
\DeclareLanguageMapping{lithuanian}{lt-LT}
\DeclareLanguageMapping{lowersorbian}{dsb-DE}
\DeclareLanguageMapping{lsorbian}{dsb-DE}
\DeclareLanguageMapping{magyar}{hu-HU}
\DeclareLanguageMapping{malay}{id-ID}
\DeclareLanguageMapping{malayalam}{ml-IN}
\DeclareLanguageMapping{marathi}{mr-IN}
\DeclareLanguageMapping{meyalu}{id-ID}
\DeclareLanguageMapping{mongolian}{mn-Cyrl}
\DeclareLanguageMapping{naustrian}{de-AT}
\DeclareLanguageMapping{newzealand}{en-NZ}
\DeclareLanguageMapping{ngerman}{de-DE}
\DeclareLanguageMapping{nko}{ha-NG}
\DeclareLanguageMapping{norsk}{nb-NO}
\DeclareLanguageMapping{norwegian}{no-NO}
\DeclareLanguageMapping{nynorsk}{nn-NO}
\DeclareLanguageMapping{occitan}{oc-FR}
\DeclareLanguageMapping{piedmontese}{pms-IT}
\DeclareLanguageMapping{pinyin}{pny}
\DeclareLanguageMapping{polish}{pl-PL}
\DeclareLanguageMapping{polutonikogreek}{el-GR}
\DeclareLanguageMapping{portuges}{pt-PT}
\DeclareLanguageMapping{portuguese}{pt-PT}
\DeclareLanguageMapping{romanian}{ro-RO}
\DeclareLanguageMapping{romansh}{rm-CH}
\DeclareLanguageMapping{russian}{ru-RU}
\DeclareLanguageMapping{samin}{se-NO}
\DeclareLanguageMapping{sanskrit}{sa-IN}
\DeclareLanguageMapping{scottish}{gd-GB}
\DeclareLanguageMapping{serbian}{sr-Cyrl}
\DeclareLanguageMapping{serbianc}{sr-Cyrl}
\DeclareLanguageMapping{slovak}{sk-SK}
\DeclareLanguageMapping{slovene}{sl-SI}
\DeclareLanguageMapping{slovenian}{sl-SI}
\DeclareLanguageMapping{spanish}{es-ES}
\DeclareLanguageMapping{swedish}{sv-SE}
\DeclareLanguageMapping{syriac}{syc}
\DeclareLanguageMapping{tamil}{ta-IN}
\DeclareLanguageMapping{telugu}{te-IN}
\DeclareLanguageMapping{thai}{th-TH}
\DeclareLanguageMapping{thaicjk}{th-TH}
\DeclareLanguageMapping{tibetan}{bo-CN}
\DeclareLanguageMapping{turkish}{tr-TR}
\DeclareLanguageMapping{turkmen}{tk-TM}
\DeclareLanguageMapping{ukrainian}{uk-UA}
\DeclareLanguageMapping{urdu}{ur-IN}
\DeclareLanguageMapping{UKenglish}{en-UK}
\DeclareLanguageMapping{uppersorbian}{hsb-DE}
\DeclareLanguageMapping{USenglish}{en-US}
\DeclareLanguageMapping{usorbian}{hsb-DE}
\DeclareLanguageMapping{vietnamese}{vi-VN}
\DeclareLanguageMapping{welsh}{cy-GB}

pauloney commented 10 years ago

Philip,

That looks like it will work. There is still some work on the language names, but we can leave that for later. So I guess the changes are this file you described above and the rename:

english.lbx     -->   en.lbx
UKenglish.lbx   -->   en-UK.lbx
USenglish.lbx   -->   en-US.lbx
american.lbx    -->   en-US.lbx
australian.lbx  -->   en-AU.lbx
british.lbx     -->   en-GB.lbx
canadian.lbx    -->   en-CA.lbx
newzealand.lbx  -->   en-NZ.lbx
german.lbx      -->   de.lbx
naustrian.lbx   -->   de-AT.lbx
ngerman.lbx     -->   de-DE.lbx
brazil.lbx      -->   pt-BR.lbx
portuges.lbx    -->   pt-PT.lbx
norwegian.lbx   -->   no.lbx
norsk.lbx       -->   no-NO.lbx
catalan.lbx     -->   ca.lbx
croatian.lbx    -->   hr.lbx
czech.lbx       -->   cs.lbx
danish.lbx      -->   da.lbx
dutch.lbx       -->   nl.lbx
finnish.lbx     -->   fi.lbx
french.lbx      -->   fr.lbx
greek.lbx       -->   el.lbx
icelandic.lbx   -->   is.lbx
italian.lbx     -->   it.lbx
nynorsk.lbx     -->   nn.lbx
polish.lbx      -->   pl.lbx
russian.lbx     -->   ru.lbx
slovene.lbx     -->   sl.lbx
spanish.lbx     -->   es.lbx
swedish.lbx     -->   sv.lbx

Deprecate these 3 files: austrian.lbx brazilian.lbx portuguese.lbx

and then place (almost empty) files

ca-AD.lbx
hr-HR.lbx
cs-CZ.lbx
da-DK.lbx
nl-NL.lbx
fi-FI.lbx
fr-FR.lbx
el-GR.lbx
is-IS.lbx
it-IT.lbx
nn-NO.lbx
pl-PL.lbx
ru-RU.lbx
sl-SI.lbx
es-ES.lbx
sv-SE.lbx

that will have a single line calling the language lbx file, and we should be done.
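
As an illustration, one of these almost-empty files might look like the sketch below. It assumes the language-level file is called fr.lbx, as in the renaming list above, and the \ProvidesFile date and description are placeholders. In practice it would probably take two inheritance lines rather than one, since biblatex inherits extras and strings with separate commands.

```latex
% fr-FR.lbx -- sketch of an "almost empty" locale file.
% Assumption: the language-level file is named fr.lbx.
\ProvidesFile{fr-FR.lbx}[2014/06/24 sketch: French (France)] % placeholder date/description

% Take the extras (date formats, punctuation) and all strings
% unchanged from the language-level file.
\InheritBibliographyExtras{fr}
\InheritBibliographyStrings{fr}

\endinput
```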

Paulo Ney

pauloney commented 10 years ago

and if you could show me how to generate the full set of examples files and compare them to a previously generated set - I would love it!

I want to build an "acid-test-suite" for the LBX files and not only be able to look at the results but also to be able to compare to a previous run of biblatex.

Paulo Ney

pauloney commented 10 years ago

and then when you are finished I have questions about the following entries on the file:

\DeclareLanguageMapping{arabic}{ar-001}
\DeclareLanguageMapping{esperanto}{eo-001}

\DeclareLanguageMapping{pinyin}{pny}
\DeclareLanguageMapping{syriac}{syc}

\DeclareLanguageMapping{serbian}{sr-Cyrl}
\DeclareLanguageMapping{serbianc}{sr-Cyrl}

\DeclareLanguageMapping{thai}{th-TH}
\DeclareLanguageMapping{thaicjk}{th-TH}

\DeclareLanguageMapping{british}{en-GB}
\DeclareLanguageMapping{scottish}{gd-GB}
\DeclareLanguageMapping{welsh}{cy-GB}

\DeclareLanguageMapping{UKenglish}{en-UK}

and from the types of entries you can imagine the type of question I have ... but we leave that for later ... after the renaming ...

Paulo Ney

pauloney commented 10 years ago

So here are the strings where I see a small problem:

\DeclareLanguageMapping{arabic}{ar-001}
\DeclareLanguageMapping{esperanto}{eo-001}

\DeclareLanguageMapping{pinyin}{pny}
\DeclareLanguageMapping{syriac}{syc}

I am not sure what the "-001" means, but I imagine that the problem here is the same thing for all four - one language that is spoken in several geographically distinct places but the package does not know anything about the local differences in between them and is treating them as ONE entity.

I am also assuming that the two last ones would be mapping into a file like "pny.lbx" and "syc.lbx". We should do this to the first two languages as well like in:

\DeclareLanguageMapping{arabic}{ar}
\DeclareLanguageMapping{esperanto}{eo}

\DeclareLanguageMapping{british}{en-GB}
\DeclareLanguageMapping{scottish}{gd-GB}
\DeclareLanguageMapping{welsh}{cy-GB}
\DeclareLanguageMapping{UKenglish}{en-UK}

The name of the "sovereign" entity here is "United Kingdom of Great Britain and Northern Ireland", but contrary to what many believe, the ISO-3166 code is "GB" and not "UK". The token "uk" is present in a lot of places because of the creation of the top-level domain .uk. The creation of the domain pre-dated the ISO-3166 standard, and it stands there for historical reasons.

Since we are using standards, the best choice here for the last line would be:

\DeclareLanguageMapping{UKenglish}{en-GB}
\DeclareLanguageMapping{en-UK}{en-GB}

and the file to be called "en-GB.lbx".

plk commented 10 years ago

Ok fair enough, I'll change them to do this. So what to do now? I have a dev branch of biblatex with these definitions in there - do you need to now generate the .lbx files with these names?

pauloney commented 10 years ago

Here are the files renamed.

https://drive.google.com/file/d/0B3mOBzjP3W1nbzhfRFlCY0NvMnc/edit?usp=sharing

The only things I am not sure about are the uses of the command:

\DeclareRedundantLanguages{}{}

and I left them as-is. You can load them on the directory and run the tests.

Paulo Ney

plk commented 10 years ago

I have made a special release 3.0 to look at this. You can find it here:

https://www.dropbox.com/sh/ld4qx1go9ry3gbz/AAAR-YYqz8Q0Wfy4A_b5niE6a

It uses your files, the mapping definitions and some code changes required to avoid spurious language warnings. I tested it a little and it seems to work fine so far. To really test it requires that you delete the files in your biblatex lbx directory before install as an install probably won't delete these so they'll still be found. This is going to be a problem for upgrading to these files in general ... the old ones will still be there and the code will find them.

pauloney commented 10 years ago

Philip,

The code is still referring to the old file names ? Or just to the old language-names? I don't understand why would that be needed ... some backwards compatibility thing?

Paulo Ney

plk commented 10 years ago

Actually, ignore that. The code changes mean the old files will not be used but they will still be in the installed biblatex tree which might confuse some people.

pauloney commented 10 years ago

Cool!

Were you able to verify that the files generated now are the same as the ones before?

Can you give me some guidance on a couple of related questions?

How do I run the test sequence?

What is the best way to have two parallel installations of biblatex?

Paulo Ney

plk commented 10 years ago

I found a problem - the en-US.lbx contains:

\InheritBibliographyExtras{english}
\DeclareBibliographyExtras{\uspunctuation}
\InheritBibliographyStrings{english}

but shouldn't this be en instead of english? The problem is that biblatex goes into an endless loop since english maps to en-US ...

In general, we don't want to refer to any of the old files from the new ones or we'll get loops due to the language mapping macros.
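
A minimal sketch of why this loops, under a simplified model. `LEGACY_MAP` and the `inherits` dictionaries below are hypothetical stand-ins for the language mapping macros and the `\Inherit...` declarations; this is not how biblatex itself is implemented.

```python
# Hypothetical model: legacy babel-style names map to the new identifiers,
# and each .lbx file inherits from the name given to its \Inherit... commands.
LEGACY_MAP = {"english": "en-US"}


def find_loop(start, inherits, legacy_map=LEGACY_MAP):
    """Follow the inheritance chain from `start`.

    Returns the chain (ending in the repeated name) if a file is revisited,
    or None if the chain terminates.
    """
    chain = []
    current = start
    while current is not None:
        current = legacy_map.get(current, current)  # apply the name mapping first
        if current in chain:
            return chain + [current]
        chain.append(current)
        current = inherits.get(current)
    return None


broken = {"en-US": "english"}  # en-US.lbx inherits from "english"
fixed = {"en-US": "en"}        # en-US.lbx inherits from "en", which inherits nothing

print(find_loop("en-US", broken))  # ['en-US', 'en-US']
print(find_loop("en-US", fixed))   # None
```

In this model the fix is simply that no new-style file names an old-style parent, so the mapping step never sends the chain back to where it started.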

Parallel installs would be possible by installing to some special place and then pointing one of the TeX-related environment variables (like TEXMFHOME or TEXMFLOCAL) at it, as I remember.

pauloney commented 10 years ago

Yes! This needs to be fixed. The problem is present in the following files:

en-AU.lbx:\InheritBibliographyStrings{english}
en-CA.lbx:\InheritBibliographyStrings{english}
en-GB.lbx:\InheritBibliographyStrings{english}
en-NZ.lbx:\InheritBibliographyStrings{english}
en-US.lbx:\InheritBibliographyStrings{english}

and

en-AU.lbx:\InheritBibliographyExtras{british}
en-CA.lbx:\InheritBibliographyExtras{english}% correct? these are the US standards
en-NZ.lbx:\InheritBibliographyExtras{british}
en-US.lbx:\InheritBibliographyExtras{english}

and these lines should be deleted:

en.lbx:\InheritBibliographyStrings{en}
nn.lbx:\InheritBibliographyStrings{norwegian}
nn.lbx:\InheritBibliographyExtras{norwegian}
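
Listings like the ones above could be produced with a small script rather than by hand. This is only a sketch: `LEGACY_NAMES` is a hypothetical, incomplete set of old-style names, and the pattern covers only the two `\InheritBibliography...` commands, not the `inherit = {...}` option or a file that inherits from itself.

```python
import re

# Matches \InheritBibliographyStrings{name} and \InheritBibliographyExtras{name}.
INHERIT_RE = re.compile(r"\\InheritBibliography(?:Strings|Extras)\{([^}]*)\}")

# Hypothetical list of old-style language names; extend as needed.
LEGACY_NAMES = {"english", "british", "american", "norwegian"}


def legacy_references(lbx_text, legacy_names=LEGACY_NAMES):
    """Return the legacy names that `lbx_text` inherits from, in order of appearance."""
    return [name for name in INHERIT_RE.findall(lbx_text) if name in legacy_names]


sample = r"""
\InheritBibliographyExtras{english}
\DeclareBibliographyExtras{\uspunctuation}
\InheritBibliographyStrings{english}
"""

print(legacy_references(sample))  # ['english', 'english']
```

Running this over the text of every generated .lbx file before each release would flag any remaining reference to an old name.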

Sorry!

Paulo Ney

plk commented 10 years ago

No problem, that's what testing is for ...

pauloney commented 10 years ago

What about this command:

\DeclareRedundantLanguages{english,american}{english,american,british,...

in "en.lbx"? Would that also generate these infinite loops?

And what about these 3:

de-AT.lbx: inherit = {de},
de-DE.lbx: inherit = {de},
nn.lbx: inherit = {norwegian},

The last line is completely wrong: nn is for Nynorsk, and no is for Norsk, which is normally translated as Norwegian. I did not delete it because I wanted confirmation from a native speaker first.

Paulo Ney

plk commented 10 years ago

\DeclareRedundantLanguages is ok - it deals with removing unneeded language fields from entries and it only cares about babel/polyglossia language names.

I think the German ones are OK; I'm not sure about nn. I can test the German ones when you regenerate.

pauloney commented 10 years ago

New files here:

https://drive.google.com/file/d/0B3mOBzjP3W1nMnFZOFlxUUtWWnc/edit?usp=sharing

Paulo Ney

plk commented 10 years ago

The basic tests now pass (the other tests can't run until they do), apart from one that tests localisations, which gives:

Package biblatex Warning: Bibliography string 'langslovene' undefined on input

a missing string?

pauloney commented 10 years ago

Let me check this!

Paulo Ney

plk / biblatex

Use standardized language identifiers for lbx files #160

swedish.lbx --> sv.lbx