Create a few small LBX files

pauloney commented 3 weeks ago

I have to create some LBX files (Latin, Arabic, Hebrew, Vietnamese, Chinese, Japanese and Korean). They will be fairly small; I only need these snippets:

page
pages
p.
pp.
cited on page 15
cited on pages 15, 17, 19
cit. on p. 15
cit. on pp. 15, 17, 19

and I sort of have that worked out by a few native speakers:

Latin:

pagina
paginae
p.
pp.
citatis pag 15
citatis pag. 15, 17, 19 .
cit. supra p. 15
cit. s., pp. 15, 17, 19

Arabic:

الصفحة
الصفحات
ص.
ص.
مقتبس في الصفحة 15
مقتبس في الصفحات 15، 17، 19
مقتبس في الصفحة 15
مقتبس في الصفحات 15، 17، 19

Hebrew:

עַמוּד
דפים
ע.
עמ.
מצוטט בעמוד 15
מצוטט בעמודים 15, 17, 19
cit. בעמוד. 15
cit. בעמ' 15, 17, 19

Vietnamese:

trang
trang
trang
trang
trang
trích dẫn ở trang 15
trích dẫn ở trang 15, 17, 19
trích dẫn ở trang 15
trích dẫn ở trang 15, 17, 19

Chinese:

頁
頁面
p。
頁數
第15頁引用
第 15、17、19 頁引用
引用。上頁。 15
引用。第 15、17、19 頁

Japanese:

ページ
ページ
p.
pp.
15 ページに引用
15、17、19 ページに引用
15 ページに引用
15、17、19 ページに引用

Korean:

페이지
페이지
p.
pp.
인용 페이지 15
인용 페이지 15, 17, 19
인용 페이지 15
인용 페이지 15, 17, 19

I plan to write these small files and place them in my working directory. I already have a few pieces working, and I see the following challenges:

1- The different comma in Arabic, Chinese and Japanese.
2- The different full stop in Chinese.
3- The position of the enumeration within the sentence (or snippet).

Is there any information on how to build LBX files that I can follow?
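
For orientation, here is a minimal sketch of what such a small file could look like, using the provisional Latin strings listed above. The file name `latin-sketch.lbx` is hypothetical, inheriting everything else from English is an assumption made for brevity, and the strings themselves have not been vetted. `backrefpage` and `backrefpages` are the biblatex string keys behind "cited on page(s)".

```latex
% latin-sketch.lbx (hypothetical file name; place it next to the .tex file)
\ProvidesFile{latin-sketch.lbx}[2024/08/23 sketch, not an official biblatex file]

% Assumption: reuse the English extras (dates, punctuation) and all English
% strings, then override only the strings discussed in this thread.
\InheritBibliographyExtras{english}

\DeclareBibliographyStrings{%
  inherit      = {english},
  % each value is {{long form}{abbreviated form}}
  page         = {{pagina}{p\adddot}},
  pages        = {{paginae}{pp\adddot}},
  backrefpage  = {{citatis pag\adddot}{cit\adddotspace supra p\adddot}},
  backrefpages = {{citatis pag\adddot}{cit\adddotspace s\adddot\addcomma\space pp\adddot}},
}

\endinput
```

A document could then map the babel language to that file. This assumes the babel Latin support and the `biblatex-examples.bib` file shipped with biblatex are installed:

```latex
\documentclass{article}
\usepackage[latin]{babel}
\usepackage[backend=biber, backref=true]{biblatex}
% Tell biblatex to load latin-sketch.lbx whenever 'latin' is active
\DeclareLanguageMapping{latin}{latin-sketch}
\addbibresource{biblatex-examples.bib}
\begin{document}
\cite[15]{sigfridsson}
\printbibliography
\end{document}
```

Note that a file like this only replaces the words. It does not address the comma, full stop or word-order issues listed above.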

zepinglee commented 3 weeks ago

Chinese:

頁
頁面
p。
頁數
第15頁引用
第 15、17、19 頁引用
引用。上頁。 15
引用。第 15、17、19 頁

The Chinese translations look strange to me as a native Chinese speaker.

The Chinese characters are in traditional form (e.g., 頁) which is mainly used in Hong Kong and Taiwan. Is your work primarily aimed at these regions? The majority in mainland China use simplified Chinese (页).
In Chinese, it doesn't make sense to put a dot (whether . or 。) after an abbreviated word. I suggest using the 頁 for p. or directly the English term.
There are usually no plural forms in Chinese. The current 頁數 for pp. actually means "number of pages".

The following are my suggested translations.

頁
頁
頁
頁
第 15 頁引用
第 15、17、19 頁引用
第 15 頁引用
第 15、17、19 頁引用

pauloney commented 3 weeks ago

Many thanks for this. I appreciate it.

One question: the LBX files use "p." to refer to a specific page and "pp." for the total number of pages in a book (source). So, from what you are saying, we should use:

  page             = {{頁}{頁}},
  pages            = {{頁數}{頁數}},

that is, for this term, use the initial proposal. Correct?

Another question is: What is the difference between:

在第15頁引用   and    第15頁引用

You seem to be interested in biblatex development; why not lend a hand to get these two LBX files out?

pauloney commented 3 weeks ago

I am done with Latin, Vietnamese and Korean. I still need to figure out how to place a token (number) before the LBX string.

zepinglee commented 3 weeks ago

One question: the LBX files use "p." to refer to a specific page and "pp." for the total number of pages in a book (source). So, from what you are saying, we should use:

  page             = {{頁}{頁}},
  pages            = {{頁數}{頁數}},

that is, for this term, use the initial proposal. Correct?

No, the pages term is used for several specific pages (e.g., (Doe, 2000, pp. 42-44)). The terms for the total number of pages are pagetotal and pagetotals in the LBX file. In this case the Chinese term is still 頁, so 256 pp. <=> 256 頁.

  page             = {{頁}{頁}},
  pages            = {{頁}{頁}},
  pagetotal        = {{頁}{頁}},
  pagetotals       = {{頁}{頁}},

Another question is: What is the difference between:
在第15頁引用   and    第15頁引用

The former has an additional preposition "在" meaning "on" (page 15). Both phrases make sense in Chinese, but the latter is preferred in formal written style, especially as a citation note rather than in a plain sentence. There are more formal phrases with the same meaning: 引用於第 15 頁, 引用於頁 15。

moewew commented 3 weeks ago

Changing the order of field content and localisation string is painful, but doable. We had to do it for Hungarian for example. Have a look at https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/lbx/magyar.lbx (and if you like to dig through the history, also https://github.com/plk/biblatex/issues/717 and https://github.com/plk/biblatex/pull/780 as well as linked discussions).

See https://tex.stackexchange.com/q/724962.

If you do end up asking about issues you brought up here elsewhere (e.g. on a forum) or vice versa, please add links to the relevant discussions on both sides so that people interested in the issue can follow all discussions.
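
As a much simpler illustration of the idea (this is not taken from magyar.lbx, which remains the reference to study), the order can be changed at document level by redefining the field formats that print page references. The sketch assumes the localised `page` string has no separate plural, as in the Chinese and Japanese suggestions in this thread, and it deliberately drops the page-range normalisation that the default formats apply.

```latex
% Preamble sketch: print "<number> <page string>" instead of
% "<page string> <number>" for postnotes and for the pages field.
\DeclareFieldFormat{postnote}{%
  \ifpages{#1}
    {#1\addspace\bibstring{page}}% page-like argument: number first
    {#1}}% anything else (e.g. a free-text note): print unchanged
\DeclareFieldFormat{pages}{#1\addspace\bibstring{page}}
```

Inside a localisation file the same redefinitions would normally be wrapped in `\DeclareBibliographyExtras` so that they only apply while that language is active. A style that sets type-specific formats for `pages` would override the second line.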

moewew commented 3 weeks ago

BTW: We're always interested in more localisation files, so if you can help us make biblatex available in more languages, do let us know. We'll try to help where we can (it sometimes turns out that the usual .lbx framework is not enough and that we need to do additional stuff to get good output).

That said, I'd really rather prefer to work together directly with a native speaker (or a near-native speaker) on this, since we might have to discuss subtle language details and typographical issues. Having to go through several people or having to rely on colleagues, translation tools etc. is probably not going to cut it for that. (I think I remember a case where we accepted machine-translated versions of a couple of new language strings and they turned out to be not that good in some languages. So I really want things vetted by someone who speaks the language.)

Additionally, I think it only makes sense to add localisations if they are complete enough and if we can support good-looking output in the target language. At the moment I don't think we can really do RTL typesetting (https://github.com/plk/biblatex/issues/1139), so Arabic and Hebrew localisations are still a long way away.

pauloney commented 3 weeks ago

BTW: Good that you bring this subject up. Our coverage in terms of localization is pretty uneven, we have several languages with less than 1m speakers (basque, estonian, icelandic ...) and no coverage for cjk, arabic, hindi, bengali, vietnamese, bahasa,... On the list of the 10 largest languages we only have four (english, russian, spanish and portuguese). I have been thinking of doing something about that for quite a while now. Bridging the gap between the groups of native speakers, latex programmers and interested users has not been easy. Most of the people that have the skills in a particular language have never heard of an LBX file.

Working with an LBX file is not easy either. Most of the work is done by picking up ONE file (like english) and replacing the strings, while the ideal would be to pick up english (as a reference) and several nearby language files for comparison. So, for example, someone working on Hindi, Varhadi, Kannada, Telugu, Gujarati, Konkani, ... would pick up the Marathi list as a helper because many of the letters and words are the same.

Also, authors of the language translation should have immediate access to the use of each string because of considerations of gender, plural, etc ... and that is hard, if you are not versed in TeX.

So, I have been thinking about writing a web tool that would help with this process. It would let one pull up two or three languages side by side in a spreadsheet-like list, read from the standard LBX files, and write them back as the user progresses. It could easily be shared with others working on the same language and provide examples on the go. Results could be fed back to biblatex, text2bib, babel and polyglossia. LBX files are starting to be used elsewhere (text2bib.org, for example), and this could probably help us acquire more files.

I imagine that you do not want any part of that ... and just would want a ready file with the strings -- is that correct? If so, I will start a separate project on GitHub.

And speaking of that, consider that there are 4k written languages, in 293 scripts, 154 of them in Unicode, out of which 150 are supported by the Noto fonts. Can we make biblatex BCP-47 compliant before we walk into a mess?

jbezos commented 2 weeks ago

With babel I’ve followed this strategy. On the one hand, for ldf files I have created an HTML form available here: https://github.com/latex3/babel/tree/main/tools (it can be seen in action in the Language incubator for babel, https://www.texnia.com/incubator.html). On the other hand, with ini files, when a value does not exist a message like the following is displayed:

Package babel Warning: \chaptername not set for 'mylang'. Please,
(babel)                define it after the language has been loaded
(babel)                (typically in the preamble) with:
(babel)                \setlocalecaption{mylang}{chapter}{..}
(babel)                Feel free to contribute on github.com/latex3/babel.
(babel)                Reported on input line 26.

Furthermore, besides the (more or less) 300 locales distributed with babel there are about 400 ‘templates’ here: https://github.com/latex3/babel/tree/main/locale-templates. This (I think) makes the language easily usable and encourages contributions, and saves contributors from having to think about things like what is the correct BCP-47 code, or the recommended language name, or the OpenType script code. And of course, it helps with one of my main goals: language diversity.

pauloney commented 2 weeks ago

This is great, Javier. Editing an lbx file can be intimidating for the uninitiated.

My suggestions for improvement would be:

1- The ability to pull up the list in English and another "nearby" language. For example, someone who would like to work on Kannada or Telugu could pull up the lists for English and Marathi, so that they have a nearby language as a reference.

2- A mouse-over with examples of use would also be a nice touch. It is not easy to understand "cc" or "encl" if you work in a very distant language.

It is also nice to see babel moving more aggressively to BCP-47. We need that to be able to build nice spellers, etc ...

jbezos commented 2 weeks ago

It is also nice to see babel moving more aggressively to BCP-47.

But with caution. BCP-47 is not just about tags, but also about lookups and fallbacks, and also about how they are applied in practice (particularly in the Unicode CLDR).

Can we make Biblatex BCP-47 compliant before we walk into a mess?

If made, I’d recommend not to use the same command/macro/... for the locale name and the tag. They should be clearly separated. Well, in babel they are not, but it was a clear mistake. Fortunately, BCP-47 tags are allowed only if explicitly requested with an option in \babeladjust. Now I’m investigating what to do (an optional argument in selectors like \selectlanguage? new macros? a new option in \babeladjust to force the tags?...).
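
To illustrate the opt-in mentioned here, the following sketch is modelled on the example given in the babel manual. The key names `autoload.bcp47` and `autoload.bcp47.options` are quoted from memory of that manual and should be checked against the installed version.

```latex
\documentclass{article}
\usepackage[danish]{babel}
% Opt in explicitly; without this, selectors expect babel language names.
\babeladjust{
  autoload.bcp47 = on,
  autoload.bcp47.options = import
}
\begin{document}
Chapter in Danish: \chaptername.

\selectlanguage{de-AT}% a BCP-47 tag instead of a babel language name
\localedate{2020}{1}{30}
\end{document}
```

Here the same selector, `\selectlanguage`, accepts both a locale name and a tag, which is exactly the mixing that the comment above describes as a mistake to avoid in biblatex.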

@moewew Regarding RTL scripts (actually any script, including CJK or Devanagari), please feel free to ask for help with babel (and @pauloney too, of course). I’m very interested in its integration with automated workflows for creating documents, and biblatex is clearly relevant here.

plk / biblatex

Create a few small LBX files #1374
