metanorma / pubid-itu

Parser for ITU-T and ITU-R publication identifiers
BSD 2-Clause "Simplified" License

Internationalisation for ITU identifiers #43

Open mico opened 3 months ago

mico commented 3 months ago

You have not implemented internationalisation. I refer you again to: https://github.com/metanorma/pubid-itu/issues/39#issuecomment-1966155108, and https://www.itu.int/pub/T-SP-OB.1283-2024 : go to the annex for each language, and see the footer for the desired abbreviated form.

You have:

let(:params) { { type: :annex, base: Identifier.create(sector: "T", series: "OB", number: 1) } }

it "renders annex to identifier" do
  expect(subject.to_s).to eq("Annex to ITU-T OB.1")
end

I would like to be able to do the following:

Given an additional parameter i18n_lang, which defaults to "en"

Pubid::Itu::Identifier.create(**{type: :annex,
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "Annex to ITU OB 1283-E"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "en",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "Annex to ITU OB 1283-E"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "fr",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "Annexe au BE de l'UIT 1000"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "zh-Hans",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "国际电联第1283期《操作公报》附件"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "ar",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "ملحق بالنشرة التشغيلية رقم 1283"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "es",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "Anexo al BE de la UIT N.º 1283"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "ru",
  base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s ==
  "Приложение к ОБ 1283 МСЭ"

I do not have confidence that you will implement this, so I will defer this requirement unless @ronaldtse says otherwise.

Originally posted by @opoudjis in https://github.com/metanorma/pubid-itu/issues/39#issuecomment-2302586222

opoudjis commented 3 months ago

@ronaldtse has confirmed that this is needed...

mico commented 2 months ago

@opoudjis In pubid-iso we are doing:

pubid.to_s(lang: :russian)
=> Руководство ИСО/МЭК 76

pubid.to_s
=> ISO/IEC Guide 76

We can use the same approach, so that the rendering language is chosen at the rendering stage instead of at the creation stage, which is what the examples above do.
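
To make the difference concrete, here is a minimal sketch of render-time language selection. SketchAnnex, its TEMPLATES table and the :en/:es/:ru keys are hypothetical and not part of pubid-itu or pubid-iso; the template strings are the ones quoted earlier in this thread.

```ruby
# Hypothetical sketch; SketchAnnex is not part of pubid-itu.
class SketchAnnex
  # Templates copied from the examples quoted earlier in this thread.
  TEMPLATES = {
    en: "Annex to ITU OB %<number>d-E",
    es: "Anexo al BE de la UIT N.º %<number>d",
    ru: "Приложение к ОБ %<number>d МСЭ",
  }.freeze

  def initialize(number:)
    @number = number
  end

  # The language is picked here, at rendering time.
  # The object itself stores no language.
  def to_s(lang: :en)
    template = TEMPLATES.fetch(lang) do
      raise ArgumentError, "unsupported language: #{lang}"
    end
    format(template, number: @number)
  end
end

annex = SketchAnnex.new(number: 1283)
puts annex.to_s
puts annex.to_s(lang: :ru)
```

With this shape one identifier object can be rendered in any supported language, and parsing or comparing identifiers does not depend on a language chosen at creation.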

mico commented 2 months ago

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "fr", base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s == "Annexe au BE de l'UIT 1000"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "zh-Hans", base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s == "国际电联第1283期《操作公报》附件"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "ar", base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s == "ملحق بالنشرة التشغيلية رقم 1283"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "es", base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s == "Anexo al BE de la UIT N.º 1283"

Pubid::Itu::Identifier.create(**{type: :annex, i18n_lang: "ru", base: Pubid::Itu::Identifier.create(sector: "T", series: "OB", number: 1283)}).to_s == "Приложение к ОБ 1283 МСЭ"

@opoudjis where did you get these translations? Did you write them yourself?

mico commented 2 months ago

For most identifiers I can't find a translated version in other languages. Some examples of what I have found so far:

The identifier ITU-T Rec. G.729/Annex J (05/2006) is represented in the Spanish version as Rec. UIT-T G.729/anexo J (05/2006), and in French as Rec. UIT-T G.729/Annexe J (05/2006).

For other types of identifiers, it's something like: ITU-T Z.100 App. II (03/1993) -> Recomendación Z.100 – Apéndice II (03/93) (Spanish; I don't think we can use it as a pubid).

For ITU-T T.4:

The identifier ITU-T G.729 Annex E (1998) Cor. 1 (02/2000) and the other corrigenda for ITU-T G.729 have no clear representation in other languages.

ITU-T A Suppl. 2 (12/2022) appears in the Spanish document footer as Serie A – Suplemento 2 (12/2022) (I don't think we can treat this as a pubid representation of the identifier).

ITU-T E.156 Suppl. 2:

ITU-T G.780/Y.1351 (2004) Amend. 1:

ITU-T M.3016.1:

ITU-R SA.364-6:

mico commented 2 months ago

@opoudjis @ronaldtse In the identifier Рек. МСЭ-Т T.4, the T in the sector part is the Cyrillic Т. In Рек. МСЭ-R SA.364-6, the R in the sector part is clearly the Latin letter. I'm not sure we should follow this pattern.

Either we change Рек. МСЭ-R SA.364-6 to Рек. МСЭ-Р SA.364-6, or we change Рек. МСЭ-Т T.4 to Рек. МСЭ-T T.4 (using the Latin T).

opoudjis commented 2 months ago

In МСЭ-Т, T stands for Telecommunications, not электросвязи. The use of a Cyrillic T is therefore an editorial inconsistency. That inconsistency is replicated in https://www.itu.int/ru/ITU-T/about/Pages/default.aspx

Коротко об МСЭ-Т (Roman T)

Исследовательские комиссии Сектора стандартизации электросвязи МСЭ (МСЭ-Т) Roman T объединяют экспертов со всего мира, чтобы разрабатывать международные стандарты, известные как Рекомендации МСЭ-T Cyrillic T

We at Metanorma are often confronted with editorial inconsistency on the part of SDOs.

And we do not replicate errors. Make it Roman T throughout.
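
One way to enforce this when parsing or rendering is sketched below. normalize_sector and SECTOR_LOOKALIKES are hypothetical names, not pubid-itu code, and mapping the Cyrillic Р to a Roman R is an assumption that extends the same rule to the ITU-R sector.

```ruby
# Hypothetical helper, not part of pubid-itu.
# Maps Cyrillic look-alike letters in the sector position to Roman letters.
SECTOR_LOOKALIKES = {
  "\u0422" => "T", # Cyrillic capital Te
  "\u0420" => "R", # Cyrillic capital Er (assumed to follow the same rule)
}.freeze

def normalize_sector(identifier)
  # Only the letter directly after "МСЭ-" is replaced;
  # the rest of the identifier is left untouched.
  identifier.sub(/(?<=МСЭ-)[\u0422\u0420]/) { |ch| SECTOR_LOOKALIKES.fetch(ch) }
end

puts normalize_sector("Рек. МСЭ-\u0422 T.4")
```

An identifier that already uses the Roman letter passes through unchanged, so the helper can be applied unconditionally.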

mico commented 2 months ago

@opoudjis @ronaldtse For recommendation identifiers, the sector and series do not change in translation: ITU-T M.3016.1 -> Рек. МСЭ-Т M.3016.1 (Russian; "T" sector and "M" series).

For annexes to the Operational Bulletin, however, everything is translated and the format changes completely: Annex to ITU OB 1283-E -> Приложение к ОБ 1283 МСЭ. The "OB" series becomes "ОБ", and instead of "МСЭ-ОБ 1283" we now have a different format: "ОБ 1283 МСЭ".

The Chinese version (国际电联第1283期《操作公报》附件) uses the long form of "OB" (操作公报 -> Operational Bulletin).

I believe identifiers in this format are unacceptable as a "pubid" representation (which should have a clear, parseable format); they are suitable only as titles for PDF documents. Should we render titles in this "long" format only when the "long" format is requested?

pubid.to_s(format: :long, language: :ru)
=> "Приложение к ОБ 1283 МСЭ"

But render a more standardised, parseable version by default?

pubid.to_s(language: :ru)
=> "Приложение к МСЭ-T OB No. 1283"

opoudjis commented 2 months ago

I'm ok with a long and a short version.
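
A rough sketch of how the two forms could sit side by side follows. SketchBulletinAnnex and its FORMS table are hypothetical, not the pubid-itu implementation. The Russian strings are the two quoted above, the long English form is the one from the original request, and the short English form is an assumption that follows the pattern of the existing spec ("Annex to ITU-T OB.1").

```ruby
# Hypothetical sketch; SketchBulletinAnnex is not part of pubid-itu.
class SketchBulletinAnnex
  # Short forms are meant to stay parseable; long forms follow the document titles.
  FORMS = {
    short: {
      en: "Annex to ITU-T OB.%<number>d",
      ru: "Приложение к МСЭ-T OB No. %<number>d",
    },
    long: {
      en: "Annex to ITU OB %<number>d-E",
      ru: "Приложение к ОБ %<number>d МСЭ",
    },
  }.freeze

  def initialize(number:)
    @number = number
  end

  # Defaults to the short form in English.
  def to_s(format: :short, language: :en)
    template = FORMS.fetch(format).fetch(language)
    # Kernel.format is called explicitly because the keyword argument is also named "format".
    Kernel.format(template, number: @number)
  end
end

annex = SketchBulletinAnnex.new(number: 1283)
puts annex.to_s(language: :ru)
puts annex.to_s(format: :long, language: :ru)
```

Keeping the short form as the default means that callers who do not ask for a format get the parseable string; the long form is opt-in.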