Bug: ODT exporter does not honour +LANGUAGE line [9.1.9 (release_9.1.9-65-g5e4542 @ /usr/share/emacs/26.1/lisp/org/)]

kjambunathan commented 4 years ago

See https://lists.gnu.org/archive/html/emacs-orgmode/2020-08/msg00140.html

Dear maintainer,

the ODT exporter always exports as English, even if a document is not written in English and contains a corresponding +LANGUAGE line. Take for example this German document:

#+AUTHOR: Ich
#+TITLE: Test
#+LANGUAGE: de

Dies ist ein Test.

When it is exported as ODT, the main language is set to English. As a result, the office application's spell checker (LibreOffice's one in my case) highlights everything as wrong that isn't accidentally a valid word in English. To fix that one has to manually change the document's main language to German (in this case) via the office application's menus. This is annoying, especially if the intention is to automatically further convert to DOCX and send the document out unmodified. The HTML exporter on the other hand handles the +LANGUAGE line just fine.

-quintus

Emacs : GNU Emacs 26.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.5) of 2019-09-23, modified by Debian Package: Org mode version 9.1.9 (release_9.1.9-65-g5e4542 @ /usr/share/emacs/26.1/lisp/org/)

kjambunathan commented 3 years ago

Hello. @ouboub, @QiangF,

I will be adding support for honouring the #+LANGUAGE: line in a few hours (or days). I have a working prototype. I need to clean up the stuff a bit.

Between us we will be able to test drive European, Chinese and Indian languages. IOW, we can test Latin, Indic and Asian languages.

ouboub commented 3 years ago

very good, thanks! I could test Hebrew as well, if you are willing to implement that as well.

kjambunathan commented 3 years ago

Support for LANGUAGE keyword is now available as part of ox-odt-9.3.7.328.tar.

Here is a preview of the newly added features:

Support for completion on #+LANGUAGE: lines.

In the screenshot below I am in the middle of filling in ger, presumably the German language or locale.

Screenshot from 2021-10-10 18-38-05

Body Text in LibreOffice gets the right LANGUAGE so that spell checking etc. works just fine

You can see the language of the text in LibreOffice's modeline.

Screenshot from 2021-10-10 18-09-50
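
The language can also be checked without opening LibreOffice, by looking inside the exported file. What follows is only a sketch: it assumes the exporter records the language as fo:language / fo:country attributes on style:text-properties elements (which is how the OpenDocument format expresses it) in styles.xml or content.xml. The function name odt_languages and the file name test.odt are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument format.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def odt_languages(path):
    """Return the set of (language, country) pairs declared in an ODT file.

    Assumption: the language lives in fo:language / fo:country attributes
    of style:text-properties elements in styles.xml or content.xml.
    """
    found = set()
    with zipfile.ZipFile(path) as odt:
        members = set(odt.namelist())
        for name in ("styles.xml", "content.xml"):
            if name not in members:
                continue
            root = ET.fromstring(odt.read(name))
            for props in root.iter(f"{{{STYLE_NS}}}text-properties"):
                lang = props.get(f"{{{FO_NS}}}language")
                if lang is not None:
                    found.add((lang, props.get(f"{{{FO_NS}}}country")))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python odt_languages.py test.odt
    for path in sys.argv[1:]:
        print(path, sorted(odt_languages(path), key=str))
```

If English still shows up for a document that declares another language, the LANGUAGE line was probably not picked up.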

Also don't forget to read the inline comments in org snippet below

Appendix

# -*- coding: utf-8; -*-

Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages.  They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].

# When on ~#+LANGUAGE: ~ line, press TAB for inputting the
# locale/language.

# Org's ~org-export-dictionary~ prefers just the language code (devoid
# of any country code).  So, in the ~*Completions*~ buffer choose
# language, and if that is insufficient to get the translation you want
# choose language /and/ country.

# That said, some remarks ...

# In case of Chinese and Brazilian Portuguese, the Org's dictionary
# uses both the language /and/ country codes.  The dictionary adopts
# /no/ specific policy on how to separate language and country codes.
# For example, the Chinese entries use ~hyphen~ as separator, but the
# Brazilian Portuguese entry uses ~underscore~ as separator.  The
# convention used in Debian, specifically the
# ~/usr/share/i18n/SUPPORTED~ file (bundled with the ~locales~
# package), is to use ~underscore~ as separator between language and
# country code.  Also, in case of Norwegian, Org's dictionary has 3
# entries ~nn~, ~no~ and ~nb~.  But, I see ~nb_NO~ and ~nn_NO~ but
# /no/ ~no_NO~ in ~/usr/share/i18n/SUPPORTED~.  Note that ~no~ does
# appear as Norwegian in ~iso_639-2~ and ~iso-639-3~ list.

# May be it is time that Org maintainers settle on a consistent syntax
# for the entries in ~org-export-dictionary~.

* Arabic

#+language: ar_OM

أنا قادر على أكل الزجاج و هذا لا يؤلمني.

* COMMENT Chinese (simplified)

#+language: zh

我能吞下玻璃而不伤身体。

* COMMENT English

# #+language: en_IN

I can eat glass and it doesn't hurt me.

* COMMENT Hebrew

#+language: he

אני יכול לאכול זכוכית וזה לא מזיק לי

* COMMENT Hindi

#+language: hi

मैं काँच खा सकता हूँ और मुझे उससे कोई चोट नहीं पहुंचती

* COMMENT Japanese

#+language: ja

私はガラスを食べられます。それは私を傷つけません。

* COMMENT Korean

#+language: ko

나는 유리를 먹을 수 있어요. 그래도 아프지 않아요

* COMMENT German

#+language: de_CH

Ich kann Glas essen, ohne mir zu schaden.

* COMMENT Spanish

#+language: es_AR

Puedo comer vidrio, no me hace daño.

* COMMENT Tamil

#+language: ta_IN

நான் கண்ணாடி சாப்பிடுவேன், அதனால் எனக்கு ஒரு கேடும் வராது.

kjambunathan commented 3 years ago

Perhaps one of you, at your own discretion (and if you consider it worth your time), can take my observations about the LANGUAGE line to the emacs-orgmode maintainers.

Org's org-export-dictionary prefers just the language code (devoid of any country code). So, in the *Completions* buffer choose language, and if that is insufficient to get the translation you want choose language and country.

That said, some remarks ...

In case of Chinese and Brazilian Portuguese, the Org's dictionary uses both the language and country codes. The dictionary adopts no specific policy on how to separate language and country codes. For example, the Chinese entries use hyphen as separator, but the Brazilian Portuguese entry uses underscore as separator. The convention used in Debian, specifically the /usr/share/i18n/SUPPORTED file (bundled with the locales package), is to use underscore as separator between language and country code. Also, in case of Norwegian, Org's dictionary has 3 entries nn, no and nb. But, I see nb_NO and nn_NO but no no_NO entry in /usr/share/i18n/SUPPORTED. Note that no does appear as Norwegian in iso_639-2 and iso-639-3 list.

Maybe it is time that Org maintainers settle on a consistent syntax for the entries in org-export-dictionary.

Comment at and around https://github.com/kjambunathan/org-mode-ox-odt/blob/286a9b789db842cfa3a2fa68e4383bf6101da10c/lisp/ox-odt.el#L714 is a good starting point for further exploration.
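
Until the dictionary settles on one separator, a tool reading LANGUAGE values can accept both. The sketch below is not how ox-odt does it; it only illustrates splitting a value such as pt_BR or zh-CN into language and country. The function name split_language_tag is made up, and the pattern (a 2 or 3 letter language code, optionally followed by a hyphen or underscore and a 2 letter country code) is an assumption.

```python
import re

# Assumed shape of a LANGUAGE value: "de", "pt_BR", "zh-CN", ...
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}))?$")


def split_language_tag(tag):
    """Split a language tag into (language, country).

    Accepts both hyphen and underscore as separator; country is None
    when the tag carries only a language code.
    """
    match = _TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognised language tag: {tag!r}")
    language, country = match.groups()
    return language.lower(), (country.upper() if country else None)


if __name__ == "__main__":
    for tag in ("de", "pt_BR", "zh-CN", "nb_NO"):
        print(tag, "->", split_language_tag(tag))
```

With this, pt_BR and pt-BR map to the same pair, so the separator chosen in the Org file no longer matters to the caller.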

ouboub commented 3 years ago

Hi

Thanks, excellent. I can confirm German, Spanish, Hebrew and english_US, but I fail with British; shouldn't that be english_GB? I will read your comments later and try to act upon them. BTW, the sentence you picked looks strange, but so be it.

kjambunathan commented 3 years ago

I can confirm German, Spanish, Hebrew and english_US, but I fail with British; shouldn't that be english_GB?

Thanks.

If you choose English (United Kingdom) in completion table you get en-GB.

LibreOffice's mode-line also prefers English (UK) to using English (GB).

BTW, the sentence you picked looks strange, but so be it.

From https://kermitproject.org/utf8.html#notes

The "I can eat glass" phrase and initial translations (about 30 of them) were borrowed from Ethan Mollick's I Can Eat Glass page (which disappeared on or about June 2004) and converted to UTF-8. Since Ethan's original page is gone, I should mention that his purpose was to offer travelers a phrase they could use in any country that would command a certain kind of respect, or at least get attention. See Credits for the many additional contributions since then. When submitting new entries, the word "hurt" (if you have a choice) is used in the sense of "cause harm", "do damage", or "bother", rather than "inflict pain" or "make sad". In this vein Otto Stolz comments (as do others further down; personally I think it's better for the purpose of this page to have extra entries and/or to show a greater repertoire of characters than it is to enforce a strict interpretation of the word "hurt"!):

This is the meaning I have translated to the Swabian dialect. However, I just have noticed that most of the German variants translate the "inflict pain" meaning. The German example should read:

"Ich kann Glas essen ohne mir zu schaden."

rather than:

"Ich kann Glas essen, ohne mir weh zu tun."

(The comma fell victim to the 1996 orthographic reform, cf. http://www.ids-mannheim.de/reform/e3-1.html#P76.)

kjambunathan commented 3 years ago

Btw, I discovered the following side-effect ... so re-opening this bug

https://github.com/kjambunathan/org-mode-ox-odt/commit/286a9b789db842cfa3a2fa68e4383bf6101da10c

This commit for changing language/locale "breaks" the "in buffer" configuration of font size. FWIW, I have shared a recipe for quick configuration of font sizes here https://github.com/kjambunathan/org-mode-ox-odt/discussions/102#discussioncomment-1440557. (Hint: See the initial few lines of diff in that comment)

Here is a sample snippet:

#+odt_extra_styles: <style:style style:name="Standard" style:family="paragraph"
#+odt_extra_styles:          style:class="text">
#+odt_extra_styles:   <style:text-properties fo:font-size="10pt"/>
#+odt_extra_styles: </style:style>

#+language: en_GB

Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages.  They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].

kjambunathan commented 3 years ago

Re-opening this bug ...

kjambunathan commented 3 years ago

Ethan Mollick's I Can Eat Glass page (which disappeared on or about June 2004)

This page is in Wayback Machine ... https://web.archive.org/web/20010308220934/http://www.hcs.harvard.edu/~igp/glass.html.

ouboub commented 3 years ago

Oops, I did not pay attention, but you are right. For me, the 10pt configuration is more important right now, so I ran git checkout -b 10pt 774c32566^, then make clean and make, and I am back to the correct 10pt configuration. If you find a fix, I would appreciate it.

kjambunathan commented 3 years ago

I have added org-odt-experimental-features. By default, LANGUAGE feature is ON. I have kept the option ON so that I can get feedback on the feature.

I have introduced a new style OrgUser which the user can set up. So, if you have any Standard style in #+odt_extra_styles:, rename it to OrgUser. (Even if you don't do this renaming, old org files will continue to work as before.)

For example, the snippet below will export to a 10pt German document by default (i.e., when the EXPERIMENTAL language feature is ON) and will export to a 10pt English (UK) document otherwise (i.e., when the EXPERIMENTAL language feature is OFF).

There shouldn't be any need for you to turn OFF org-odt-experimental-features at all. Before turning OFF the EXPERIMENTAL features, please report any problems you may have to me.

FWIW, I will be adding starmath (a much better alternative to the default latex->mathml converter) to org-odt-experimental-features very soon. See Add support for embedding starmath fragments · Issue #87 · kjambunathan/org-mode-ox-odt.

#+odt_extra_styles: <style:style style:name="OrgUser" style:family="paragraph"
#+odt_extra_styles:          style:class="text">
#+odt_extra_styles:   <style:text-properties fo:font-size="10pt"/>
#+odt_extra_styles: </style:style>

#+language: de_DE

Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages.  They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].
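
For a single file the rename from Standard to OrgUser is easiest done by hand. For a pile of old Org files, something along these lines could do it. This is only a sketch: the function name rename_standard_style is made up, and it assumes the style is declared with the literal text style:name="Standard" on #+odt_extra_styles: lines, as in the snippet above.

```python
import re

# Only keyword lines of the form "#+odt_extra_styles: ..." are touched.
_STYLE_LINE = re.compile(r"^#\+odt_extra_styles:", re.IGNORECASE)


def rename_standard_style(org_text):
    """Rename the user-defined Standard style to OrgUser.

    Lines that are not #+odt_extra_styles: keyword lines are returned
    unchanged, so body text mentioning "Standard" is left alone.
    """
    out = []
    for line in org_text.splitlines(keepends=True):
        if _STYLE_LINE.match(line):
            line = line.replace('style:name="Standard"', 'style:name="OrgUser"')
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    import sys

    # Usage: python rename_style.py < old.org > new.org
    sys.stdout.write(rename_standard_style(sys.stdin.read()))
```

Running it twice is harmless, since the second pass finds nothing left to rename.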

kjambunathan commented 3 years ago

For example, the snippet below will export to a 10pt German document by default (i.e., when the EXPERIMENTAL language feature is ON)

Screenshot from 2021-10-14 16-40-55

kjambunathan / org-mode-ox-odt

Bug: ODT exporter does not honour +LANGUAGE line [9.1.9 (release_9.1.9-65-g5e4542 @ /usr/share/emacs/26.1/lisp/org/)] #80
