Closed kjambunathan closed 3 years ago
The ODT exporter always exports as English, even if a document is not written in English and contains a corresponding #+LANGUAGE line. Take for example this German document:
#+AUTHOR: Ich
#+TITLE: Test
#+LANGUAGE: de

Dies ist ein Test.
Hello. @ouboub, @QiangF,
I will be adding support for honouring the `#+LANGUAGE:` line in a few hours (or days). I have a working prototype; I just need to clean things up a bit.
Between us we will be able to test-drive European, Chinese and Indian languages. IOW, we can test Latin, Indic and Asian languages.
Very good, thanks! I could test Hebrew too, if you are willing to implement that as well.
Support for the LANGUAGE keyword is now available as part of `ox-odt-9.3.7.328.tar`.
Here is a preview of the newly added features:
- Support for completion on `#+LANGUAGE:` lines. In the screenshot below I am in the middle of filling in `ger`, presumably for the German language or locale.
- Body text in LibreOffice gets the right `LANGUAGE`, so that spell checking etc. works just fine. You can see the language of the text in LibreOffice's modeline.

Also, don't forget to read the inline comments in the Org snippet below.
# -*- coding: utf-8; -*-
Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages. They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].
# When on ~#+LANGUAGE: ~ line, press TAB for inputting the
# locale/language.
# Org's ~org-export-dictionary~ prefers just the language code (devoid
# of any country code). So, in the ~*Completions*~ buffer choose
# language, and if that is insufficient to get the translation you want
# choose language /and/ country.
# That said, some remarks ...
# In case of Chinese and Brazilian Portuguese, the Org's dictionary
# uses both the language /and/ country codes. The dictionary adopts
# /no/ specific policy on how to separate language and country codes.
# For example, the Chinese entries use ~hyphen~ as separator, but the
# Brazilian Portuguese entry uses ~underscore~ as separator. The
# convention used in Debian, specifically the
# ~/usr/share/i18n/SUPPORTED~ file (bundled with the ~locales~
# package), is to use ~underscore~ as separator between language and
# country code. Also, in case of Norwegian, Org's dictionary has 3
# entries ~nn~, ~no~ and ~nb~. But, I see ~nb_NO~ and ~nn_NO~ but
# /no/ ~no_NO~ in ~/usr/share/i18n/SUPPORTED~. Note that ~no~ does
# appear as Norwegian in ~iso_639-2~ and ~iso-639-3~ list.
# Maybe it is time that Org maintainers settle on a consistent syntax
# for the entries in ~org-export-dictionary~.
* Arabic
#+language: ar_OM
أنا قادر على أكل الزجاج و هذا لا يؤلمني.
* COMMENT Chinese (simplified)
#+language: zh
我能吞下玻璃而不伤身体。
* COMMENT English
# #+language: en_IN
I can eat glass and it doesn't hurt me.
* COMMENT Hebrew
#+language: he
אני יכול לאכול זכוכית וזה לא מזיק לי
* COMMENT Hindi
#+language: hi
मैं काँच खा सकता हूँ और मुझे उससे कोई चोट नहीं पहुंचती
* COMMENT Japanese
#+language: ja
私はガラスを食べられます。それは私を傷つけません。
* COMMENT Korean
#+language: ko
나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
* COMMENT German
#+language: de_CH
Ich kann Glas essen, ohne mir zu schaden.
* COMMENT Spanish
#+language: es_AR
Puedo comer vidrio, no me hace daño.
* COMMENT Tamil
#+language: ta_IN
நான் கண்ணாடி சாப்பிடுவேன், அதனால் எனக்கு ஒரு கேடும் வராது.
Perhaps one of you, at your own discretion (and if you consider it worth your time), can take up my observations about the `LANGUAGE` line with the emacs-orgmode maintainers.
Org's `org-export-dictionary` prefers just the language code (devoid of any country code). So, in the `*Completions*` buffer choose the language, and if that is insufficient to get the translation you want, choose language and country.

That said, some remarks ...

In the case of Chinese and Brazilian Portuguese, Org's dictionary uses both the language and country codes. The dictionary adopts no specific policy on how to separate language and country codes. For example, the Chinese entries use `hyphen` as separator, but the Brazilian Portuguese entry uses `underscore` as separator. The convention used in Debian, specifically the `/usr/share/i18n/SUPPORTED` file (bundled with the `locales` package), is to use `underscore` as separator between language and country code. Also, in the case of Norwegian, Org's dictionary has 3 entries: `nn`, `no` and `nb`. But I see `nb_NO` and `nn_NO` but no `no_NO` entry in `/usr/share/i18n/SUPPORTED`. Note that `no` does appear as Norwegian in the `iso_639-2` and `iso-639-3` lists.

Maybe it is time that the Org maintainers settle on a consistent syntax for the entries in `org-export-dictionary`.
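Until the dictionary settles on one separator, the inconsistency described above can be absorbed on the consuming side. The following is only a minimal Python sketch, not code from ox-odt or Org: the function names `split_locale`, `to_posix` and `to_bcp47` are hypothetical, and the sample tags are merely illustrative of the hyphen and underscore styles mentioned.

```python
import re

# Accept "de", "de_CH" or "zh-CN": a 2-3 letter language code, optionally
# followed by a hyphen or underscore and a 2-letter country code.
_TAG = re.compile(r"([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}))?")


def split_locale(tag):
    """Return (language, country) with country set to None when absent."""
    match = _TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"unrecognised language tag: {tag!r}")
    language, country = match.group(1).lower(), match.group(2)
    return language, (country.upper() if country else None)


def to_posix(tag):
    """Render with an underscore, as in /usr/share/i18n/SUPPORTED."""
    language, country = split_locale(tag)
    return f"{language}_{country}" if country else language


def to_bcp47(tag):
    """Render with a hyphen, as in the en-GB form seen in the thread."""
    language, country = split_locale(tag)
    return f"{language}-{country}" if country else language


if __name__ == "__main__":
    for sample in ("de_CH", "zh-CN", "pt_BR", "he"):
        print(sample, split_locale(sample), to_posix(sample), to_bcp47(sample))
```

Normalising at lookup time means either spelling of a tag resolves to the same language and country pair, whichever separator a given dictionary entry happens to use.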
Comment at and around https://github.com/kjambunathan/org-mode-ox-odt/blob/286a9b789db842cfa3a2fa68e4383bf6101da10c/lisp/ox-odt.el#L714 is a good starting point for further exploration.
Hi,
Thanks, excellent! I can confirm German, Spanish, Hebrew and english_US, but I fail with British; shouldn't that be english_GB? I will read your comments later and try to act upon them. BTW, the sentence you picked looks strange, but so be it.
I can confirm german, spanish, hebrew, english_US but I fail with british, shouldn't that be english_GB?
Thanks.
If you choose `English (United Kingdom)` in the completion table you get `en-GB`. LibreOffice's mode-line also prefers `English (UK)` to `English (GB)`.
BTW the sentence you picked up looks strange, but so be it.
From https://kermitproject.org/utf8.html#notes:
The "I can eat glass" phrase and initial translations (about 30 of them) were borrowed from Ethan Mollick's I Can Eat Glass page (which disappeared on or about June 2004) and converted to UTF-8. Since Ethan's original page is gone, I should mention that his purpose was to offer travelers a phrase they could use in any country that would command a certain kind of respect, or at least get attention. See Credits for the many additional contributions since then. When submitting new entries, the word "hurt" (if you have a choice) is used in the sense of "cause harm", "do damage", or "bother", rather than "inflict pain" or "make sad". In this vein Otto Stolz comments (as do others further down; personally I think it's better for the purpose of this page to have extra entries and/or to show a greater repertoire of characters than it is to enforce a strict interpretation of the word "hurt"!):
This is the meaning I have translated to the Swabian dialect. However, I just have noticed that most of the German variants translate the "inflict pain" meaning. The German example should read:
"Ich kann Glas essen ohne mir zu schaden."
rather than:
"Ich kann Glas essen, ohne mir weh zu tun."
(The comma fell victim to the 1996 orthographic reform, cf. http://www.ids-mannheim.de/reform/e3-1.html#P76.)
Btw, I discovered the following side-effect ... so re-opening this bug
https://github.com/kjambunathan/org-mode-ox-odt/commit/286a9b789db842cfa3a2fa68e4383bf6101da10c
This commit for changing language/locale "breaks" the "in buffer" configuration of font size. FWIW, I have shared a recipe for quick configuration of font sizes at https://github.com/kjambunathan/org-mode-ox-odt/discussions/102#discussioncomment-1440557 (hint: see the initial few lines of the `diff` in that comment).
Here is a sample snippet:
#+odt_extra_styles: <style:style style:name="Standard" style:family="paragraph"
#+odt_extra_styles: style:class="text">
#+odt_extra_styles: <style:text-properties fo:font-size="10pt"/>
#+odt_extra_styles: </style:style>
#+language: en_GB
Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages. They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].
Re-opening this bug ...
Ethan Mollick's I Can Eat Glass page (which disappeared on or about June 2004)
This page is in Wayback Machine ... https://web.archive.org/web/20010308220934/http://www.hcs.harvard.edu/~igp/glass.html.
Oops, I did not pay attention, but you are right. For me, the 10pt configuration is more important right now, so I ran `git checkout -b 10pt 774c32566^`, then `make clean` and `make`, and I am back to the correct 10pt configuration. If you find a fix, I would appreciate it.
I have added `org-odt-experimental-features`. By default, the LANGUAGE feature is ON. I have kept the option ON so that I can get feedback on the feature.
I have introduced a new style, `OrgUser`, which the user can set up. So, if you have any `Standard` style in `#+odt_extra_styles:`, rename it to `OrgUser`. (Even if you don't do this renaming, old Org files will continue to work as before.)
For example, the snippet below will export to a 10pt German document by default (i.e., when the EXPERIMENTAL `language` feature is ON) and will export to a 10pt English (UK) document otherwise (i.e., when the EXPERIMENTAL `language` feature is OFF).
There shouldn't be any need for you to turn OFF `org-odt-experimental-features` at all. Before turning OFF the EXPERIMENTAL features, please report any problems you have to me.
FWIW, I will be adding `starmath` (a much better alternative to the default latex->mathml converter) to `org-odt-experimental-features` very soon. See "Add support for embedding starmath fragments · Issue #87 · kjambunathan/org-mode-ox-odt".
#+odt_extra_styles: <style:style style:name="OrgUser" style:family="paragraph"
#+odt_extra_styles: style:class="text">
#+odt_extra_styles: <style:text-properties fo:font-size="10pt"/>
#+odt_extra_styles: </style:style>
#+language: de_DE
Phrases in this document are translations of "I can eat glass and it
doesn't hurt me" in to various languages. They are taken from [[https://kermitproject.org/utf8.html][UTF-8
Sampler]].
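For readers wondering what a language setting looks like at the ODF level: OpenDocument records the language of Western text as `fo:language` and `fo:country` attributes on `style:text-properties`, the same element that carries `fo:font-size` in the snippet above. The Python sketch below builds such an element from a locale string. It is an illustration under that assumption only; the helper name `text_properties` is hypothetical, and it does not claim to reproduce what ox-odt actually emits.

```python
from xml.sax.saxutils import quoteattr


def text_properties(locale, font_size=None):
    """Build a <style:text-properties/> element for a locale like 'de_DE'.

    Hypothetical helper for illustration; accepts '_' or '-' as separator.
    """
    language, _, country = locale.replace("-", "_").partition("_")
    attributes = []
    if font_size:
        attributes.append(("fo:font-size", font_size))
    attributes.append(("fo:language", language.lower()))
    if country:
        attributes.append(("fo:country", country.upper()))
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attributes)
    return f"<style:text-properties {rendered}/>"


if __name__ == "__main__":
    print(text_properties("de_DE", font_size="10pt"))
    print(text_properties("he"))
```

Asian and complex-script text use separate attribute pairs in ODF (`style:language-asian`/`style:country-asian` and `style:language-complex`/`style:country-complex`), which this sketch leaves out.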
See https://lists.gnu.org/archive/html/emacs-orgmode/2020-08/msg00140.html
Dear maintainer,
the ODT exporter always exports as English, even if a document is not written in English and contains a corresponding #+LANGUAGE line. Take for example this German document:
When it is exported as ODT, the main language is set to English. As a result, the office application's spell checker (LibreOffice's one in my case) highlights everything as wrong that isn't accidentally a valid word in English. To fix that one has to manually change the document's main language to German (in this case) via the office application's menus. This is annoying, especially if the intention is to automatically further convert to DOCX and send the document out unmodified. The HTML exporter on the other hand handles the #+LANGUAGE line just fine.
-quintus
Emacs : GNU Emacs 26.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.5) of 2019-09-23, modified by Debian
Package: Org mode version 9.1.9 (release_9.1.9-65-g5e4542 @ /usr/share/emacs/26.1/lisp/org/)