kyoxiao / pandoc

Automatically exported from code.google.com/p/pandoc
GNU General Public License v2.0

enable language tagging in pandoc's markdown #201

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I discovered pandoc some time ago (congratulations, it is a really useful
tool), but I have only just used it to convert a LaTeX book into HTML.

I have found some problems that I will report in separate issues, but the most
important one is the lack of support for language tags, that is, a way to set
the document language and to mark text passages in other languages.

This feature is essential in TeX as well as in RTF and ODF (to get correct
hyphenation). HTML and XML would also benefit from this feature.

I don't know what the actual markup in pandoc should look like, but it
should support both formats, such as "en" and "en-US" (and even "grc", since
there is no other way to get it).

Thanks for your excellent work,

Pablo

Original issue reported on code.google.com by google-c...@pragmata.tk on 10 Jan 2010 at 11:34

GoogleCodeExporter commented 8 years ago
It would be helpful if you could provide some examples of how language tagging
looks in LaTeX, and some links to its documentation.

If you just want to set the global language for a document, you can easily do
that through a custom template.  But I take it you're talking about going back
and forth between languages in the same document.

Original comment by fiddloso...@gmail.com on 10 Jan 2010 at 4:19

GoogleCodeExporter commented 8 years ago
Sorry for not answering sooner. I'm new to Google services other than the
search engine, but shouldn't I have received an email notification from
Google Code?

babel is the package that enables multi-language hyphenation support in LaTeX
(http://mirror.ctan.org/macros/latex/required/babel/babel.pdf). It provides two
main commands for language switching: \selectlanguage{languagename} and
\foreignlanguage{languagename}{text in foreign language}. Languages supported by
babel are listed on pages 8-9 of the documentation.

\selectlanguage is used to switch the document language “from here on” and
\foreignlanguage is used to mark foreign text snippets. A minimal sample would
be:

\documentclass{minimal}
\usepackage[latin,german,english]{babel}
\begin{document}
Law is what the ancient Romans called \foreignlanguage{latin}{lex} and the
Germans translated as \foreignlanguage{german}{Gesetz}.
\end{document}
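
The sample above only exercises \foreignlanguage. As a hedged illustration of
the other command, \selectlanguage, a minimal sketch could look like the
following. The article class and the sample sentences are assumptions chosen
for illustration; they are not taken from the babel documentation.

```latex
\documentclass{article}
% The last language option is the main document language (here: english).
\usepackage[german,english]{babel}
\begin{document}
This paragraph is hyphenated according to the English patterns.

% Switch the active language "from here on".
\selectlanguage{german}
Dieser Absatz wird nach den deutschen Trennregeln umbrochen.

% Switch back for the rest of the document.
\selectlanguage{english}
From here on the text is treated as English again.
\end{document}
```

In a markup language such as pandoc's, \selectlanguage corresponds roughly to
a language set on a block or on the whole document, while \foreignlanguage
corresponds to a language set on an inline span.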

polyglossia, a babel replacement for XeLaTeX, also implements both commands
(for compatibility purposes, among others), but its list of supported languages
is different. polyglossia supports more languages, and it uses different names
for some of the languages that both polyglossia and babel support.
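
For comparison, here is a sketch of the same sample written with polyglossia's
own commands. The commands \setdefaultlanguage, \setotherlanguages,
\textlatin and \textgerman come from the polyglossia documentation rather than
from this thread, and the article class is an assumption; the file is meant to
be compiled with XeLaTeX.

```latex
\documentclass{article}
\usepackage{polyglossia}
% Main document language.
\setdefaultlanguage{english}
% Additional languages used for short passages.
\setotherlanguages{latin,german}
\begin{document}
Law is what the ancient Romans called \textlatin{lex} and the
Germans translated as \textgerman{Gesetz}.
\end{document}
```

A converter that emits LaTeX from language-tagged input would therefore need to
know which of the two packages is the target, since both the setup commands and
some language names differ.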

Language tagging in HTML and XML is described at
http://www.w3.org/International/articles/language-tags/#bytheway.

Original comment by google-c...@pragmata.tk on 17 Jan 2010 at 10:18

GoogleCodeExporter commented 8 years ago
Sorry, I forgot the polyglossia documentation:
http://mirror.ctan.org/macros/xetex/latex/polyglossia/polyglossia.pdf.

Original comment by google-c...@pragmata.tk on 19 Jan 2010 at 9:17

GoogleCodeExporter commented 8 years ago
I don't know what happened to this issue.

I have just discovered that Textile marks up language in an interesting way
(described at http://www.textpattern.com/help/?item=attributes).

Language is marked as [language] and can be applied to phrase elements and to
block elements.

I don't know whether this could be added to block elements, but it would be
easy to add it to phrase elements, such as _[de]fremdsprachige Elemente_. I
don't know whether pandoc can mark spans in text (required for marking only a
foreign language expression).

Original comment by google-c...@pragmata.tk on 27 May 2010 at 7:10

GoogleCodeExporter commented 8 years ago
Language tagging should also be interesting for HTML, since there are tools that
provide hyphenation for HTML (http://code.google.com/p/hyphenator/).

Original comment by google-c...@pragmata.tk on 20 Jun 2010 at 10:03

GoogleCodeExporter commented 8 years ago
Sorry, but this issue was reported more than a year ago.

And I wonder whether you are interested in implementing it.

Original comment by google-c...@pragmata.tk on 2 Apr 2011 at 8:23

GoogleCodeExporter commented 8 years ago
Language tagging can't be implemented without serious changes in pandoc's 
document model.  A general mechanism for attaching attributes to inline and 
block elements in pandoc's markdown would also be needed.  It's possible that 
pandoc will go in this direction at some point, but it's not currently a 
priority, and it might never happen.  (If we added every feature that anyone 
requested, we'd end up recreating LaTeX. Pandoc's goals are different; it is 
simpler than LaTeX and will never be as flexible.)

Original comment by fiddloso...@gmail.com on 3 Apr 2011 at 12:50

GoogleCodeExporter commented 8 years ago
Thanks for your reply, John.

Original comment by google-c...@pragmata.tk on 3 Apr 2011 at 4:27

GoogleCodeExporter commented 8 years ago
Any news on this?  I think this issue still deserves some attention :-)

Original comment by lemzw...@googlemail.com on 7 Oct 2014 at 10:41

GoogleCodeExporter commented 8 years ago
This project has been moved to GitHub.

Some issues relevant to this topic:

https://github.com/jgm/pandoc/issues/1614
https://github.com/jgm/pandoc/issues/1667
https://github.com/jgm/pandoc/issues/895

Feel free to join the discussion there.

Original comment by google-c...@pragmata.tk on 18 Oct 2014 at 4:19