Open rauschma opened 4 years ago
Are you talking about LaTeX indices, or...? I'm not very familiar with those. What exactly is the feature request? A change to pandoc's latex output, or...?
> Are you talking about LaTeX indices, or...?
Indices are a crucial feature of print books (and are very helpful for digital books, too). This is an example in HTML (produced via my Lua filter): https://exploringjs.com/impatient-js/ch_index.html
> What exactly is the feature request? A change to pandoc's latex output, or...?
Include the functionality in Pandoc that I have implemented via my filter:
One needs to support two “commands”:
\index{myterm}
[myterm]{.index}
\printindex
## Index {.append-index}
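To make the two "commands" concrete, here is a minimal Python sketch (not the Lua filter mentioned above) that recognizes both marker spellings and both "print the index here" spellings in plain text. The regular expressions and the function names `find_index_terms` and `wants_index` are assumptions made for illustration; a real filter would work on Pandoc's AST rather than on raw text.

```python
import re

# Hypothetical patterns for the marker syntaxes listed above.
LATEX_INDEX = re.compile(r"\\index\{([^{}]*)\}")
SPAN_INDEX = re.compile(r"\[([^\[\]]*)\]\{\.index\}")
# Either \printindex on its own line, or a heading carrying .append-index.
PRINT_INDEX = re.compile(
    r"^(\\printindex|#+ .*\{\.append-index\})\s*$", re.MULTILINE
)


def find_index_terms(text):
    """Return index terms in document order, from either syntax."""
    hits = [(m.start(), m.group(1)) for m in LATEX_INDEX.finditer(text)]
    hits += [(m.start(), m.group(1)) for m in SPAN_INDEX.finditer(text)]
    return [term for _, term in sorted(hits)]


def wants_index(text):
    """True if the document asks for an index to be emitted."""
    return PRINT_INDEX.search(text) is not None


sample = (
    "Arrays\\index{array} are lists. [Maps]{.index} too.\n\n"
    "## Index {.append-index}\n"
)
print(find_index_terms(sample))
print(wants_index(sample))
```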
Thanks for the explanations. Seems somewhat related to https://github.com/jgm/pandoc/issues/813... ?
Very loosely. So far, I have always put index terms into the top level of a section, never inside tables, figures, or headers.
A few more details – the idea of creating an index is as follows:
\index{SomeTopic}
next to it. You can think of it as a link target. It being next to the topic means that if the content ever moves, so does the link target. Step 2 is crucial and shouldn’t have to be done manually (that would be error-prone, tedious work).
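The "link target" idea can be sketched as follows: each marker is replaced by an anchor, and the index is then generated from the recorded anchors instead of being maintained by hand. This is a rough Python illustration for HTML output only; the anchor naming scheme (`idx-1`, `idx-2`, ...) and the function names are assumptions, not what the filter does.

```python
import re
from collections import defaultdict

MARKER = re.compile(r"\\index\{([^{}]*)\}")


def place_targets(text):
    """Replace each index marker with an HTML anchor and record it."""
    index = defaultdict(list)  # term -> list of anchor ids
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        anchor = f"idx-{counter}"  # hypothetical naming scheme
        index[match.group(1)].append(anchor)
        return f'<a id="{anchor}"></a>'

    return MARKER.sub(replace, text), dict(index)


def render_index(index):
    """Generate one line per term, with numbered links to its anchors."""
    lines = []
    for term in sorted(index, key=str.casefold):
        links = ", ".join(
            f'<a href="#{anchor}">{n}</a>'
            for n, anchor in enumerate(index[term], 1)
        )
        lines.append(f"{term}: {links}")
    return "\n".join(lines)


body, idx = place_targets(
    "Sets\\index{set} and maps\\index{map}. More sets\\index{set}."
)
print(body)
print(render_index(idx))
```

Because the anchors are derived from the markers, moving a paragraph moves its target with it, and the generated index stays correct without manual bookkeeping.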
An index is similar to a table of contents in that it also provides quick access to content (but via topic, not via heading). This is especially important for print books where you don’t have full text search. Most non-fiction print books have indices. If they don’t, people complain on Amazon. 😀
@rauschma now that Pandoc's Lua API includes lpeg/re, it would make sense to use an re pattern to parse `\index{...}` strings. An alternative might be to (ab)use Pandoc's citation syntax, something like `@idx:SORTKEY[ACTUAL-TEXT]`, to allow (Pandoc) formatting in the actual text, although I fully understand that you probably don't want to change your workflow at this point. Also, it would be great if "special" characters could be included in sort keys[^1]; I once wrote an implementation of Sort::ArbBiLex in MoonScript/Lua which "converts" sort keys to a string of hex numbers separated by dots, to make it possible to use arbitrary sort orders with Lua's `table.sort` function. You can even sort non-Latin alphabets with it, although CJK not so much...
[^1]: Because e.g. in Swedish å, ä, ö sort at the end of the alphabet, unlike German where you can treat ä, ö, ü as a, o, u or ae, oe, ue. In the Swedish case you can (and I do) cheat by using `~a ~e ~o`, but such hacks are not possible for all languages/sort orders.
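The "hex numbers separated by dots" trick can be illustrated with a small Python sketch. This is not the MoonScript/Lua port of Sort::ArbBiLex described above, only a single-level approximation of the idea: each character is replaced by its rank in a chosen alphabet, written as fixed-width hex, so that an ordinary string comparison yields the custom order. The alphabet string and the handling of unknown characters are assumptions, and the input is assumed to use precomposed (NFC) characters.

```python
# Hypothetical Swedish-style alphabet: å, ä, ö sort after z.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i + 1 for i, ch in enumerate(SWEDISH)}
UNKNOWN = len(SWEDISH) + 1  # characters outside the alphabet sort last


def sort_key(term, rank=RANK):
    """Encode a term as dot-separated, fixed-width hex ranks."""
    return ".".join(f"{rank.get(ch, UNKNOWN):02x}" for ch in term.casefold())


terms = ["öl", "zebra", "ärta", "apa", "åka"]
ordered = sorted(terms, key=sort_key)
print(ordered)
print(sort_key("apa"))
```

The fixed width matters: without zero padding, rank `2` would compare after rank `10` as a string.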
It would really be great if Pandoc supported indices, for all of the reasons outlined above. In pretty much any non-fiction work of non-trivial length where readers might want to look up a topic, it is useful to have an index.
It's a can of worms though, since different languages have different sorting rules. Hopefully there is a Haskell library similar to Unicode::Collate/Unicode::Collate::Locale or Sort::ArbBiLex. I have ported the latter to MoonScript/Lua myself but I'd be loth to ask for either to be ported to Haskell as I'm unable to do it myself.
Also how would the index work? In a PDF/ebook you would want to refer to page numbers and link to the pages. In (a) webpage(s) you would want to link to locations which might be in another file, and in that case what should the link text look like? Moreover with HTML output you would want the index to look different depending on whether you output a single web page, multiple web pages, PDF or ebook. In some cases you would probably want to reference sections or paragraphs, which probably would mean that you would want to have section/paragraph numbers already sorted out.[^parnum]
[^parnum]: Which in turn probably means that you want section/paragraph numbers to be wrapped in spans with a class to make it easier to pick them up.
That's a lot of configuration and at the end of the day you might be better off using some external tool to build the index, in conjunction with either a filter or something builtin which produces the input to the external tool, a bit like makeindex works even if you wouldn't use makeindex itself due to its limitations.
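As a rough illustration of the "produce input for an external tool" route, the following Python sketch turns collected (term, locator) pairs into lines in the `\indexentry{term}{locator}` form that makeindex reads from an `.idx` file. The function names are hypothetical, and the escaping assumes makeindex's default special characters (`!`, `@`, `|`, and the quote character `"`); where the locators come from (page numbers, section numbers) is left open, as in the discussion above.

```python
def escape_makeindex(term):
    """Quote makeindex's default special characters with a leading '"'."""
    return "".join('"' + ch if ch in '!@|"' else ch for ch in term)


def to_idx_lines(entries):
    """Turn (term, locator) pairs into .idx-style lines."""
    return [
        f"\\indexentry{{{escape_makeindex(term)}}}{{{locator}}}"
        for term, locator in entries
    ]


lines = to_idx_lines([("array", 12), ("map!set", 40)])
print("\n".join(lines))
```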
It's certainly got plenty of cases that need to be considered, but it's not really a can of worms. Many of the problems to be solved are pretty orthogonal to each other. It seems like the key things that are needed are:
Each of these chunks is distinct and their interfaces are pretty easy to define, so we should be able to take any can of worms, separate the worms, straighten them out and line them up neatly!
> Hopefully there is a Haskell library similar to [Unicode::Collate]
There is my unicode-collation, which we use for proper sorting in citeproc.
Maybe it would make sense to implement this with three new Pandoc options? Something like:
--abbreviation-index
: include an automatically generated abbreviations index in the output document. This index would include abbreviations created with the existing markup for abbreviations (https://pandoc.org/MANUAL.html#extension-abbreviations), the ones in a custom abbreviation file specified with `--abbreviations=FILE`, and those created with the markup for abbreviations I'm suggesting in https://github.com/jgm/pandoc/issues/9227.
--definition-index
: include an automatically generated definitions index in the output document. This index would include definitions created with the existing markup for definition lists (https://pandoc.org/MANUAL.html#definition-lists) and those created with the markup for definitions I'm suggesting in https://github.com/jgm/pandoc/issues/9227.
--full-index
: include an automatically generated definitions and abbreviations index in the output document.
glossaries LaTeX package: https://ctan.org/pkg/glossaries
makeidx LaTeX package: https://ctan.org/pkg/makeidx
makeindex LaTeX package: https://ctan.org/pkg/makeindex
`mkindex`, you can’t use Pandoc anymore and have to manage LaTeX yourself (e.g. via `latexmk`).