jgm / pandoc

Universal markup converter
https://pandoc.org

\pageref in LaTeX doesn't work #2336

Open Hi-Angel opened 9 years ago

Hi-Angel commented 9 years ago

A minimal example to reproduce:

\documentclass{article}

\begin{document}

\section{First Section}

Reference to a \pageref{secondSec}

\section{Second Section}
\label{secondSec}

\end{document}

This is a single-page document, but even so, the page number of the secondSec label (which is 1) should appear right after the words "Reference to a". After conversion, however, there is nothing there at all.

The conversion command is pandoc -f latex -t odt -o output.odt test.tex.

Pandoc version 1.15.0.6

jgm commented 9 years ago

Pandoc has no way of determining what page a particular bit of source text will end up in the latex file, so this is out of scope.

Hi-Angel commented 9 years ago

But ODT, the format I was converting to in the example, does have a way. I just tested it: ODT can insert the page number of a particular piece of text.

Hi-Angel commented 9 years ago

I found an ODT specification. I could try to find out exactly how this is represented in ODT, if that helps.

Hi-Angel commented 9 years ago

Okay, I found it. At first I wandered through the specification, but later I figured out that an «.odt» is a plain zip archive, so I could simply compare the contents of two files. As for the file utility: I was sure it would tell me whether an «.odt» is an archive, but instead it reported OpenDocument Text, which confused me. Okay, so:

The displayed page number of a referenced element is <text:reference-ref text:reference-format="page" text:ref-name="testReference">. I guess the first part is the element name, next comes the format, and last is the name of the reference whose page number we want to display.

The referenced element looks like

<text:reference-mark-start text:name="testReference"/>
ReferenceMe<text:reference-mark-end text:name="testReference"/>

By the way, while looking through the specification I found that there is also a point reference; it looks like text:reference-mark.
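
The zip-based inspection described above can be sketched in a few lines of Python. This is only an illustration, not part of pandoc: the function names find_page_refs and make_sample_odt are made up here, and the sketch assumes the document body lives in the archive member content.xml and that the text: prefix maps to the OpenDocument text namespace.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace behind the "text:" prefix in OpenDocument files.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def find_page_refs(odt):
    """Return the ref-name of every page-number reference in an .odt.

    `odt` may be a file path or a binary file-like object.
    (Hypothetical helper, written for this illustration only.)
    """
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    names = []
    for elem in root.iter(f"{{{TEXT_NS}}}reference-ref"):
        # Only references formatted as a page number are of interest.
        if elem.get(f"{{{TEXT_NS}}}reference-format") == "page":
            names.append(elem.get(f"{{{TEXT_NS}}}ref-name"))
    return names


def make_sample_odt():
    """Build a tiny in-memory archive holding just a content.xml.

    This is not a complete, valid ODT file; it only carries the
    elements quoted in the comment above so the sketch can be run.
    """
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}"><text:p>'
        '<text:reference-ref text:reference-format="page" '
        'text:ref-name="testReference">1</text:reference-ref>'
        '<text:reference-mark-start text:name="testReference"/>'
        'ReferenceMe<text:reference-mark-end text:name="testReference"/>'
        '</text:p></office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("content.xml", content)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(find_page_refs(make_sample_odt()))
```

Run against a real file, find_page_refs("output.odt") would list the labels that have a page-number field pointing at them, which is a quick way to check whether a converter emitted any.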

jgm commented 8 years ago

Out of scope. There's no place in the pandoc document model for pages or references to pages. In general, since pandoc passes everything through a common intermediate representation, you're prone to lose things when you go from one more expressive format to another, even if both formats are expressive enough.

ghost commented 8 years ago

I know this is closed, I debated whether I should reply here or to pandoc-discuss, but I decided to keep a short reply in its context.

Although pages are a foreign concept to markdown, the majority of other to/from formats pandoc supports do have the idea of pages or slides (which I see as analogous), and many have their own method of referencing them.

I've got no idea how or if it would/could be implemented in pandoc markdown (although it would be very nice), but I don't see a reason why it shouldn't be part of the underlying system for cases where both the input and output formats support it. The individual writers could then deal/discard/convert (e.g. to regular link) as appropriate.

I hope my opinion isn't out of place - I'm never sure when it comes to closed issues.

Hi-Angel commented 8 years ago

@blindmelon it is obvious, but thanks to BDFL it won't ever be implemented. Pandoc should be forked.

But I had a better idea. Recently I noticed that LibreOffice can import from PDF. It is under «LO Writer → File → Open»; explicitly choose «File type» as «…*.pdf». The quality is just awful, but it gave me another idea: I think it would be possible to add similar one-way support for LaTeX there.

I sent a feature request to LO about it, and they refused, but I think I explained the idea poorly. Back then I wasn't thinking about just «import» support, i.e. the request was about support in general, which would obviously be hard because LaTeX doesn't map well onto WYSIWYG editors.

So, at the moment I'm thinking of trying to add such support to LO once I finish my diploma. It is a better idea than any kind of converter, because office suites evolve specifically around support for office formats. And either way, even if the LO team disagrees with that (I don't think they will, but just in case), there are other FOSS office suites, such as OpenOffice and Calligra Office.

jgm commented 8 years ago

Although in principle we could add PageAnchor and PageRef elements to the Pandoc data structure, this would be quite a lot of work to implement. Any change to the Pandoc structure would require changes in most of Pandoc's 80+ modules, and also in several other libraries that use the Pandoc type (texmath, pandoc-citeproc, etc.). One would have to research ways of doing page anchors and page references in every format Pandoc supports, and figure out reasonable fallback behaviors for the others. This is a large change that would also be a breaking change for libraries that use Pandoc. Not to be undertaken lightly.

In this case, I don't think the benefits outweigh the costs, particularly because the notion of page doesn't make sense in many of the formats we support. There are many more important priorities that would have a higher benefit/cost ratio.

But I don't mind reopening this as a low-priority reminder. Maybe someone would want to take up this large project.

ghost commented 8 years ago

Thanks for the detailed response - I hadn't realised that changing the underlying data structure had such massive consequences! I am now more grateful than ever for the constant improvements.

I appreciate you reopening the issue too. I agree, this isn't the highest priority. Although, I think as more and more people switch to pandoc from tex, it is going to cause some people trouble (I know I find pagerefs in tex useful).

Pandoc has got me interested in learning Haskell when I've got time¹, so maybe I'll look at this myself at some point.

¹ Why oh why didn't you write this in something I already know? ;)

ghost commented 8 years ago

Stumbled across #813 last night and realised in my sleep that the internal changes required there would probably lay the groundwork for this too. Syntax, and a fallback for formats that don't have a concept of pages, would still need addressing.

For syntax, perhaps overloading the proposed referencing syntax in the issue mentioned by appending :page when a pageref is desired?

At the moment I can't think of a solution for the pageless output formats that doesn't involve hardcoded language (e.g. outputting "on page 3" if the output format has pages, else a section reference or "above"/"below").
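
To make the idea concrete, here is a toy sketch in Python of how per-writer handling could look. It is not pandoc's actual data structure or API: the element names PageAnchor and PageRef are taken from the comment above, while Str, write_latex and write_html are simplified stand-ins invented for this illustration. The pageless writer falls back to a regular link whose text is the label itself, which avoids hardcoded phrases like "on page 3" but is only one possible choice.

```python
import html
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:
    """Plain text (toy stand-in, not pandoc's real type)."""
    text: str


@dataclass
class PageAnchor:
    """Hypothetical element marking a spot that can be referenced."""
    label: str


@dataclass
class PageRef:
    """Hypothetical element asking for the page number of `label`."""
    label: str


Inline = Union[Str, PageAnchor, PageRef]


def write_latex(inlines: List[Inline]) -> str:
    """A paged format: page references map directly onto \\pageref."""
    out = []
    for node in inlines:
        if isinstance(node, Str):
            out.append(node.text)
        elif isinstance(node, PageAnchor):
            out.append(f"\\label{{{node.label}}}")
        elif isinstance(node, PageRef):
            out.append(f"\\pageref{{{node.label}}}")
    return "".join(out)


def write_html(inlines: List[Inline]) -> str:
    """A pageless format: fall back to an ordinary internal link."""
    out = []
    for node in inlines:
        if isinstance(node, Str):
            out.append(html.escape(node.text))
        elif isinstance(node, PageAnchor):
            out.append(f'<span id="{html.escape(node.label)}"></span>')
        elif isinstance(node, PageRef):
            label = html.escape(node.label)
            # Assumption: use the label as link text, no page number.
            out.append(f'<a href="#{label}">{label}</a>')
    return "".join(out)


if __name__ == "__main__":
    doc = [Str("Reference to a "), PageRef("secondSec"),
           Str(" "), PageAnchor("secondSec")]
    print(write_latex(doc))
    print(write_html(doc))
```

The point of the sketch is only that the decision to keep, convert, or discard a page reference can live in each writer; the cost of adding such elements to the real shared structure is a separate matter.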