Open Hi-Angel opened 9 years ago
Pandoc has no way of determining what page a particular bit of source text will end up in the latex file, so this is out of scope.
+++ Hi-Angel [Jul 29 15 21:54 ]:
A minimal example to reproduce:
\documentclass{article}
\begin{document}
\section{First Section}
Reference to a \pageref{secondSec}
\section{Second Section} \label{secondSec}
\end{document}
This is a single-page document, but even so, the page number of the secondSec label, which is 1, should appear right after the words "Reference to a". After conversion, though, there is nothing there at all.
The conversion command is ~/Projects/pandoc/.cabal-sandbox/bin/pandoc -f latex -t odt -o output.odt test.tex.
Pandoc version 1.15.0.6
But ODT, the format I tried to convert to in the example, does have a way. I even just tested it: ODT can insert the page number of a particular piece of text.
I found the ODT specification; I could try to find out how this is represented in ODT, if that helps.
Okay, I found it. At first I wandered through the specification, but later I figured out that an «.odt» file is a plain zip, so I can just look at the difference. The file utility confused me: I was sure it would tell me if «.odt» were an archive, but instead it reported OpenDocument Text. Okay, so:
The displayed page number of a referenced element is <text:reference-ref text:reference-format="page" text:ref-name="testReference">. I guess the first part is the element name, next comes the format, and last is the name of the reference whose page number we want to display.
The referenced element looks like
<text:reference-mark-start text:name="testReference"/>
ReferenceMe<text:reference-mark-end text:name="testReference"/>
Btw, while looking through the specification I found there's also a point reference; it looks like text:reference-mark.
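The inspection described above (open the «.odt» as a zip, then look for the page-reference element) can be sketched in Python. This is only an illustration, assuming the markup lives in `content.xml` inside the archive and uses the standard ODF `text` namespace; the names `page_refs`, `SAMPLE` and `make_sample_odt` are made up for this example, and the sample archive is a stripped-down stand-in, not a valid ODT file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace for the "text:" prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def page_refs(odt_file):
    """Return the ref-names of all page-number references in an .odt archive."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    names = []
    for ref in root.iter(f"{{{TEXT_NS}}}reference-ref"):
        if ref.get(f"{{{TEXT_NS}}}reference-format") == "page":
            names.append(ref.get(f"{{{TEXT_NS}}}ref-name"))
    return names


# Hypothetical sample content, modelled on the snippets quoted in this thread.
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    '<text:reference-mark-start text:name="testReference"/>ReferenceMe'
    '<text:reference-mark-end text:name="testReference"/> is on page '
    '<text:reference-ref text:reference-format="page" '
    'text:ref-name="testReference">1</text:reference-ref>'
    '</text:p>'
)


def make_sample_odt():
    """Build an in-memory zip holding only content.xml (not a complete ODT)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("content.xml", SAMPLE)
    buf.seek(0)
    return buf


print(page_refs(make_sample_odt()))
```

On a real document, `page_refs("output.odt")` would list the names of the reference marks whose page numbers are displayed somewhere in the text.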
Out of scope. There's no place in the pandoc document model for pages or references to pages. In general, since pandoc passes everything through a common intermediate representation, you're prone to lose things when you go from one more expressive format to another, even if both formats are expressive enough.
I know this is closed, I debated whether I should reply here or to pandoc-discuss, but I decided to keep a short reply in its context.
Although pages are a foreign concept to markdown, the majority of other to/from formats pandoc supports do have the idea of pages or slides (which I see as analogous), and many have their own method of referencing them.
I've got no idea how or if it would/could be implemented in pandoc markdown (although it would be very nice), but I don't see a reason that it shouldn't be part of the underlying system for when both the input and output formats support it. The individual writers could then deal/discard/convert (e.g. to regular link) as appropriate.
I hope my opinion isn't out of place - I'm never sure when it comes to closed issues.
@blindmelon it is obvious, but thanks to BDFL it won't ever be implemented. Pandoc should be forked.
But I had a better idea: recently I noticed that LibreOffice can import PDF. In «LO Writer → File → Open», explicitly choose the «File type» «…*.pdf». The quality is just awful, but it gave me another idea: I think it would be possible to add similar one-way support for LaTeX there.
I sent a feature request to LO about it, and they refused, but I think I explained the idea poorly: back then I wasn't thinking about just «import» support, i.e. the request was about support in general, which would obviously be hard because LaTeX doesn't map well onto WYSIWYG editors.
So, ATM I'm thinking of trying to add such support to LO after I finish my diploma. It is a better idea than any kind of converter, because office suites evolve specifically around support for office formats. And either way, even if the LO team disagrees (I don't think they will, but just in case), there are other FOSS office suites, such as OpenOffice and Calligra Office.
Although in principle we could add PageAnchor and PageRef elements to the Pandoc data structure, this would be quite a lot of work to implement. Any change to the Pandoc structure would require changes in most of Pandoc's 80+ modules, and also in several other libraries that use the Pandoc type (texmath, pandoc-citeproc, etc.). One would have to research ways of doing page anchors and page references in every format Pandoc supports, and figure out reasonable fallback behaviors for the others. This is a large change that would also be a breaking change for libraries that use Pandoc. Not to be undertaken lightly.
In this case, I don't think the benefits outweigh the costs, particularly because the notion of page doesn't make sense in many of the formats we support. There are many more important priorities that would have a higher benefit/cost ratio.
But I don't mind reopening this as a low-priority reminder. Maybe someone would want to take up this large project.
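To make the idea of PageAnchor and PageRef elements with per-format fallbacks more concrete, here is a toy model in Python. It is not pandoc's actual data structure or writer API (pandoc is written in Haskell, and its real types are much richer); every class and function below is hypothetical, no escaping is done, and the HTML fallback to a plain link is just one possible choice.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:
    text: str


@dataclass
class PageAnchor:  # hypothetical: marks a spot whose page can be referenced
    ident: str


@dataclass
class PageRef:  # hypothetical: "the page number of anchor `target`"
    target: str


Inline = Union[Str, PageAnchor, PageRef]


def write_latex(inlines: List[Inline]) -> str:
    out = []
    for node in inlines:
        if isinstance(node, PageAnchor):
            out.append(f"\\label{{{node.ident}}}")
        elif isinstance(node, PageRef):
            out.append(f"\\pageref{{{node.target}}}")
        else:
            out.append(node.text)
    return "".join(out)


def write_odt(inlines: List[Inline]) -> str:
    out = []
    for node in inlines:
        if isinstance(node, PageAnchor):
            out.append(f'<text:reference-mark text:name="{node.ident}"/>')
        elif isinstance(node, PageRef):
            out.append('<text:reference-ref text:reference-format="page" '
                       f'text:ref-name="{node.target}"/>')
        else:
            out.append(node.text)
    return "".join(out)


def write_html(inlines: List[Inline]) -> str:
    # HTML has no pages, so fall back to an ordinary anchor and link.
    out = []
    for node in inlines:
        if isinstance(node, PageAnchor):
            out.append(f'<span id="{node.ident}"></span>')
        elif isinstance(node, PageRef):
            out.append(f'<a href="#{node.target}">{node.target}</a>')
        else:
            out.append(node.text)
    return "".join(out)


doc = [Str("Reference to a "), PageRef("secondSec"), PageAnchor("secondSec")]
print(write_latex(doc))
print(write_odt(doc))
print(write_html(doc))
```

Even in this toy, adding two node types means touching every writer, which gives a small-scale feel for why the same change across all of pandoc's readers and writers is costly.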
Thanks for the detailed response - I hadn't realised that changing the underlying data structure had such massive consequences! I am now more grateful than ever for the constant improvements.
I appreciate you reopening the issue too. I agree, this isn't the highest priority. Although, I think as more and more people switch to pandoc from tex, it is going to cause some people trouble (I know I find pagerefs in tex useful).
Pandoc has got me interested in learning Haskell when I've got time¹, so maybe I'll look at this myself at some point.
¹ Why oh why didn't you write this in something I already know? ;)
Stumbled across #813 last night and in my sleep realised that the internal changes required there probably lay the groundwork for this too. Syntax, and fallback for formats that don't have a concept of pages, would still be issues needing addressing.
For syntax, perhaps overloading the proposed referencing syntax in the issue mentioned by appending :page when a pageref is desired?
At the moment I can't think of a solution for pageless output formats that doesn't involve hardcoded language (e.g. outputting "on page 3" if the output format has pages, else a section reference or "above"/"below").
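A rough sketch of the `:page` suffix idea and the hardcoded-language fallback, in Python. Everything here is an assumption made for illustration: the referencing syntax from #813 is not reproduced, so a bare `label:page` string stands in for it, and `parse_ref`, `render_ref`, `pages`, `order` and `here` are invented names.

```python
def parse_ref(ref):
    """Split a hypothetical 'label:page' reference into (label, wants_page)."""
    label, sep, suffix = ref.rpartition(":")
    if sep and suffix == "page":
        return label, True
    return ref, False


def render_ref(ref, *, pages=None, order=None, here=0):
    """Render a reference as text.

    pages: label -> page number, given only when the output format has pages.
    order: label -> position in the document, used for the pageless fallback.
    here:  position of the reference itself.
    """
    label, wants_page = parse_ref(ref)
    if not wants_page:
        return label  # ordinary references are outside the scope of this sketch
    if pages is not None:
        return f"on page {pages[label]}"
    # Pageless output: exactly the hardcoded English the comment worries about.
    return "above" if order[label] < here else "below"


print(render_ref("secondSec:page", pages={"secondSec": 3}))
print(render_ref("secondSec:page", order={"secondSec": 5}, here=2))
```

The fallback strings are baked into the renderer, so the sketch shows the problem rather than solving it: a real implementation would need some way to localise or configure that wording.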