Open svenboulanger opened 1 year ago
For those interested, right now I'm working around it by using the following Lua filter as the last filter (part of it was taken from this issue). It simply replaces all headers with their raw OOXML.
-- This filter replaces all headers with their OOXML, as pandoc's own header
-- output otherwise interferes with referencing
local counter = 0
local current_index = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
local zip = require 'pandoc.zip'

-- Render a list of inlines to the OOXML that pandoc's docx writer produces
local function inlines_to_ooxml(inlines)
  local docx = pandoc.write(pandoc.Pandoc(pandoc.Para(inlines)), 'docx')
  local document_entry = zip.Archive(docx).entries:find_if(
    function (entry) return entry.path == 'word/document.xml' end
  )
  -- Extract the paragraph contents
  local text = document_entry:contents():gsub('.*<w:p>(.*)</w:p>.*', '%1')
  -- Also remove paragraph styling
  text = text:gsub('<w:pPr>.*</w:pPr>', '')
  return text
end

function Header(header)
  local is_numbered = true
  for _, value in ipairs(header.attr.classes) do
    if value == 'unnumbered' then is_numbered = false end
  end
  -- A header without an ID has an empty string as identifier, not nil
  local has_id = header.attr.identifier ~= nil and header.attr.identifier ~= ''
  local xml = { '<w:p>' }
  -- Styling
  table.insert(xml, '<w:pPr>')
  table.insert(xml, '<w:pStyle w:val="Heading' .. header.level .. '" />')
  if not is_numbered then
    table.insert(xml, '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="0"/></w:numPr>')
  end
  table.insert(xml, '</w:pPr>')
  -- Add section numbering if applicable
  if PANDOC_WRITER_OPTIONS.number_sections and is_numbered then
    -- Increment the counter for this level and reset all deeper levels
    current_index[header.level] = current_index[header.level] + 1
    for i = header.level + 1, #current_index do current_index[i] = 0 end
    -- Build the dotted section number, e.g. "2.1.3"
    local index = ''
    for i = 1, header.level do
      if i > 1 then index = index .. '.' end
      index = index .. current_index[i]
    end
    table.insert(xml, '<w:r><w:rPr><w:rStyle w:val="SectionNumber" /></w:rPr><w:t xml:space="preserve">' .. index .. '</w:t></w:r><w:r><w:tab /></w:r>')
  end
  -- Start of bookmarks
  if has_id then
    counter = counter + 1
    table.insert(xml, '<w:bookmarkStart w:id="h' .. counter .. '" w:name="' .. header.attr.identifier .. '" />')
  end
  -- Header contents
  table.insert(xml, inlines_to_ooxml(header.content))
  -- End of bookmarks
  if has_id then
    table.insert(xml, '<w:bookmarkEnd w:id="h' .. counter .. '" />')
  end
  table.insert(xml, '</w:p>')
  return pandoc.RawBlock('openxml', table.concat(xml, ''))
end
So if I understand correctly, then the problem is that the docx writer treats heading IDs as identifiers for the whole section? It seems sensible to change that, but I'm not sure if there could be unintended consequences.
That is correct.
From what I can find, bookmarks can be placed anywhere in the document (source follows the convention Word uses).
It doesn't strictly violate the format of an OpenXML document (it is not a syntax error), but it doesn't play nicely with OOXML referencing. I think the main problem I'm having is described here:
If the text marked by the bookmark contains a paragraph mark, the text preceding the REF field assumes the formatting of the paragraph in the bookmark.
With the way pandoc has implemented the writer, the bookmark always contains a (header) paragraph. Even if I create a correct REF field instruction pointing to a pandoc-generated header identifier, updating the fields in Word itself gives broken results (i.e. it is not compatible).
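For illustration, a REF field pointing at a heading bookmark could look roughly like the fragment below. This is only a sketch and is not taken from the issue: the bookmark name `header-name` and the cached result text are placeholders, and the `\h` switch (make the reference a hyperlink) is just one common choice.

```xml
<!-- Hypothetical example: "header-name" is a placeholder bookmark name -->
<w:p>
  <w:r><w:t xml:space="preserve">See </w:t></w:r>
  <w:fldSimple w:instr=" REF header-name \h ">
    <!-- Cached field result; Word replaces it when the fields are updated -->
    <w:r><w:t>Header Name</w:t></w:r>
  </w:fldSimple>
</w:p>
```

When Word updates such a field, it copies whatever the bookmark spans, which is why a bookmark that also wraps the heading's paragraph mark pulls in more than the heading text.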
So is the desired output something like this?
<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading1" />
  </w:pPr>
  <w:bookmarkStart w:id="20" w:name="header-name" />
  <w:r>
    <w:t xml:space="preserve">Header Name</w:t>
  </w:r>
  <w:bookmarkEnd w:id="20" />
</w:p>
<!-- Other paragraphs and stuff -->
That is correct. You might also want to take a look at this issue since I guess it targets the same code.
I'm not sure if this needed to be an enhancement instead, but here goes. I found out that Pandoc exports this XML for headers:
The problem is that the <w:bookmarkStart> and <w:bookmarkEnd> tags are outside the header paragraph. When adding a reference to it, i.e. this XML:

When updating these reference fields in Word, it causes the entire contents of the bookmark tags to be copied instead of just the header text, and this messes up the document. The cause is of course that the bookmark tags appear outside of the header tag in the XML.

I have also noticed other issues, such as inserting a header before a labeled header messes up references. The reason for the latter is that the new header gets spliced in right after the <w:bookmarkStart> tag, making the spliced-in header now the target of all references.

Note that it probably only matters to people that want to add cross-references from pandoc. If you make a cross-reference from within Word, it will automatically create a second pair of bookmark tags that are in fact inside the header paragraph.
I propose to instead generate the following XML, which is closer to what Word exports by itself: