Closed goulvench closed 1 year ago
Thank you for your pull request. I see that this could work in certain cases. Could you please post an example here with a real use case?
I am wondering because the preprocessor is called in many different places. The text argument does not always contain the entire text of a tag. In other cases, the tag name from the stack is not the one that contains the text, which may confuse people. I am a bit afraid to introduce code that leads to wrong expectations or is ambiguous.
Hi again, sorry for the delay.
Here is a simplified version of the code I use to generate the document outline:
class Pdf
include Prawn::View
def initialize(name, html)
# The @document variable is required by Prawn::View and cannot be renamed
@document = Prawn::Document.new
@document.markup_options = {
text: { preprocessor: ->(text, tag, attributes) { preprocess(text, tag, attributes) } },
heading1: { style: :bold, size: 24 },
heading2: { style: :bold, size: 20 },
heading3: { style: :italic, size: 20 },
}
@toc = []
@name = name
markup html
add_outline
end
def preprocess(text, tag, _attributes)
add_anchor(text, level: tag.last.to_i - 1) if tag.in? ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
text
end
def add_anchor(name, level: 1)
anchor = "anchor_#{@toc.flatten.size}"
add_dest anchor, dest_xyz(bounds.absolute_left, y + 20) # Rewind so the anchor points to the title start
if level > 1
@toc.last[:articles] << { anchor: anchor, name: name, page: page_count }
else
@toc << { anchor: anchor, name: name, page: page_count, articles: [] }
end
end
def add_outline
# Generate the clickable outline (usually shown in a sidebar in PDF readers)
outline.section @name, destination: 1
@toc.each do |entry|
outline.section entry[:name], destination: entry[:page] do
entry[:articles].each do |article|
outline.page title: article[:name], destination: article[:page]
end
end
end
end
end
I wonder why the text or tag name could be different from what is expected, is that what happens with <h1>Some <b>bold</b> text</h1>
for instance?
Most start tags (div, p, h1-6, table, ul, ol, img, ...) cause the preceding text to be inserted and thereby preprocessed. The parse stack is already at the started element though. Hence the preprocessor gets the just started element tag instead of the parent element that also contains the text.
Consider the following example:
<div>
here we present
<h1>A Great Heading</h1>
and
<ul>
<li>much</li>
<li>more</li>
</ul>
</div>
The text "here we present" would be preprocessed with the tag h1
, "and" with ul
. I did not check the entire code, probably other constellations exists, too. To change this behaviour, quite a few places would have to be modified. With the current implementation, only special cases and well crafted HTML would make sense. Like this the feature is not very helpful.
I like the idea of the outline though. Probably you find another way to collect the hierarchical headings directly, e.g. in the end_heading
method?
I found this issue, because I tried this and it did not work:
pdf.markup_options = {
heading1: {
size: 11, # Inherited, works
preprocessor: -> { _1.upcase } # Does not work
}
}
Would it not be easier to enable the preprocessor
in this heading1
scope? Just as most other attributes from text
are inherited?
This could be possible for headings, but not for other tags. The preprocessor could be passed to the add_current_text
method. It might even work for the original problem to create a TOC.
If you do have time to work on this, please open a new pull request.
Closing because of inactivity
The proposed change allows the preprocessor callable to do more than just transform the text contents of tags, like adding anchors to build an outline. I'd be happy to add documentation to this PR if you think it's a useful change. The arity check is for backwards compatibility. This would belong in a minor version bump.