puzzle / prawn-markup

Parse simple HTML markup to include in Prawn PDFs
MIT License
65 stars 16 forks source link

Pass tag name and attributes to preprocessor callable #20

Closed goulvench closed 1 year ago

goulvench commented 2 years ago

The proposed change allows the preprocessor callable to do more than just transform the text contents of tags, like adding anchors to build an outline. I'd be happy to add documentation to this PR if you think it's a useful change. The arity check is for backwards compatibility. This would belong in a minor version bump.

codez commented 2 years ago

Thank you for your pull request. I see that this could work in certain cases. Could you please post an example here with a real use case?

I am wondering because the preprocessor is called in many different places. The text argument does not always contain the entire text of a tag. In other cases, the tag name from the stack is not the one that contains the text, which may confuse people. I am a bit afraid to introduce code that leads to wrong expectations or is ambiguous.

goulvench commented 2 years ago

Hi again, sorry for the delay.

Here is a simplified version of the code I use to generate the document outline:

class Pdf
  include Prawn::View

  def initialize(name, html)
    # The @document variable is required by Prawn::View and cannot be renamed
    @document = Prawn::Document.new
    @document.markup_options = {
      text: { preprocessor: ->(text, tag, attributes) { preprocess(text, tag, attributes) } },
      heading1: { style: :bold, size: 24 },
      heading2: { style: :bold, size: 20 },
      heading3: { style: :italic, size: 20 },
    }
    @toc = []
    @name = name
    markup html
    add_outline
  end

  def preprocess(text, tag, _attributes)
    add_anchor(text, level: tag.last.to_i - 1) if tag.in? ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
    text
  end

  def add_anchor(name, level: 1)
    anchor = "anchor_#{@toc.flatten.size}"
    add_dest anchor, dest_xyz(bounds.absolute_left, y + 20) # Rewind so the anchor points to the title start
    if level > 1
      @toc.last[:articles] << { anchor: anchor, name: name, page: page_count }
    else
      @toc << { anchor: anchor, name: name, page: page_count, articles: [] }
    end
  end

  def add_outline
    # Generate the clickable outline (usually shown in a sidebar in PDF readers)
    outline.section @name, destination: 1
    @toc.each do |entry|
      outline.section entry[:name], destination: entry[:page] do
        entry[:articles].each do |article|
          outline.page title: article[:name], destination: article[:page]
        end
      end
    end
  end
end

I wonder why the text or tag name could be different from what is expected, is that what happens with <h1>Some <b>bold</b> text</h1> for instance?

codez commented 2 years ago

Most start tags (div, p, h1-6, table, ul, ol, img, ...) cause the preceding text to be inserted and thereby preprocessed. The parse stack is already at the started element though. Hence the preprocessor gets the just started element tag instead of the parent element that also contains the text.

Consider the following example:

<div>
here we present
<h1>A Great Heading</h1>
and
<ul>
  <li>much</li>
  <li>more</li>
</ul>
</div>

The text "here we present" would be preprocessed with the tag h1, "and" with ul. I did not check the entire code, probably other constellations exists, too. To change this behaviour, quite a few places would have to be modified. With the current implementation, only special cases and well crafted HTML would make sense. Like this the feature is not very helpful.

I like the idea of the outline though. Probably you find another way to collect the hierarchical headings directly, e.g. in the end_heading method?

halo commented 2 years ago

I found this issue, because I tried this and it did not work:

pdf.markup_options = {
  heading1: {
    size: 11, # Inherited, works
    preprocessor: -> { _1.upcase } # Does not work
  }
}

Would it not be easier to enable the preprocessor in this heading1 scope? Just as most other attributes from text are inherited?

codez commented 2 years ago

This could be possible for headings, but not for other tags. The preprocessor could be passed to the add_current_text method. It might even work for the original problem to create a TOC.

If you do have time to work on this, please open a new pull request.

codez commented 1 year ago

Closing because of inactivity