threedaymonk / htmlbeautifier

A normaliser/beautifier for HTML that also understands embedded Ruby. Ideal for tidying up Rails templates.
MIT License

`>` in attribute breaks formatting #78

Open andyw8 opened 10 months ago

andyw8 commented 10 months ago
<div foo="a>b"
     bar="1">

is incorrectly formatted to

<div foo="a>b"
  bar="1">

(Without the > it works correctly).
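
A likely explanation, sketched under the assumption that the gem's tag patterns are built on the stock `ELEMENT_CONTENT` regex quoted further down in this thread: `[^>]` stops at the first `>`, quoted or not, so the tag is cut short and the rest is treated as text. `OPEN_DIV` below is a simplified, hypothetical stand-in for the gem's block-element mapping, not its actual code.

```ruby
require "strscan"

# Stock pattern as quoted later in this thread.
ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx

# Hypothetical, simplified stand-in for the block-element mapping (div only).
OPEN_DIV = %r{<div(?: #{ELEMENT_CONTENT})?>}m

scanner = StringScanner.new(%(<div foo="a>b"\n     bar="1">))
tag  = scanner.scan(OPEN_DIV)
rest = scanner.rest

p tag  # => "<div foo=\"a>"
p rest # => "b\"\n     bar=\"1\">"
```

Everything after the first `>` is left over for the text and newline rules, which would explain why the second line gets re-indented.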

This causes problems for Hotwire/Stimulus since it uses notation such as:

<button data-action="click->hello#greet">Greet</button>
jon-sully commented 2 months ago

So... I ran into this and figured out a solution, specifically for Stimulus, but it's not exactly pretty. Leaving my notes here in case anybody else wants to give it a go.

After digging through the source of this project and iterating over a LOT of regex tries, the secret lies in the `ELEMENT_CONTENT` pattern:

ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx
# change to
ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx

And the `:open_element` mapping:

[%r{<#{ELEMENT_CONTENT}[^/]>}om,
      :open_element],
# change to
[%r{<#{ELEMENT_CONTENT}[^/]*?>}om,
      :open_element],

Which basically reconfigures the parser so that a `>` inside a data-action="[anything]->[anything]" attribute value no longer ends the tag.
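
For anyone who wants to check the effect without running the whole formatter, here is a minimal sketch that exercises the two patterns in isolation. The constant names and the `first_tag` helper are made up for illustration, and it does not go through the full `MAPPINGS` list, so treat it as a rough check rather than a description of how the gem dispatches.

```ruby
require "strscan"

# Stock and patched patterns as quoted above; the constant names are illustrative.
STOCK_CONTENT   = %r{ (?:<%.*?%>|[^>])* }mx
PATCHED_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx

STOCK_OPEN   = %r{<#{STOCK_CONTENT}[^/]>}m
PATCHED_OPEN = %r{<#{PATCHED_CONTENT}[^/]*?>}m

# Hypothetical helper: return what the pattern consumes at the start of the input.
def first_tag(pattern, html)
  StringScanner.new(html).scan(pattern)
end

button = '<button data-action="click->hello#greet">Greet</button>'
p first_tag(STOCK_OPEN, button)   # => "<button data-action=\"click->"
p first_tag(PATCHED_OPEN, button) # => "<button data-action=\"click->hello#greet\">"

# Attributes other than data-action are not covered by the override:
p first_tag(PATCHED_OPEN, '<span foo="a>b">x</span>') # => "<span foo=\"a>"
```

So the override handles the Stimulus case, but the `foo="a>b"` example from the original report would still be cut at the first `>`.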

I tried to monkey-patch this in, but htmlbeautifier runs as its own executable rather than being loaded by something like Rails, which would initialize monkey patches for you. There are other ways, but they didn't work well for me.

What did work well for me is... a bit rougher, but overall a self-contained solution.

I created a new bin script called bin/erb, then essentially collapsed this project down into a single executable script (make sure you `chmod +x` it!) with my edits inside.

Note: I also prefer having a line break between basically every disparate element in my HTML, so I also tweaked Builder#emit; you'll see the "JonSully override" comment. Feel free to remove that line if you prefer.
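
To illustrate what that tweak changes, here is a stripped-down sketch of the `emit` logic. `TinyEmitter` and `render` are hypothetical names, and indentation handling is left out on purpose; only the `@new_line` flag behaviour is modelled on the script.

```ruby
# Hypothetical, simplified emitter: models only the @new_line flag from Builder#emit.
class TinyEmitter
  attr_reader :output

  def initialize(always_break:)
    @always_break = always_break
    @output = +""
    @new_line = false
    @empty = true
  end

  def emit(*strings)
    @output << "\n" if @new_line && !@empty
    @output << strings.join
    # Stock resets the flag to false; the override leaves it true after every emit.
    @new_line = @always_break
    @empty = false
  end
end

def render(always_break:)
  emitter = TinyEmitter.new(always_break: always_break)
  ["<span>", "Greet", "</span>"].each { |token| emitter.emit(token) }
  emitter.output
end

p render(always_break: false) # => "<span>Greet</span>"
p render(always_break: true)  # => "<span>\nGreet\n</span>"
```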

The code for `bin/erb`:

```ruby
#!/usr/bin/env ruby

# NOTE: Bundles up the gem `htmlbeautifier` into a single executable Ruby script
# NOTE: Contains a couple of overrides from the stock script.
# NOTE: Set `execute path` in the VS Code plugin to simply `bin/erb` (This file)

require "strscan"
require "optparse"
require "fileutils"
require "stringio"

class Parser
  def initialize
    @maps = []
    yield self if block_given?
  end

  def map(pattern, method)
    @maps << [pattern, method]
  end

  def scan(subject, receiver)
    @scanner = StringScanner.new(subject)
    dispatch(receiver) until @scanner.eos?
  end

  def source_so_far
    @scanner.string[0...@scanner.pos]
  end

  def source_line_number
    [source_so_far.chomp.split(%r{\n}).count, 1].max
  end

  private

  def dispatch(receiver)
    _, method = @maps.find { |pattern, _| @scanner.scan(pattern) }
    raise "Unmatched sequence" unless method

    receiver.__send__(method, *extract_params(@scanner))
  rescue => e
    raise "#{e.message} on line #{source_line_number}"
  end

  def extract_params(scanner)
    return [scanner[0]] unless scanner[1]

    params = []
    i = 1
    while scanner[i]
      params << scanner[i]
      i += 1
    end
    params
  end
end

class HtmlParser < Parser
  # ELEMENT_CONTENT = %r{ (?:<%.*?%>|[^>])* }mx # stock
  ELEMENT_CONTENT = %r{ (?:<%.*?%>|data-action\s*=\s*"(?:[^"]*?->[^"]*?)"|[^>])* }mx # JonSully override
  HTML_VOID_ELEMENTS = %r{(?:
    area | base | br | col | command | embed | hr | img | input | keygen |
    link | meta | param | source | track | wbr
  )}mix
  HTML_BLOCK_ELEMENTS = %r{(?:
    address | article | aside | audio | blockquote | canvas | dd | details |
    dir | div | dl | dt | fieldset | figcaption | figure | footer | form |
    h1 | h2 | h3 | h4 | h5 | h6 | header | hr | li | menu | noframes |
    noscript | ol | p | pre | section | table | tbody | td | tfoot | th |
    thead | tr | ul | video
  )}mix

  MAPPINGS = [
    [%r{(<%-?=?)(.*?)(-?%>)}om,
     :embed],
    [%r{}om,
     :close_ie_cc],
    [%r{}om,
     :standalone_element],
    [%r{}om,
     :standalone_element],
    [%r{()(.*?)()}omi,
     :foreign_block],
    [%r{()(.*?)()}omi,
     :foreign_block],
    [%r{()(.*?)()}omi,
     :preformatted_block],
    [%r{()(.*?)()}omi,
     :preformatted_block],
    [%r{<#{HTML_VOID_ELEMENTS}(?: #{ELEMENT_CONTENT})?/?>}om,
     :standalone_element],
    [%r{}om,
     :close_block_element],
    [%r{<#{HTML_BLOCK_ELEMENTS}(?: #{ELEMENT_CONTENT})?>}om,
     :open_block_element],
    [%r{}om,
     :close_element],
    # [%r{<#{ELEMENT_CONTENT}[^/]>}om, # stock
    #  :open_element],
    [%r{<#{ELEMENT_CONTENT}[^/]*?>}om, # JonSully override
     :open_element],
    [%r{<[\w\-]+(?: #{ELEMENT_CONTENT})?/>}om,
     :standalone_element],
    [%r{(\s*\r?\n\s*)+}om,
     :new_lines],
    [%r{[^<\n]+},
     :text]
  ].freeze

  def initialize
    super do |p|
      MAPPINGS.each do |regexp, method|
        p.map regexp, method
      end
    end
  end
end

class RubyIndenter
  INDENT_KEYWORDS = %w[if elsif else unless while until begin for case when].freeze
  OUTDENT_KEYWORDS = %w[elsif else end when].freeze
  RUBY_INDENT = %r{
    ^ ( #{INDENT_KEYWORDS.join("|")} )\b
    | \b ( do | \{ ) ( \s* \| [^|]+ \| )? $
  }xo
  RUBY_OUTDENT = %r{ ^ ( #{OUTDENT_KEYWORDS.join("|")} | \} ) \b }xo

  def outdent?(lines)
    lines.first =~ RUBY_OUTDENT
  end

  def indent?(lines)
    lines.last =~ RUBY_INDENT
  end
end

class Builder
  DEFAULT_OPTIONS = {
    indent: " ",
    initial_level: 0,
    stop_on_errors: false,
    keep_blank_lines: 0
  }.freeze

  def initialize(output, options = {})
    options = DEFAULT_OPTIONS.merge(options)
    @tab = options[:indent]
    @stop_on_errors = options[:stop_on_errors]
    @level = options[:initial_level]
    @keep_blank_lines = options[:keep_blank_lines]
    @new_line = false
    @empty = true
    @ie_cc_levels = []
    @output = output
    @embedded_indenter = RubyIndenter.new
  end

  private

  def error(text)
    return unless @stop_on_errors

    raise text
  end

  def indent
    @level += 1
  end

  def outdent
    error "Extraneous closing tag" if @level == 0
    @level = [@level - 1, 0].max
  end

  def emit(*strings)
    strings_join = strings.join("")
    @output << "\n" if @new_line && !@empty
    @output << (@tab * @level) if @new_line && !strings_join.strip.empty?
    @output << strings_join
    # @new_line = false # stock
    @new_line = true # JonSully override
    @empty = false
  end

  def new_line
    @new_line = true
  end

  def embed(opening, code, closing)
    lines = code.split(%r{\n}).map(&:strip)
    outdent if @embedded_indenter.outdent?(lines)
    emit opening, code, closing
    indent if @embedded_indenter.indent?(lines)
  end

  def foreign_block(opening, code, closing)
    emit opening
    emit_reindented_block_content code unless code.strip.empty?
    emit closing
  end

  def emit_reindented_block_content(code)
    lines = code.strip.split(%r{\n})
    indentation = foreign_block_indentation(code)

    indent
    new_line
    lines.each do |line|
      emit line.rstrip.sub(%r{^#{indentation}}, "")
      new_line
    end
    outdent
  end

  def foreign_block_indentation(code)
    code.split(%r{\n}).find { |ln| !ln.strip.empty? }[%r{^\s+}]
  end

  def preformatted_block(opening, content, closing)
    new_line
    emit opening, content, closing
    new_line
  end

  def standalone_element(elem)
    emit elem
    new_line if elem =~ %r{^
# …
  e
  raise "Error parsing #{name}: #{e}"
end

executable = File.basename(__FILE__)

options = {indent: " "}

parser = OptionParser.new do |opts|
  opts.banner = "Usage: #{executable} [options] [file ...]"
  opts.separator <<~STRING

    #{executable} has two modes of operation:

    1. If no files are listed, it will read from standard input and write to
       standard output.
    2. If files are listed, it will modify each file in place, overwriting it
       with the beautified output.

    The following options are available:

  STRING
  opts.on(
    "-t", "--tab-stops NUMBER", Integer,
    "Set number of spaces per indent (default #{options[:tab_stops]})"
  ) do |num|
    options[:indent] = " " * num
  end
  opts.on(
    "-T", "--tab",
    "Indent using tabs"
  ) do
    options[:indent] = "\t"
  end
  opts.on(
    "-i", "--indent-by NUMBER", Integer,
    "Indent the output by NUMBER steps (default 0)."
  ) do |num|
    options[:initial_level] = num
  end
  opts.on(
    "-e", "--stop-on-errors",
    "Stop when invalid nesting is encountered in the input"
  ) do |num|
    options[:stop_on_errors] = num
  end
  opts.on(
    "-b", "--keep-blank-lines NUMBER", Integer,
    "Set number of consecutive blank lines"
  ) do |num|
    options[:keep_blank_lines] = num
  end
  opts.on(
    "-l", "--lint-only",
    "Lint only, error on files which would be modified",
    "This is not available when reading from standard input"
  ) do |num|
    options[:lint_only] = num
  end
end

parser.parse!

if ARGV.any?
  failures = []
  ARGV.each do |path|
    input = File.read(path)
    if options[:lint_only]
      output = StringIO.new
      beautify path, input, output, options
      failures << path unless input == output.string
    else
      temppath = "#{path}.tmp"
      File.open(temppath, "w") do |file|
        beautify path, input, file, options
      end
      FileUtils.mv temppath, path
    end
  end
  unless failures.empty?
    warn [
      "Lint failed - files would be modified:",
      *failures
    ].join("\n")
    exit 1
  end
else
  beautify "standard input", $stdin.read, $stdout, options
end
```

I told you it wasn't exactly pretty! But now if we set the VS Code extension to use a custom "execute path", setting it to simply bin/erb, it'll work.

Plus we can uninstall the htmlbeautifier gem itself, since we're running our own plain Ruby script, not the gem.

andyw8 commented 2 months ago

Thanks for looking into that!

I'm curious though, why didn't you make the changes in a branch and point your Gemfile to that?

jon-sully commented 2 months ago

Yeah I guess that could've worked, I just found the library to be so small that it felt simpler to inline. Maybe I'll swap at some point, but it'll be easier to 'ship' changes in the future for my whole team if it's in git

andyw8 commented 1 month ago

PR: https://github.com/threedaymonk/htmlbeautifier/pull/82