the acronyms filter also replaces <img alt="">

rubensworks / ScholarMarkdown

A framework for writing markdown-based scholarly articles.

MIT License

41 stars, 9 forks

the acronyms filter also replaces <img alt=""> #12

Open bjdmeest opened 6 years ago

bjdmeest commented 6 years ago

If I have, e.g.

UI,User Interface

in acronyms.csv, and add the following somewhere in the article:

<img src="img/ui.jpg" alt="The UI">

this results in very nasty HTML:

<img src="img/ui.jpg" alt="The <span class='abbreviation' title='User Interface'>UI</span>&#8221; />

Are you saying you're not parsing the HTML, but running regular expressions over it?
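For reference, the symptom can be reproduced without the framework. The sketch below is only a guess at the mechanism (a plain regex pass over the raw HTML string); the helper name naive_acronyms is hypothetical and is not ScholarMarkdown's actual code.

```ruby
# Hypothetical reproduction: substitute acronyms with a regex over raw HTML.
# Nothing here knows about tags, so attribute values are rewritten too.
def naive_acronyms(html, acronyms)
  acronyms.reduce(html) do |out, (abbr, full)|
    # Match the acronym only when surrounded by non-alphanumeric characters.
    out.gsub(/(?<=[^a-zA-Z0-9])#{Regexp.escape(abbr)}(?=[^a-zA-Z0-9])/) do
      "<span class='abbreviation' title='#{full}'>#{abbr}</span>"
    end
  end
end

input = '<img src="img/ui.jpg" alt="The UI">'
puts naive_acronyms(input, { 'UI' => 'User Interface' })
```

The lowercase "ui" in the src attribute is left alone because the match is case-sensitive, but the "UI" inside alt is wrapped in a span, which breaks the attribute.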

bjdmeest commented 6 years ago

Suggestion: https://stackoverflow.com/questions/7234292/modifying-text-inside-html-nodes-nokogiri

rubensworks commented 6 years ago

of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

Yep, this is what happened, my bad...

Until this is fixed, you could lowercase any acronym occurrences that shouldn't be replaced (that's what I do).

bjdmeest commented 6 years ago

Give me a few minutes, I might do a pull request ;)

bjdmeest commented 6 years ago

:( Got close but not quite there yet (also incredibly inefficient)

require 'csv'

class ScholarAcronymFilter < Nanoc::Filter
  requires 'nokogiri'

  identifier :scholar_acronym
  type :text

  def run(content, params = {})
    doc = Nokogiri::HTML(content)
    acronyms = CSV.parse(params[:acronyms].raw_content, :headers => true)

    doc.traverse do |x|
      if x.text?
        acronyms.each do |row|
          x.inner_html = x.content.gsub %r{(?<=[^a-zA-Z0-9])#{row['abbreviation']}(?=[^a-zA-Z0-9])} do |match|
            %{<span class="abbreviation" title="#{row['full']}">#{row['abbreviation']}</span>}
          end
        end
      end
    end

    doc.at('body').children.to_html
  end
end

I'm not Ruby-savvy enough to fix this quickly. I'll manage for now, and I'm putting this code here in case I find some more time ;)
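In the meantime, a dependency-free stopgap might look like the sketch below. It is not a Nokogiri solution and not the project's code: it simply splits the string on tags and only touches the text between them. The names wrap_acronyms and TAG are made up for illustration, and it assumes no attribute value contains a literal ">". Text inside script or code elements would still be processed, so a real parser remains the more robust route.

```ruby
# Sketch: wrap acronyms only in text that lies outside HTML tags.
# Assumption: tags never contain a literal ">" inside an attribute value.
TAG = /(<[^>]*>)/

def wrap_acronyms(html, acronyms)
  # One alternation for all acronyms, so inserted spans are never re-scanned.
  pattern = /(?<![a-zA-Z0-9])(?:#{Regexp.union(acronyms.keys).source})(?![a-zA-Z0-9])/

  # Splitting on a regex with a capture group keeps the tags in the result:
  # even indices are text, odd indices are tags.
  html.split(TAG).each_with_index.map do |segment, index|
    next segment if index.odd? # leave tags and their attributes alone

    segment.gsub(pattern) do |match|
      %(<span class="abbreviation" title="#{acronyms[match]}">#{match}</span>)
    end
  end.join
end

html = '<p>The UI</p><img src="img/ui.jpg" alt="The UI">'
puts wrap_acronyms(html, { 'UI' => 'User Interface' })
```

Unlike the lookbehind in the filter above, the negative lookarounds here also match an acronym at the very start or end of a text segment, while still skipping it inside longer words such as "GUI".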