pandoc / lua-filters

A collection of lua filters for pandoc
MIT License

Replace NBSP characters with spaces #244

Closed trianon1983 closed 2 years ago

trianon1983 commented 2 years ago

I am converting the HTML of a Confluence page to AsciiDoc format. After conversion, the text contains NBSP (non-breaking space) characters, so I need a filter that replaces each of them with a regular space, or removes it when it occurs at the end. Can anyone help me create such a filter? I think it could be useful to many.

input: image output: image

I tried to write it myself, but I have neither experience writing filters in Lua nor enough time to do it:

function Inlines (inlines)
  for i = 1, #inlines do
    local currentEl = inlines[i]
    strTxt = currentEl.text
    strType = currentEl.t
    if currentEl.t == 'Str' then
      if string.find(strTxt, '\160') ~= nil then
        print(strTxt)
        print(currentEl)
        inlines[i].text = string.gsub(strTxt, '\160', '')
        print(strTxt)
        print(inlines[i])
      end
    end
  end
  return inlines
end

return {
  {Inlines = Inlines},
}

  1. When I simply replaced these characters, a new character appeared: '\65533'.
  2. I know that a space is a separate element in the inlines list, so I need to insert it into the collection somehow.

Can someone help me?

tarleb commented 2 years ago

You could try something like

function Str (s)
  return pandoc.Str(s.text:gsub('\160', ''))
end
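
One caveat about the '\160' escape, offered as a likely explanation rather than a confirmed diagnosis: in a Lua string literal, '\160' is the single byte 160, while pandoc hands text to Lua as UTF-8, where NBSP (U+00A0) is the two-byte sequence '\194\160'. Removing only byte 160 would leave a stray byte 194 behind, which is not valid UTF-8 and would plausibly be rendered as the replacement character U+FFFD (decimal 65533) reported in the question. A minimal plain-Lua sketch of the effect, with variable names chosen only for illustration:

```lua
-- NBSP (U+00A0) encoded as UTF-8 is the two bytes 0xC2 0xA0 (194, 160).
local nbsp = '\194\160'
local text = 'a' .. nbsp .. 'b'

-- Removing only byte 160 leaves the lead byte 194 behind.
-- (gsub returns two values; the assignment keeps only the first.)
local broken = text:gsub('\160', '')

-- Matching the whole two-byte sequence replaces the character cleanly.
local fixed = text:gsub(nbsp, ' ')

print(#broken)             -- 3: 'a', the orphaned byte 194, 'b'
print(broken:byte(1, -1))  -- bytes 97, 194, 98
print(fixed)               -- a b
```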

Please use the pandoc-discuss mailing list for further questions, or post it on a site like StackOverflow.
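
For the second point in the question (a space being its own inline element), here is a hedged sketch of a filter that splits a Str on NBSP and returns a list of inlines, which pandoc splices in place of the original element. It assumes NBSP arrives as the UTF-8 sequence '\194\160'. The helper name split_on_nbsp is made up for this example, and the choice to drop NBSPs at the start or end of a Str (rather than at the end of a line or paragraph) is a simplification that may join a word to an adjacent non-Str inline.

```lua
-- UTF-8 encoding of U+00A0 (assumption: text reaches the filter as UTF-8).
local NBSP = '\194\160'

-- Hypothetical helper: split text on NBSP, keeping empty pieces.
local function split_on_nbsp(text)
  local pieces = {}
  local start = 1
  while true do
    local s, e = text:find(NBSP, start, true)  -- plain (non-pattern) search
    if not s then
      pieces[#pieces + 1] = text:sub(start)
      break
    end
    pieces[#pieces + 1] = text:sub(start, s - 1)
    start = e + 1
  end
  return pieces
end

-- Global function named after the element type, as in the reply above.
function Str(el)
  if not el.text:find(NBSP, 1, true) then
    return nil  -- no NBSP: leave the element unchanged
  end
  local result = {}
  for _, piece in ipairs(split_on_nbsp(el.text)) do
    if piece ~= '' then
      -- Runs of NBSP between two pieces collapse to a single Space;
      -- leading and trailing NBSPs produce nothing.
      if #result > 0 then
        result[#result + 1] = pandoc.Space()
      end
      result[#result + 1] = pandoc.Str(piece)
    end
  end
  return result
end
```

Saved under a name such as nbsp.lua (the filename is arbitrary), it could be applied with something like `pandoc -f html -t asciidoc --lua-filter=nbsp.lua input.html -o output.adoc`.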