schacon / showoff

moved to puppetlabs/showoff!

Parse rendered slide content as HTML instead of XML, to preserve HTML entities... #195

Open MichaelHackett opened 12 years ago

MichaelHackett commented 12 years ago

...which are removed by Nokogiri's XML parser. The existing code parses HTML as XML, so Nokogiri only recognizes XML's much smaller set of entities. Parsing the text as an HTML fragment instead preserves HTML entities, and I have not encountered any issues with the change. (However, I have not been able to run the project tests: they hang after the first test.)

I think there would be other cases where valid HTML would not be valid XML, so using the parser in XML mode does not seem appropriate here.