thisismess / django-seo-cascade

django-seo-cascade is a Django app that provides automatic sitemap generation, cascading meta tag blocks, and overrides via the Django admin.
http://www.thisismess.com

Use an HTML parser instead of an XML parser #2

Open mandx opened 12 years ago

mandx commented 12 years ago

Use an HTML-compliant parser. In some doctypes (including HTML5), meta tags do not need to be closed (as in `<meta ... />`), but the XML parser fails to read these unclosed tags. Also, some HTML entities are not recognized, like `&raquo;` (»).
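
A minimal sketch of the failure described above, using only the Python standard library as a stand-in (`xml.etree.ElementTree` for the strict XML parser, `html.parser` for a lenient HTML one). This is not the app's actual code: the names `SNIPPET`, `parse_as_xml`, `MetaCollector`, and `collect_metas` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical input: an unclosed <meta> tag plus an HTML-only entity.
SNIPPET = (
    '<head><meta name="description" content="Home &raquo; Blog">'
    "<title>Blog</title></head>"
)


def parse_as_xml(markup):
    """Return True if a strict XML parser accepts the markup, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for the unclosed <meta> and for the undefined &raquo; entity.
        return False


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags, closed or not."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta":
            attr_map = dict(attrs)
            if "name" in attr_map:
                self.metas[attr_map["name"]] = attr_map.get("content", "")


def collect_metas(markup):
    """Parse the markup leniently and return a dict of meta name to content."""
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.metas


if __name__ == "__main__":
    print(parse_as_xml(SNIPPET))   # False
    print(collect_metas(SNIPPET))  # {'description': 'Home » Blog'}
```

The same comparison should carry over to a dedicated HTML parser such as Beautiful Soup or html5lib, though that is an assumption and not something tested here.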

lynndylanhurley commented 12 years ago

That's great advice - I'll make that change today. Thx :)

lynndylanhurley commented 12 years ago

I initially chose lxml for its speed (http://blog.dispatched.ch/2010/08/16/beautifulsoup-vs-lxml-performance/), but I can't see this app causing any major performance issues.

From what I can see, Beautiful Soup might be a better choice. Do you agree?

mandx commented 12 years ago

I haven't used Beautiful Soup, but you can check html5lib and compare. I use html5lib a lot, mainly for sanitizing HTML.