googlearchive / py-gfm

This repository is unmaintained. Please see Zopieux/py-gfm for the new canonical repository.
https://github.com/Zopieux/py-gfm
BSD 3-Clause "New" or "Revised" License

Support (but validate) HTML #2

Open nex3 opened 11 years ago

nex3 commented 11 years ago

GFM supports a limited subset of HTML, including at least <a> and <img> tags. We should support that as well, with sufficient scrubbing to make it safe to use.
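As a rough illustration of what "sufficient scrubbing" could look like, here is a minimal allowlist sketch using only the Python standard library. It is not py-gfm's or GitHub's implementation. The names `ALLOWED_TAGS`, `SAFE_SCHEMES`, `AllowlistSanitizer`, and `sanitize` are hypothetical, and the tag list is a small placeholder rather than GitHub's real one. A real deployment would probably want a maintained sanitizer library instead.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist (tag -> permitted attributes); an assumption for illustration.
ALLOWED_TAGS = {
    "a": {"href", "title"},
    "img": {"src", "alt", "title"},
    "em": set(),
    "strong": set(),
    "code": set(),
}
VOID_TAGS = {"img"}                      # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}       # drop these tags together with their content
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # depth inside <script>/<style>

    @staticmethod
    def _safe_url(value):
        # Remove whitespace and control characters that browsers ignore in URLs.
        cleaned = "".join(ch for ch in value if ch > " ")
        try:
            return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
        except ValueError:
            return False

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
            return
        if self._skip or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its inner text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_TAGS[tag]:
                continue
            if name in ("href", "src") and not self._safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self._skip:
                self._skip -= 1
            return
        if not self._skip and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, `sanitize('<a href="javascript:alert(1)" onclick="x()">hi</a>')` keeps the link text but strips both the unsafe `href` and the `onclick` handler. Note that it does not rebalance unclosed tags, so it is only a starting point.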

nex3 commented 11 years ago

GitHub uses redcarpet for its rendering, so we can look at their implementation of this to figure out how to do it safely.

nex3 commented 11 years ago

GitHub actually supports all manner of HTML tags, including inline formatting tags all the way up to tables. We should match this behavior.

jmesserly commented 11 years ago

If you need a Python HTML5 parser, I can recommend https://code.google.com/p/html5lib :) See also http://stackoverflow.com/questions/5266134/best-practice-for-allowing-markdown-in-python-while-preventing-xss-attacks

Apparently they have a sanitizer too (we haven't looked at that yet for Dart).

jmesserly commented 11 years ago

(https://code.google.com/p/html5lib/source/browse/python/html5lib/sanitizer.py)

nex3 commented 11 years ago

Awesome, thanks for the tip, John.