scrapy / scrapely

A pure-python HTML screen-scraping library

safehtml should ensure tabular content safety #24

Closed: omab closed this issue 6 years ago

omab commented 12 years ago

safehtml should ensure that tabular content is safe to display by enforcing <table> tags where needed. For example:

>>> from scrapely.extractors import safehtml, htmlregion
>>> safehtml(htmlregion(u'<span>pre text</span><tr><td>hello world</td></tr>'))
u'pre text<tr><td>hello world</td></tr>'

That output will break any table layout where the content is rendered.

pablohoffman commented 12 years ago

Can you add what the proper/expected output should be?

omab commented 12 years ago

The expected output would be:

>>> safehtml(htmlregion(u'<span>pre text</span><tr><td>hello world</td></tr>'))
u'pre text<table><tr><td>hello world</td></tr></table>'

kalessin commented 12 years ago

on it
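
For illustration only, the behaviour requested in this issue could be approximated as a post-processing step on the string that safehtml returns. This is a minimal sketch and not scrapely's implementation: `wrap_orphan_rows` is a hypothetical helper, it uses a regular expression rather than a real HTML parser, and it assumes tables are not nested and that orphaned content always starts at a <tr> (stray <td>, <thead> or <tbody> elements are not handled).

```python
import re

# Hypothetical helper, not part of scrapely.
# Alternative 1 matches a complete <table>...</table> so it can be skipped.
# Alternative 2 matches a run of consecutive <tr>...</tr> elements that
# appear outside any table.
_TABLE_OR_ROWS = re.compile(
    r'(<table\b.*?</table>)'
    r'|(<tr\b.*?</tr>(?:\s*<tr\b.*?</tr>)*)',
    re.IGNORECASE | re.DOTALL,
)


def wrap_orphan_rows(html):
    """Wrap runs of <tr> elements found outside a <table> in <table> tags."""
    def fix(match):
        if match.group(1) is not None:
            # Rows already inside a table: leave the table untouched.
            return match.group(1)
        # Orphaned rows: enclose the whole run in a single table.
        return '<table>' + match.group(2) + '</table>'
    return _TABLE_OR_ROWS.sub(fix, html)


print(wrap_orphan_rows('pre text<tr><td>hello world</td></tr>'))
# prints: pre text<table><tr><td>hello world</td></tr></table>
```

Markup that is already wrapped in a <table> passes through unchanged, so under the assumptions above the helper can be applied to any safehtml output without double-wrapping.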