Open SirbitoX opened 8 years ago
Hi @SirbitoX. I was having a discussion about this last week and we were thinking about adding a new, less strict version of `safe html`. The new type would sit somewhere between `raw html` and `safe html`, keeping `img` tags and possibly other tags too.
Other than `img` tags, what other tags do you add? Would you mind explaining your specific use case? Are you extracting articles, products, or leads?
Hi @ruairif,
I'm extracting articles and I keep all the images in the description of the scraped article, so to do this I would need the `src` attribute, or even the `height` and `width` attributes, of the `img` tag.
I will probably keep embedded videos in the description too. But that wouldn't be an issue if we supported something like `allowed_attributes`.
Let's consider that someone (like me) wants to keep an `img` tag, so the `src` attribute of this tag would be important for them. But the `safehtml()` function omits all the attributes of the relevant tag. I think it would be better to keep the attributes of `allowed_tags`, or to add another param named `allowed_attributes` to specify which attributes to keep.
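
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not the existing `safehtml()` implementation; `sanitize_html` and its signature are hypothetical, written only with the standard library's `html.parser`, and it is not meant as a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Tags that never take a closing tag (partial list, enough for this sketch).
VOID_TAGS = {"img", "br", "hr"}


class _Sanitizer(HTMLParser):
    """Keeps only allowed tags and, per tag, only allowed attributes."""

    def __init__(self, allowed_tags, allowed_attributes):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = allowed_tags
        self.allowed_attributes = allowed_attributes
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself, its text content is still kept
        keep = self.allowed_attributes.get(tag, ())
        rendered = ""
        for name, value in attrs:
            if name in keep:
                rendered += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.parts.append("<{}{}>".format(tag, rendered))

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like <img ...> so no stray closing tag is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.allowed_tags and tag not in VOID_TAGS:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def sanitize_html(html, allowed_tags=("p", "img"), allowed_attributes=None):
    """Hypothetical helper: allowed_attributes maps a tag name to the attribute names to keep."""
    parser = _Sanitizer(set(allowed_tags), allowed_attributes or {})
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p class="x">Hi <img src="a.png" width="10" onclick="evil()"><b>there</b></p>'
    print(sanitize_html(sample, allowed_attributes={"img": ("src", "width", "height")}))
```

With a mapping like `{"img": ("src", "width", "height")}`, the image keeps its source and dimensions while everything else (`class`, `onclick`, and so on) is still stripped.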