Closed by crantok 8 years ago
Just updated and tested in my own code. I like the way you reduced my suggestion to the simplest possible feature.
Thank you :)
I like the way you reduced my suggestion to the simplest possible feature.
I figure that you can replace \s\s+ with \s.
Plus... I'm lazy :)
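For anyone who wants to tidy up the extra spaces after sanitizing, here is a minimal Go sketch using only the standard library. It assumes the comment above is about collapsing runs of whitespace into a single space. CollapseSpaces is a hypothetical helper name, not part of the library, and it uses \s+ rather than \s\s+ so that a lone tab or newline is also normalized to a space.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespaceRun matches one or more consecutive whitespace characters.
var whitespaceRun = regexp.MustCompile(`\s+`)

// CollapseSpaces is a hypothetical helper: it replaces every run of
// whitespace with a single space and trims leading and trailing whitespace.
func CollapseSpaces(s string) string {
	return strings.TrimSpace(whitespaceRun.ReplaceAllString(s, " "))
}

func main() {
	// Sample text with doubled and leading/trailing spaces, as might be
	// left behind after tags are stripped with spaces inserted.
	fmt.Printf("%q\n", CollapseSpaces(" Go with out  me "))
}
```

Running this after sanitizing keeps the library feature simple while still giving single-spaced text for indexing.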
Awesome :)
I wanted this for the same reason! Thanks!
For the purposes of indexing, it's a little unfortunate that AddSpaceWhenStrippingTag(true) also inserts spaces when it removes inline tags. So sanitizing <div>Go with<em>out</em></div><div>me</div> yields "Go with out me" instead of "Go without me".
Not a blocker for me, but thought I'd point it out. :)
It's probably not possible to know what is an inline tag in a general case, unfortunately. Even <em> can be a block tag if CSS includes em { display: block; } or em { padding: 20px; }.
I'm using StrictPolicy() to strip tags from text in order to feed MongoDB full-text search. The text content of adjacent elements may be visually separated by HTML rendering even though there is no whitespace in the text. Stripping the tags therefore merges words, potentially altering search results. Here's an example:
I can easily solve this in my own code, e.g. by inserting a space before or after every block-level HTML element before stripping the tags.
I wondered whether this would be a generally useful feature. A general solution might need configuration, given that even adjacent inline elements can be visually separated through CSS.