simonw / sqlite-comprehend

Tools for running data in a SQLite database through AWS Comprehend
Apache License 2.0

--strip-tags option #9

Closed simonw closed 2 years ago

simonw commented 2 years ago

I ran this against columns containing HTML and got results like this:

[image]

Add a --strip-tags option that strips HTML tags before sending text to the API. It can use a simple regular expression:

import re

# Non-greedy match from "<" to the next ">"
tag_re = re.compile(r"<.*?>")
def strip_tags(s):
    return tag_re.sub("", s)
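
As a rough illustration of where that pattern works and where it falls short, here is a small sketch. Only `tag_re` and `strip_tags` come from the snippet above; the sample strings are made up for this example.

```python
import re

# Same pattern as in the snippet above: non-greedy match from "<" to the next ">".
tag_re = re.compile(r"<.*?>")


def strip_tags(s):
    return tag_re.sub("", s)


# Ordinary markup is removed as expected.
simple = strip_tags("<p>Hello <b>world</b></p>")

# "." does not match newlines by default, so a tag split across lines survives.
multiline = strip_tags("<a\nhref='/x'>link</a>")

# Entities are not decoded; they pass through unchanged.
entity = strip_tags("<p>Fish &amp; chips</p>")

# A literal "<" ... ">" pair in plain text is treated as a tag and dropped.
comparison = strip_tags("1 < 2 and 3 > 2")

for label, value in [
    ("simple", simple),
    ("multiline", multiline),
    ("entity", entity),
    ("comparison", comparison),
]:
    print(f"{label}: {value!r}")
```

Passing `re.DOTALL` to `re.compile()` would handle the multi-line tag, but the entity and plain-text `<` cases would still need separate treatment.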

simonw commented 2 years ago

Or I might borrow strip_tags() from Django, since it has been tested against many more cases:

https://github.com/django/django/blob/f8f16b3cd85599b464cbc5c7e884387940c24e6f/django/utils/html.py#L141-L182
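
For comparison, a parser-based approach can be sketched with only the standard library. This is not Django's code and not necessarily what the tool ended up using; `TextExtractor` and `strip_tags_parsed` are hypothetical names chosen for this example. Unlike Django's `strip_tags()`, which leaves entities encoded, this sketch decodes them, which is an assumption about what is wanted before sending text to the API.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content and discards tags (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are never stored.
        self.parts.append(data)

    def get_text(self):
        return "".join(self.parts)


def strip_tags_parsed(value):
    parser = TextExtractor()
    parser.feed(value)
    parser.close()  # flush any buffered text
    return parser.get_text()


print(repr(strip_tags_parsed("<p>Hello <b>world</b></p>")))
print(repr(strip_tags_parsed("<a\nhref='/x'>link</a>")))
print(repr(strip_tags_parsed("<p>Fish &amp; chips</p>")))
```

Because the parser tokenizes the markup instead of pattern-matching on angle brackets, a tag that spans several lines is handled without extra flags.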