While using the nh3 library, we came across a use case where a field is expected to contain HTML, but we need to remove any content that could enable an XSS attack. Calling nh3.clean() directly on the input text doesn't give the expected result: a lot of useful markup is stripped, which ends up modifying the HTML template input.
import nh3
text = '''
<!DOCTYPE html>
<html>
<head>
<title>HTML Tutorial</title>
</head>
<body>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
</body>
</html>
'''
# Extend the default allow-list without mutating the module-level nh3.ALLOWED_TAGS
tags = nh3.ALLOWED_TAGS | {'title', 'head', 'html', 'div', 'body'}
print(nh3.clean(text, tags=tags, strip_comments=False))
Output:
<title>HTML Tutorial</title>
<h1>This is a heading</h1>
<p>This is a paragraph.</p>
We don't want the html, head, or body tags to be stripped. Is there a limitation in the nh3 library that prevents these tags from being allowed?