Version: markdownify 0.13.1 (installed in ./.venv/lib/python3.8/site-packages)
The issue was found with the atheris fuzzing library.
Code to reproduce the issue:
from markdownify import markdownify
markdownify("<html><body><h5555555555>My First Heading</h5555555555><p>My first paragraph.</p></body></html>")
My machine froze (Ubuntu 20.04, 16 GB RAM). Memory usage went to 100% within 2-3 seconds, and the only way to recover was to power the machine off and on.
The only valid heading tags are h1 through h6. We should ignore everything else.
This may be an edge case, but someone could feed a string like the one in the example to a server that runs markdownify and cause resource exhaustion.
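One way the "only h1 - h6" rule could be enforced is sketched below. This is a standalone illustration, not markdownify's actual code or its eventual fix: the names `heading_level` and `heading_prefix` are hypothetical helpers made up for this example.

```python
import re

# Hypothetical helpers, not part of markdownify's API.
# Accept exactly one digit in the range 1-6 after the "h".
_HEADING_RE = re.compile(r"h([1-6])", re.IGNORECASE)


def heading_level(tag_name):
    """Return 1..6 for h1..h6, or None for any other tag name."""
    match = _HEADING_RE.fullmatch(tag_name)
    return int(match.group(1)) if match else None


def heading_prefix(tag_name):
    """Build the '#' prefix only for valid headings, so the repeat count is at most 6."""
    level = heading_level(tag_name)
    if level is None:
        return ""  # not a valid heading: emit no hashes at all
    return "#" * level
```

With a guard like this, a tag name such as `h5555555555` never reaches the `'#' * n` expression, so the repeat count stays bounded regardless of the input.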
Related cases (these will be fixed if we fix the original issue):
import sys
markdownify(f"<h5{sys.maxsize // 10}>")
Traceback (most recent call last):
File "/home/redacted/code/other/atheris/pythonProject/test_unit_009-1.py", line 22, in <module>
markdownify(f"<h5{sys.maxsize // 10}>")
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 433, in markdownify
return MarkdownConverter(**options).convert(html)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 105, in convert
return self.convert_soup(soup)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 108, in convert_soup
return self.process_tag(soup, convert_as_inline=False, children_only=True)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 151, in process_tag
text += self.process_tag(el, convert_children_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 156, in process_tag
text = convert_fn(node, text, convert_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 188, in convert_tag
return self.convert_hn(n, el, text, convert_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 283, in convert_hn
hashes = '#' * n
MemoryError
and
import sys
markdownify(f"<h{sys.maxsize + 1}>")
Traceback (most recent call last):
File "/home/redacted/code/other/atheris/pythonProject/test_unit_009-1.py", line 15, in <module>
markdownify(f"<h{sys.maxsize + 1}>")
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 433, in markdownify
return MarkdownConverter(**options).convert(html)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 105, in convert
return self.convert_soup(soup)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 108, in convert_soup
return self.process_tag(soup, convert_as_inline=False, children_only=True)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 151, in process_tag
text += self.process_tag(el, convert_children_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 156, in process_tag
text = convert_fn(node, text, convert_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 188, in convert_tag
return self.convert_hn(n, el, text, convert_as_inline)
File "/home/redacted/code/other/atheris/pythonProject/.venv/lib/python3.8/site-packages/markdownify/__init__.py", line 283, in convert_hn
hashes = '#' * n
OverflowError: cannot fit 'int' into an index-sized integer
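The three inputs fail differently because of the size of `n` in `'#' * n`. The sketch below reproduces only the cheap case outside markdownify and estimates the size of the original one without allocating it. `repeat_hashes` is a hypothetical stand-in for the failing expression in the traceback, and the explanation of the freeze is an assumption, not something measured here.

```python
import sys


def repeat_hashes(n):
    """Stand-in for the failing expression in the traceback: '#' * n."""
    return "#" * n


# A count of 5555555555 one-byte characters is roughly 5 GiB for a single copy.
# Assumption: that is small enough for the allocation to be attempted
# rather than rejected up front, which would explain the freeze.
approx_gib = 5555555555 / 2**30

# A count above sys.maxsize cannot even be used as a repeat count,
# so Python raises OverflowError before allocating anything.
try:
    repeat_hashes(sys.maxsize + 1)
except OverflowError as exc:
    overflow_message = str(exc)
```

The `sys.maxsize // 10` case sits in between: the count is a valid size, but far larger than any real memory, which matches the MemoryError in the first traceback.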