chbndrhnns closed this issue 4 years ago.
A single single-quote is enough to break the parser. This example also does not work for me:
class A:
    """VRF's"""
I cannot reproduce this. Can you share your pytkdocs version, please?
I am using pytkdocs 0.6
Somehow the single quote gets converted to &lsquo;, and then the XML parsing fails:
from xml.etree.ElementTree import XML
text = """<div class="doc doc-contents first">
<p>&lsquo;</p>
</div>"""
XML(text)
This fails with
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/powerjo/.pyenv/versions/3.8.2/lib/python3.8/xml/etree/ElementTree.py", line 1320, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: undefined entity: line 19, column 9
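A quick standard-library check (the sample strings are assumptions for illustration) suggests the failure comes from the named entity, not from the quote character itself: XML only predefines amp, lt, gt, quot and apos, so any other named entity is "undefined" unless a DTD declares it.

```python
from xml.etree.ElementTree import XML, ParseError


def parses(markup: str) -> bool:
    """Return True if ElementTree's XML() accepts the markup."""
    try:
        XML(markup)
    except ParseError:
        return False
    return True


# A literal curly quote (U+2018) is ordinary character data.
print(parses("<p>\u2018</p>"))   # True
# &lsquo; is an HTML named entity, unknown to a plain XML parser.
print(parses("<p>&lsquo;</p>"))  # False
# Numeric character references and &apos; are always valid XML.
print(parses("<p>&#8216;</p>"))  # True
print(parses("<p>&apos;</p>"))   # True
```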
What Python version are you using?
Could you try to run the test suite for pytkdocs on your macOS laptop?
git clone https://github.com/pawamoy/pytkdocs
cd pytkdocs
make setup test
The tests all pass. I am using Python 3.8. For the test suite, however, poetry installs 3.7.
When I look at the exception in the original post, I see that the error does not occur in pytkdocs but in an extension module of mkdocstrings:
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocstrings/extension.py", line 168, in run
    as_xml = XML(rendered)
  File "/Users/powerjo/.pyenv/versions/3.8.2/lib/python3.8/xml/etree/ElementTree.py", line 1320, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: undefined entity: line 13, column 4
I found the issue:
markdown_extensions:
- smarty
Smarty converts the quotes into HTML entities such as &lsquo;, which the XML library then cannot parse.
The entities would have to be unescaped first:
from xml.etree.ElementTree import XML
import html
text = "<div>&lsquo;</div>"  # HTML entity, as produced by smarty
unescaped = html.unescape(text)
XML(unescaped) # passes
XML(text) # fails
Oh yes, you're right: the error occurs in an extension module of mkdocstrings, not in pytkdocs. Sorry about that, I was tired...
Great! Thank you for debugging this :slightly_smiling_face:
So, I don't think it's possible to unescape the contents, as escaped characters such as &lt; or &gt; would then become literal < or > and break the XML parsing as well.
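The objection to unescaping can be checked directly. In this sketch the rendered string is a hypothetical example: html.unescape resolves &lsquo;, but it also turns a legitimately escaped &lt; into a bare <, which the XML parser then rejects.

```python
import html
from xml.etree.ElementTree import XML, ParseError


def is_well_formed(markup: str) -> bool:
    """Return True if ElementTree's XML() can parse the markup."""
    try:
        XML(markup)
    except ParseError:
        return False
    return True


# Hypothetical rendered docstring mixing SmartyPants entities with an
# escaped "<" (e.g. a docstring mentioning "a < b").
rendered = "<div>&lsquo;a &lt; b&rsquo;</div>"

unescaped = html.unescape(rendered)
print(unescaped)                  # <div>‘a < b’</div>
print(is_well_formed(rendered))   # False: &lsquo; is undefined in XML
print(is_well_formed(unescaped))  # False: the bare "<" is not well-formed
```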
But I wonder if wrapping the contents in <html>...</html> would make the parser "understand" &lsquo; and similar HTML entities. I'll try that and report back :slightly_smiling_face:
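For reference, a minimal standard-library check (the fragment is an assumed example) suggests the root element's name makes no difference: a non-validating parser such as expat only knows entities declared in a DTD, so wrapping the contents in <html>...</html> alone still fails.

```python
from xml.etree.ElementTree import XML, ParseError

# Hypothetical rendered fragment containing a SmartyPants entity.
fragment = "<div><p>&lsquo;</p></div>"
wrapped = "<html>" + fragment + "</html>"

try:
    XML(wrapped)
    outcome = "parsed"
except ParseError:
    # Naming the root element "html" does not load any HTML entity
    # definitions; only a DTD (or XML's five built-ins) defines entities.
    outcome = "undefined entity"

print(outcome)  # undefined entity
```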
Wow, that was a wild ride.
The XMLParser class has an html parameter with which you could define entities such as lsquo, but this parameter is now deprecated. The class sets self.entity = {}.
In Python 2 you could therefore do parser = XMLParser(); parser.entity["lsquo"] = "...", but this doesn't work anymore in Python 3: the parser is implemented as a C extension, so you cannot change the object, and trying to access the parser's attributes ends in an AttributeError. You cannot inspect the object in debugging sessions either.
I finally found a solution in this SO post: prepend the text to be parsed with a DOCTYPE that declares the entities, so the parser doesn't crash on them.
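from xml.etree.ElementTree import XML

# Declare the HTML entities SmartyPants can emit so that expat accepts them.
# Each entity expands to its own literal text (e.g. the string "&lsquo;"),
# not to the character; '&nbsp;' alone would be a recursive reference.
ENTITIES = """
<!DOCTYPE html [
<!ENTITY nbsp '&amp;nbsp;'>
<!ENTITY lsquo '&amp;lsquo;'>
<!ENTITY rsquo '&amp;rsquo;'>
<!ENTITY ldquo '&amp;ldquo;'>
<!ENTITY rdquo '&amp;rdquo;'>
<!ENTITY laquo '&amp;laquo;'>
<!ENTITY raquo '&amp;raquo;'>
<!ENTITY hellip '&amp;hellip;'>
<!ENTITY ndash '&amp;ndash;'>
<!ENTITY mdash '&amp;mdash;'>
]>
"""
text = "<div>&lsquo;</div>"  # example rendered HTML
parsed = XML(ENTITIES + text)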
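The same trick can be generalized to every HTML named entity. The sketch below is an assumption, not mkdocstrings' actual fix: it builds the DOCTYPE from the standard library's html.entities.name2codepoint table and maps each name to a numeric character reference, so &lsquo; becomes the real ‘ character in the parsed tree. The helper name parse_html_fragment is hypothetical.

```python
from html.entities import name2codepoint
from xml.etree.ElementTree import Element, XML

# XML already predefines these; redeclaring them (especially lt and amp)
# would be unnecessary and error-prone.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# One <!ENTITY> declaration per HTML 4 named entity, each expanding to a
# numeric character reference such as &#8216; for lsquo.
_DOCTYPE = "<!DOCTYPE html [\n" + "\n".join(
    f'<!ENTITY {name} "&#{codepoint};">'
    for name, codepoint in name2codepoint.items()
    if name not in _XML_PREDEFINED
) + "\n]>\n"


def parse_html_fragment(markup: str) -> Element:
    """Hypothetical helper: parse markup that may contain HTML named entities."""
    return XML(_DOCTYPE + markup)


root = parse_html_fragment("<p>VRF&rsquo;s &ldquo;docs&rdquo;&nbsp;&hellip;</p>")
print(repr(root.text))
```

Mapping to numeric references yields actual characters; keeping the literal entity text instead may be preferable when the tree is later re-serialized by a tool that leaves existing entities alone.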
Damn SmartyPants :angry: :heart: :anger: :hot_face: !
I'll try to release the fix soon.
@pawamoy Have you already been able to include the fix for this issue in a release?
@chbndrhnns I will do it now, thanks for the reminder :slightly_smiling_face:
This is fixed in 0.12.1, please reopen if needed.
Describe the bug
If a docstring contains single-quoted text, it cannot be parsed.
Traceback (most recent call last):
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/livereload/handlers.py", line 69, in poll_tasks
    filepath, delay = cls.watcher.examine()
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/livereload/watcher.py", line 105, in examine
    func()
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocs/commands/serve.py", line 136, in builder
    build(config, live_server=live_server, dirty=dirty)
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocs/commands/build.py", line 274, in build
    _populate_page(file.page, config, files, dirty)
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocs/commands/build.py", line 174, in _populate_page
    page.render(config, files)
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocs/structure/pages.py", line 183, in render
    self.content = md.convert(self.markdown)
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/markdown/core.py", line 265, in convert
    root = self.parser.parseDocument(self.lines).getroot()
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/markdown/blockparser.py", line 90, in parseDocument
    self.parseChunk(self.root, '\n'.join(lines))
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/markdown/blockparser.py", line 105, in parseChunk
    self.parseBlocks(parent, text.split('\n\n'))
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/markdown/blockparser.py", line 123, in parseBlocks
    if processor.run(parent, blocks) is not False:
  File "/Users/powerjo/dev/juice/juiceutils/.venv/lib/python3.8/site-packages/mkdocstrings/extension.py", line 168, in run
    as_xml = XML(rendered)
  File "/Users/powerjo/.pyenv/versions/3.8.2/lib/python3.8/xml/etree/ElementTree.py", line 1320, in XML
    parser.feed(text)
  File "<string>", line None
xml.etree.ElementTree.ParseError: undefined entity: line 13, column 4
To Reproduce
Trying to parse this class fails with xml.etree.ElementTree.ParseError: undefined entity: line 13, column 4.
Expected behavior
The example can be parsed.