iamjackg / md2cf

Convert and upload Markdown documents to Confluence
MIT License

Error parsing xhtml: Unexpected character '>' (code 62) expected '=' #92

Closed harish0619 closed 1 year ago

harish0619 commented 1 year ago

I have a Markdown file that I convert to MediaWiki format using pypandoc and then upload to Confluence with md2cf. However, I get the error pasted in the subject every time I run the code.

Can you help me work out what could be happening?

Unfortunately, the file that I'm trying to convert is a proprietary document and I can't share it here.

galund commented 1 year ago

MediaWiki format is similar to Markdown but not the same, so I wouldn't expect that to work. Why are you not uploading the original Markdown using md2cf?

harish0619 commented 1 year ago

Because it throws the same error as I described in the issue.

b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Error parsing xhtml: Unexpected character \'>\' (code 62) expected \'=\'\n at : [11,26]","reason":"Bad Request"}'

Is there a way to read the XHTML it generates during the process?

galund commented 1 year ago

This sounds a bit like the problem I had in #81 - I wonder if you have some image alt text with an angle bracket or quote mark in it, which seems to cause the MD->HTML conversion lib Mistune to produce broken XHTML.

You get more debug information if you add the --debug flag to md2cf.

If you want to see what's in the HTML when it's breaking, you might want to add `import pdb; pdb.set_trace()` after line 440 (just after where it says `except HTTPError as e:`) and have a poke around in the page object.

You also might like to try my branch that has an updated version of Mistune https://github.com/alphagov/md2cf/tree/upgrade-mistune.
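Once the generated XHTML has been pulled out of the page object, a quick local check can point at the offending spot, much like the `[11,26]` in the Confluence response. The following is only a minimal sketch using the Python standard library: `find_xhtml_error` is a hypothetical helper and not part of md2cf, and Python's XML parser is not the one Confluence uses, so the reported positions may not match exactly.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_xhtml_error(fragment: str) -> Optional[Tuple[int, int]]:
    """Return (line, column) of the first well-formedness error, or None.

    Hypothetical helper for illustration; not an md2cf function.
    """
    # Wrap the fragment in a single root element on its own line so that
    # line numbers inside the fragment are offset by exactly one.
    wrapped = "<root>\n" + fragment + "\n</root>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError as exc:
        line, column = exc.position  # position as reported by the parser
        return line - 1, column      # undo the offset added by the wrapper
    return None


# A well-formed fragment passes; a tag-like placeholder does not, because
# "name" is read as an attribute that has no "=value" part.
print(find_xhtml_error("<p>fine</p>"))
print(find_xhtml_error("<p>Use <your name> here</p>"))
```

One caveat with this approach: a plain XML parser also rejects HTML named entities such as `&nbsp;`, so a reported error is a hint about where to look rather than proof that Confluence would reject the same spot.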

harish0619 commented 1 year ago

Thanks for the swift response. I tried your branch but I get the same error. I also tried the --debug flag and am pasting the output below. Can you point out exactly where I should put the debugger trace?

`─── Traceback (most recent call last) ───

/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/main.py:402 in main
    [Errno 20] Not a directory: '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/main.py'

/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/upsert.py:93 in upsert_page
    [Errno 20] Not a directory: '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/upsert.py'

/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:176 in create_page
    [Errno 20] Not a directory: '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py'

/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:73 in _post
    [Errno 20] Not a directory: '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py'

/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py:66 in _request
    [Errno 20] Not a directory: '/usr/local/lib/python3.7/site-packages/md2cf-2.1.0-py3.7.egg/md2cf/api.py'

/usr/local/lib/python3.7/site-packages/requests-2.28.2-py3.7.egg/requests/models.py:1021 in raise_for_status

    1018             )
    1019
    1020         if http_error_msg:
  ❱ 1021             raise HTTPError(http_error_msg, response=self)
    1022
    1023     def close(self):
    1024         """Releases the connection back to the pool. Once this method has been

locals:
    http_error_msg = '400 Client Error: for url: https://oneconfluence.verizon.com/rest/api/content'
    reason = ''
    self = <Response [400]>

HTTPError: 400 Client Error: for url: https://oneconfluence.verizon.com/rest/api/content

📄️ deploy_openstack ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ❌ Error while uploading

Total progress ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00

400 Client Error: for url: https://oneconfluence.verizon.com/rest/api/content - b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Error parsing xhtml: Unexpected character \'>\' (code 62) expected \'=\'\n at : [11,26]","reason":"Bad Request"}'`

galund commented 1 year ago

Heh, I forgot to give you the file name, didn't I? It's md2cf/main.py, which is where the error is usually caught and where you have a page object that you can look at.

harish0619 commented 1 year ago

Upon debugging, I found that the issue was with the character '>' in my document. It seems the tool couldn't tell it apart from markup, since XHTML uses the same character in its tags. Thank you for your input.

Maybe this is something that can be fixed in Mistune. You can close the issue and consider it resolved. Thanks once again!
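For anyone landing here with the same error, the effect of escaping can be reproduced locally. This is only an illustrative sketch with the standard library: the sample text is invented (the original document was never shared), `is_well_formed` is a hypothetical helper, and this is not how md2cf or Mistune handle escaping internally.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a dummy root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


# Invented sample text containing angle brackets that are meant literally.
raw_text = "Replace <your name> with a value > 0"

# Dropped into markup as-is, "<your name>" looks like a tag whose attribute
# "name" has no "=value", which an XML parser rejects.
unescaped = f"<p>{raw_text}</p>"

# html.escape turns <, > and & (and quotes) into entities, so the same text
# becomes ordinary character data.
escaped = f"<p>{html.escape(raw_text)}</p>"

print(is_well_formed(unescaped))
print(is_well_formed(escaped))
print(escaped)
```

In a Markdown source, the equivalent workaround is usually to put such text in a code span or to write the entities (`&lt;`, `&gt;`) by hand.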