Closed pgsantos-pt closed 2 months ago
As far as I am aware, if you enclose URLs in angle brackets, you can have verbatim square brackets and spaces in them:
[Overview](<[ProjectID] Overview.md>)
When rendering Markdown as HTML, this turns into a link with the right value for href:
<a href="[ProjectID] Overview.md">
The browser is going to URL-encode the href value when the referred document is fetched from a remote server.
I am always a little hesitant to apply a transformation unconditionally. Does Markdown mandate that all references are URL-encoded? If so, URL-decoding references is the right approach. Otherwise, md2conf might decode something that is not meant to be decoded, e.g. %30 off would turn into 0 off (%30 is the code for the character 0).
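To make the concern concrete, here is a small sketch using only the standard library's urllib.parse.unquote. It shows the case where decoding helps and the case where it silently changes text that was never meant to be URL-encoded:

```python
from urllib.parse import unquote

# A percent-encoded relative link, as a Markdown editor might write it.
encoded_link = "%5BProjectID%5D%20Overview.md"
# Plain text that merely happens to contain a valid percent escape.
plain_text = "%30 off"

decoded_link = unquote(encoded_link)  # brackets and space are restored
decoded_text = unquote(plain_text)    # "%30" is decoded to the character "0"

print(decoded_link)
print(decoded_text)
```

The second call is the problem: unquote cannot tell whether the input was encoded on purpose.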
We might be able to cut the Gordian knot if we URL-decode only those strings that consist entirely of URL-encoded characters, and leave everything else as is.
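A minimal sketch of that idea, under one possible reading of "URL-encoded characters only": the string must contain at least one percent escape and otherwise only characters that never need escaping in a path. The helper name unquote_if_encoded and the exact set of allowed characters are assumptions for illustration, not what md2conf actually does:

```python
import re
from urllib.parse import unquote

# Assumption: a reference counts as URL-encoded if every character is either
# an RFC 3986 unreserved character, a path separator, or a %XX escape.
_FULLY_ENCODED = re.compile(r"(?:[A-Za-z0-9._~/-]|%[0-9A-Fa-f]{2})+")


def unquote_if_encoded(ref: str) -> str:
    """Decode ref only if it looks like a fully URL-encoded string (hypothetical helper)."""
    if "%" in ref and _FULLY_ENCODED.fullmatch(ref):
        return unquote(ref)
    # Anything with raw spaces, brackets, etc. is left untouched.
    return ref


print(unquote_if_encoded("%5BProjectID%5D%20Overview.md"))
print(unquote_if_encoded("%30 off"))
```

This heuristic is not watertight: a literal file name such as %30.md would still be decoded, so it reduces false positives rather than eliminating them.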
OK, let me try that first approach and I'll get back to you.
That worked! Many thanks 🙏
Hello,
After you merged PR #58, I've noticed you removed the instruction
url = urllib.parse.unquote(anchor.attrib["href"])
from the first line of the md2conf.converter.ConfluenceStorageFormatConverter._transform_link method. I think this url variable is local and is only used to help convert links, correct? If so, the instruction that I added should be fairly harmless, although I might be wrong, so please let me know if I'm saying something incorrect.
Why did I add this instruction? Confluence pages need to have unique names, so to avoid overwriting pages from other people in the company we usually write our titles as [ProjectID] Overview.md, for instance. Because of the special characters, the relative path is written (automatically, by the MD editor) as [Overview](%5BProjectID%5D%20Overview.md). So, when you run the script and a log line like this appears
INFO - synchronize_directory [61] - indexed 2 page(s)
it means that one of the pages was stored under its file name, in this case [ProjectID] Overview.md. However, when the script gets to the conversion, it finds the relative path %5BProjectID%5D%20Overview.md. When it tries to look up %5BProjectID%5D%20Overview.md, it won't find it, because the page was stored as [ProjectID] Overview.md. That is why I need the unquote instruction.
My question is, could you reintroduce that instruction, or is it going to create potential problems? Maybe I was too eager and unquoted the whole href, but ideally I would need the unquote at least for the relative path lookup.
Thank you in advance.
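A rough sketch of the narrower variant, where only the relative path used for the lookup is unquoted and the href itself is left alone. The find_page function and the page_index dictionary are hypothetical stand-ins for illustration; they are not md2conf's actual data structures:

```python
from typing import Optional
from urllib.parse import unquote, urlparse


def find_page(href: str, page_index: dict[str, str]) -> Optional[str]:
    """Look up a relative link in an index keyed by on-disk file name (hypothetical)."""
    parts = urlparse(href)
    if parts.scheme or parts.netloc:
        # Absolute URLs are not local files; leave them to the caller.
        return None
    # Decode only the path component, and only for the lookup;
    # any #fragment has already been split off by urlparse.
    return page_index.get(unquote(parts.path))


# Assumed index: file name as stored on disk -> some page identifier.
page_index = {"[ProjectID] Overview.md": "12345"}

print(find_page("%5BProjectID%5D%20Overview.md", page_index))
print(find_page("https://example.com/a%20b", page_index))
```

Because the decoded value never replaces the original href, text such as %30 off elsewhere in the link is not altered; the worst case is a lookup that finds nothing.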