jgm / pandoc

Universal markup converter
https://pandoc.org

hsmarkdown / pandoc convert " " to "+" in [img] syntax #30

Closed: jgm closed this 13 years ago

jgm commented 13 years ago

What steps will reproduce the problem?

echo "![test image](/a b/c.jpg)" | hsmarkdown

What is the expected output? What do you see instead?

expected: markdown.pl gives:

!test image

I see: hsmarkdown gives:

<p><img src="/a+b/c.jpg" alt="test image" /></p>

What version of the product are you using? On what operating system?

pandoc-1.4; reproducible on both Mac and Linux.

Please provide any additional information below.

none

Google Code Info: Issue #: 220 Author: nfjinj...@gmail.com Created On: 2010-02-25T02:31:47.000Z Closed On: 2010-02-27T03:09:14.000Z

jgm commented 13 years ago

BTW, it's GHC 6.12.1 on both Mac and Linux.

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-25T02:33:27.000Z

jgm commented 13 years ago

Oops, I just realized that the result from markdown.pl is not correct either. I found this issue while migrating my blog, which uses pandoc as its rendering engine. Having a space inside an image URL used to give a "correct" link, but not anymore. I'm not sure whether it's the latest version of pandoc that changed this behavior.

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-25T02:40:46.000Z

jgm commented 13 years ago

More info: this is what worked for me when using pandoc-1.2.1:

echo "![test image](/a b/c.jpg)" | pandoc

<p><img src="/a%20b/c.jpg" alt="test image" /></p>

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-25T02:49:58.000Z

jgm commented 13 years ago

I made this change because I was under the impression that browsers treated a '+' character as equivalent to '%20'. That is certainly the case with the browsers I use, and it is fairly standard to use + for spaces in URLs. Does it not work in your browser?

Google Code Info: Author: fiddloso...@gmail.com Created On: 2010-02-25T14:01:59.000Z

jgm commented 13 years ago

http://stackoverflow.com/questions/1211229/in-a-url-should-spaces-be-encoded-using-20-or suggests that officially, you should use %20 in the path part of the URL, and + in the query part. I didn't realize this, and I haven't yet seen any bad effects of encoding spaces as + in both parts. So it would be useful to know what is choking on this...
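To make the path/query distinction concrete, here is a small sketch using Python's standard `urllib.parse` (Python is used purely for illustration; pandoc itself is written in Haskell). Note that applying the query-string convention to a path reproduces exactly the `/a+b/c.jpg` that pandoc-1.4 emitted:

```python
from urllib.parse import quote, quote_plus, urlencode

# Path part: a space must be percent-encoded as %20.
# quote() leaves '/' alone by default (safe="/").
path = quote("/a b/c.jpg")
print(path)  # /a%20b/c.jpg

# Query part (form encoding): '+' is the conventional stand-in for a space.
query = urlencode({"q": "a b"})
print(query)  # q=a+b

# Using the query convention on a path gives the pandoc-1.4 style output.
wrong_path = quote_plus("/a b/c.jpg", safe="/")
print(wrong_path)  # /a+b/c.jpg
```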

Google Code Info: Author: fiddloso...@gmail.com Created On: 2010-02-25T14:08:08.000Z

jgm commented 13 years ago

Thanks for the info. Here's what I've been experiencing.

Copying directly from one of my posts:

![11.png](/images/album/10-02-08 Rika game engine preview 3/11.png)

gives:

http://jinjing.funkymic.com/images/album/10-02-08+Rika+game+engine+preview+3/11.png

which is not accessible, yet

http://jinjing.funkymic.com/images/album/10-02-08%20Rika%20game%20engine%20preview%203/11.png

is.
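The server side is the mirror image. A sketch with Python's `urllib.parse`, assuming the web server decodes the request path the way `unquote` does (no `+`-to-space mapping), which matches the failure reported above:

```python
from urllib.parse import unquote, unquote_plus

plus_url = "/images/album/10-02-08+Rika+game+engine+preview+3/11.png"
pct_url = "/images/album/10-02-08%20Rika%20game%20engine%20preview%203/11.png"

# Path decoding only turns %XX escapes back into characters; '+' stays a
# literal plus sign, so this names a directory that does not exist on disk.
print(unquote(plus_url))
# /images/album/10-02-08+Rika+game+engine+preview+3/11.png

# %20 decodes to the space that is actually in the directory name.
print(unquote(pct_url))
# /images/album/10-02-08 Rika game engine preview 3/11.png

# Only form-style decoding maps '+' to a space; it is meant for query strings.
print(unquote_plus(plus_url))
# /images/album/10-02-08 Rika game engine preview 3/11.png
```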

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-25T14:49:17.000Z

jgm commented 13 years ago

Yes, I can reproduce the bug. (And, just to make sure it's not a peculiarity of nginx, I reproduced it on apache too.)

OK, I'll modify pandoc to revert to the old behavior, using %20 instead of +. In the meantime, you should be able to use %20 directly in your markdown URLs; pandoc knows not to escape the %.

Google Code Info: Author: fiddloso...@gmail.com Created On: 2010-02-25T16:24:56.000Z

jgm commented 13 years ago

Thanks for accepting.

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-25T19:00:00.000Z

jgm commented 13 years ago

Fixed in r1847. I think I've got it right -- markdown should allow you to use exact URLs, with things like %20, but also to use special characters and spaces without escaping them. This patch makes this happen; the link source is first unescaped (in case they've used proper escapes) and then escaped (in case they haven't). Let me know if it doesn't work properly for you.
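The unescape-then-escape idea can be sketched in Python with `urllib.parse`. `normalize_link_source` is a hypothetical helper name, and this is an analogue of the approach described, not pandoc's Haskell implementation. It assumes the link source is a path, so only `/` is kept unescaped; a real converter would also have to leave reserved characters such as `:`, `?` and `#` alone in absolute URLs.

```python
from urllib.parse import quote, unquote

def normalize_link_source(src: str) -> str:
    """Hypothetical helper: decode any escapes the author already wrote,
    then re-escape, so '/a b' and '/a%20b' end up identical."""
    # Step 1: undo existing %XX escapes (invalid ones are left as-is).
    decoded = unquote(src)
    # Step 2: escape again, keeping '/' as a path separator. quote() encodes
    # non-ASCII characters as UTF-8 bytes by default.
    return quote(decoded, safe="/")

print(normalize_link_source("/a b/c.jpg"))    # /a%20b/c.jpg
print(normalize_link_source("/a%20b/c.jpg"))  # /a%20b/c.jpg
```

Because the input is decoded first, a hand-written `%20` and a literal space give the same result. One trade-off of this design: an escape that was meant to stay escaped, such as `%2F` inside a path segment, is decoded to `/` and then kept as a real separator.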

Google Code Info: Author: fiddloso...@gmail.com Created On: 2010-02-27T03:09:14.000Z

jgm commented 13 years ago

I can confirm that it fixes the problem for non-Unicode URLs.

But for Unicode URLs, it seems to break.

For example, after applying this patch:

This works:

![11.png](/images/album/10-02-08 Rika game engine preview 3/11.png) -> http://jinjing.funkymic.com/images/album/10-02-08%20Rika%20game%20engine%20preview%203/11.png

This does not (not quite work-safe; I was too lazy to make a test case):

![1.jpg](/images/album/10-02-16 舞 Hime/1.jpg) -> http://jinjing.funkymic.com/images/album/10-02-16%20%821E%20Hime/1.jpg

This fixes it:

http://jinjing.funkymic.com/images/album/10-02-16%20舞%20Hime/1.jpg

I confirm it's not nginx specific, since I get the same behavior locally without a reverse proxy.

The problem is that the "舞" character gets URL-escaped. I don't quite understand what's going on here; should it not be escaped?

A trivial patch (attached) that pre-encodes the string as UTF-8 before escaping seems to fix the problem in my smoke test.

Regards
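The `%821E` above looks like the code point of 舞 (U+821E) written out as one escape. That is not valid percent-encoding: `%XX` stands for a single byte, so a server reads it as the byte `0x82` followed by the literal text `1E`. Encoding the character to UTF-8 first, which is what the attached patch is described as doing, yields three byte escapes instead. A Python sketch contrasting the two (hypothetical helper names; it illustrates a failure mode consistent with the output above and is not pandoc's code):

```python
def escape_codepoint(ch: str) -> str:
    # Buggy variant: uses the Unicode code point itself as the escape value.
    return "%" + format(ord(ch), "X")

def escape_utf8(ch: str) -> str:
    # Correct variant: encode to UTF-8 first, then escape each byte as %XX.
    return "".join(f"%{b:02X}" for b in ch.encode("utf-8"))

print(escape_codepoint("舞"))  # %821E
print(escape_utf8("舞"))       # %E8%88%9E
```

The unescaped `%20舞%20` form presumably works because the browser itself percent-encodes non-ASCII path characters as UTF-8 before sending the request, producing the same bytes as `escape_utf8`.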

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-27T04:23:29.000Z

jgm commented 13 years ago

I tried a simpler approach in r1851 -- just changing spaces into %20, and not messing with anything else. Does that work better?
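As described, r1851 escapes spaces and nothing else. A rough Python equivalent (`escape_spaces` is a hypothetical name, and this is not pandoc's Haskell code) shows why both reported cases now come out right:

```python
def escape_spaces(src: str) -> str:
    # Only spaces are touched; '%' escapes the author already wrote and
    # non-ASCII characters such as 舞 pass through unchanged.
    return src.replace(" ", "%20")

print(escape_spaces("/images/album/10-02-16 舞 Hime/1.jpg"))
# /images/album/10-02-16%20舞%20Hime/1.jpg
print(escape_spaces("/a%20b/c.jpg"))  # /a%20b/c.jpg
```

The trade-off is that other characters that are not legal in a URL are passed through as-is, leaving it to the browser to encode them, as it did for 舞 above.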

Google Code Info: Author: fiddloso...@gmail.com Created On: 2010-02-27T05:02:23.000Z

jgm commented 13 years ago

Yep, works pretty well for me :)

Google Code Info: Author: nfjinj...@gmail.com Created On: 2010-02-27T05:17:22.000Z