textile / python-textile

A Python port of Textile, A humane web text generator
Other
68 stars 23 forks source link

":" in URLs should not be encoded #29

Closed rcarmo closed 8 years ago

rcarmo commented 8 years ago

Following up from #27, custom URL schema processing requires the colon to not be URL-encoded.

ikirudennis commented 8 years ago

I've drawn up a subclass of Textile which looks like it will function for your needs: rawlink_textile.py Give it a try, and let me know if you need any further assistance.

rcarmo commented 8 years ago

Thanks, I will. But it does seem like an awful lot of code (and an extra dependency on six). I wonder why the links are getting encoded in the first place...

On 22 May 2016, at 16:19, Dennis Burke notifications@github.com wrote:

I've drawn up a subclass of Textile which looks like it will function for your needs: rawlink_textile.py Give it a try, and let me know if you need any further assistance.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

ikirudennis commented 8 years ago

Textile already requires six, so it's not an extra dependency.

Encoding the URIs is just the right thing to do: https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters I understand your use-case for it, but it seems that anything that processes URIs should be able to process the percent-encoded URI without issue. If it seems like the new version of textile broke your system, it's because textile was previously incorrect.

rcarmo commented 8 years ago

I'm sorry, but that just doesn't make sense.

That article pertains to encoding payload data, not markup.

I don't have that issue with Markdown or reStructured Text, and Textile worked that way on Python for at least eight years (and before that on PHP - my site was a PHPWiki back in 2001).

I also don't remember having that issue with the Java ports (which I used in Clojure when I did a test implementation a couple of years back), so as far as I'm concerned not encoding the schema in URLs is the expected behavior.

On 23 May 2016, at 01:16, Dennis Burke notifications@github.com wrote:

Textile already requires six, so it's not an extra dependency.

Encoding the URIs is just the right thing to do: https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters I understand your use-case for it, but it seems that anything that processes URIs should be able to process the percent-encoded URI without issue. If it seems like the new version of textile broke your system, it's because textile was previously incorrect.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub

ikirudennis commented 8 years ago

The way you have your URIs structured, they are not seen as having schemes (technically what you have are paths, which should be percent-encoded). And that's all due to the RFCs governing the structure of URIs. The part of the gist which I've modified has to do with urllib's handling and parsing of the URI. For instance:

>>> from six.moves import urllib
>>> urllib.parse.urlsplit('Radar:1234')
SplitResult(scheme='', netloc='', path='Radar:1234', query='', fragment='')
>>> urllib.parse.urlsplit('Radar:/1234')
SplitResult(scheme='radar', netloc='', path='/1234', query='', fragment='')
>>> urllib.parse.urlsplit('Radar://1234')
SplitResult(scheme='radar', netloc='1234', path='', query='', fragment='')

I can't speak to how implementations of textile in other languages may have treated this situation in the past. My goal here is to have python-textile adhere as closely as possible to both the spec and php-textile in their current forms. (And unfortunately, we were so far behind the times that this update seems like a very drastic change.) For what it's worth, I've filed an issue on the textile-spec (textile/textile-spec#3) repo asking for some clarity on the proper handling of a situation like this.

Please let me know if that gist resolves your issue. You could also put your schema transformation in that encode_url function instead, if you wanted to eliminate a step.

rcarmo commented 8 years ago

Sorry, but I still think this is a breaking change that needs to be fixed. Looking at the RedCloth ragel grammar (which is the closest I can find to a machine-readable formal definition of the Textile language, lacking other PEG or EBNF alternatives), the definition for a URL is pretty clear:

https://github.com/jgarber/redcloth/blob/master/ragel/redcloth_common.rl#L111 https://github.com/jgarber/redcloth/blob/master/ragel/redcloth_common.rl#L111

On 23 May 2016, at 14:23, Dennis Burke notifications@github.com wrote:

The way you have your URIs structured, they are not seen as having schemes (technically what you have are paths, which should be percent-encoded). And that's all due to the RFCs governing the structure of URIs. The part of the gist which I've modified has to do with urllib's handling and parsing of the URI. For instance:

from six.moves import urllib urllib.parse.urlsplit('Radar:1234') SplitResult(scheme='', netloc='', path='Radar:1234', query='', fragment='') urllib.parse.urlsplit('Radar:/1234') SplitResult(scheme='radar', netloc='', path='/1234', query='', fragment='') urllib.parse.urlsplit('Radar://1234') SplitResult(scheme='radar', netloc='1234', path='', query='', fragment='') I can't speak to how implementations of textile in other languages may have treated this situation in the past. My goal here is to have python-textile adhere as closely as possible to both the spec and php-textile in their current forms. (And unfortunately, we were so far behind the times that this update seems like a very drastic change.) For what it's worth, I've filed an issue on the textile-spec (textile/textile-spec#3 https://github.com/textile/textile-spec/issues/3) repo asking for some clarity on the proper handling of a situation like this.

Please let me know if that gist resolves your issue. You could also put your schema transformation in that encode_url function instead, if you wanted to eliminate a step.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/textile/python-textile/issues/29#issuecomment-220978736

ikirudennis commented 8 years ago

...Still wondering if that gist works for you.

ikirudennis commented 8 years ago

Any update on whether the gist works for you?

rcarmo commented 8 years ago

I stand by my previous statement that this is a breaking change that needs to be addressed. I haven't tried the gist because I need this to work without patching, and as such I reverted to the previous version. Sorry.

ikirudennis commented 8 years ago

The reason I keep asking is to confirm whether or not the gist actually solves your problem without causing other errors for you. Whether you choose to use it in production is totally up to you. (For what it's worth, the gist functions as a drop-in fix, not exactly a patch. You might be able to get away with import rawlink_textile as textile and be on your way.) As it is, if I make that change to the project, too many other things change and get out of alignment with php-textile. We're sort of locked in a case of XKCD 1172: https://xkcd.com/1172/

I'm not going to make a change to this until we get a straight answer from textile-spec regarding a way forward. Sorry.

rcarmo commented 8 years ago

I can wait (and simply vendor the old version). Nevertheless, so far, the research I did into RedCloth (including the "new" version) supports my case in terms of historical precedent (that particular bit of the grammar I linked to goes back ages).

With some luck, the textile-spec folk will consider defining a formal grammar...

On 06 Jun 2016, at 18:08, Dennis Burke notifications@github.com wrote:

The reason I keep asking is to confirm whether or not the gist actually solves your problem without causing other errors for you. Whether you choose to use it in production is totally up to you. (For what it's worth, the gist functions as a drop-in fix, not exactly a patch. You might be able to get away with import rawlink_textile as textile and be on your way.) As it is, if I make that change to the project, too many other things change and get out of alignment with php-textile. We're sort of locked in a case of XKCD 1172: https://xkcd.com/1172/

I'm not going to make a change to this until we get a straight answer from textile-spec regarding a way forward. Sorry.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.