python-hyper / rfc3986

A Python Implementation of RFC3986 including validations
https://rfc3986.readthedocs.io/en/latest/
Other
185 stars 32 forks source link

Stopped abnf regex from halting when it encounters a line terminator in the fragment #100

Closed JohnJamesUtley closed 1 year ago

JohnJamesUtley commented 1 year ago

fixes #99

sigmavirus24 commented 1 year ago

DOTALL is a sledgehammer. We should use (?s:.*) for the fragment matching if we actually want to allow newlines there. I'm not convinced we should.

JohnJamesUtley commented 1 year ago

DOTALL is a sledgehammer. We should use (?s:.*) for the fragment matching if we actually want to allow newlines there. I'm not convinced we should.

You are absolutely correct that DOTALL is too much and the inline flag is a much cleaner solution. I've switched the pull request to amend that just now.

The change is necessary because in other similar sections of the URL, such as the query and the path, line termination characters are accepted and are converted into percent-encoded characters when the path and query are returned. It is inconsistent that the fragment section treats these same characters differently.

I would expect line termination characters in the fragment to be percent-encoded like they are in the query and the path.

Additionally, the other Python URL parsers handle line termination characters in the fragment section by converting them to percent-encoded characters in their output. Ending the parsing at line termination characters in the fragment section instead of percent-encoding them is strange behavior in my opinion. Especially since line termination characters elsewhere in the URL do not end the parsing. (I tested Furl, Hyperlink, Urllib, and Yarl).

sigmavirus24 commented 1 year ago

Can you add a test case or two?

JohnJamesUtley commented 1 year ago

@sigmavirus24 Made the changes you requested 👍