sindresorhus / normalize-url

Normalize a URL
MIT License
837 stars 123 forks

Semicolons are erroneously encoded in query params #85

Open marcelklehr opened 5 years ago

marcelklehr commented 5 years ago

Hey,

I've had a user report the following normalization:

normalize('https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128') == 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128'

...which according to the user didn't preserve the semantics of the URL.

Checking RFC 3986, it appears that ; and = are part of the sub-delims non-terminal, which defines a set of reserved characters that should not be percent-encoded.

Am I missing something?

sindresorhus commented 5 years ago

It's just URL encoded. It doesn't change any semantics of the URL:

const a = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128';
const b = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128';

new URL(a).searchParams.get('Action') === new URL(b).searchParams.get('Action')
//=> true

marcelklehr commented 5 years ago

Mh. I assume this is because the URL implementation simply treats ; as data, which is fine, but it's not canonical.

The above-mentioned RFC says:

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications.
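
To make the concern concrete, here is a minimal sketch of a server-side parser that accepts ; as well as & as a pair separator. The function name parseQueryWithSemicolons is hypothetical, and it is only an assumption that the application behind the reported URL parses its query string this way. Under that assumption, the two query strings yield different parameters:

```javascript
// Hypothetical parser (an assumption for illustration): treats both "&" and ";"
// as separators between name=value pairs, then percent-decodes each part.
function parseQueryWithSemicolons(query) {
  const result = {};
  for (const pair of query.split(/[&;]/)) {
    if (pair === '') continue;
    const index = pair.indexOf('=');
    const rawName = index === -1 ? pair : pair.slice(0, index);
    const rawValue = index === -1 ? '' : pair.slice(index + 1);
    result[decodeURIComponent(rawName)] = decodeURIComponent(rawValue);
  }
  return result;
}

const original = parseQueryWithSemicolons('Action=AgentTicketZoom;TicketID=707128');
//=> { Action: 'AgentTicketZoom', TicketID: '707128' }

const encoded = parseQueryWithSemicolons('Action=AgentTicketZoom%3BTicketID%3D707128');
//=> { Action: 'AgentTicketZoom;TicketID=707128' }
```

With such a parser the encoded form loses the TicketID parameter, which would match the user's report that the semantics changed.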

marcelklehr commented 5 years ago

Incidentally,

const a = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom;TicketID=707128';
const b = 'https://my.otrs.dom/index.pl?Action=AgentTicketZoom%3BTicketID%3D707128';

new URL(a).search === new URL(b).search
//=> false

marcelklehr commented 5 years ago

Alright, so the URI spec is being superseded by the WHATWG URL spec, which uses the application/x-www-form-urlencoded format for the query string, and that format doesn't seem to care about the reserved characters in URIs. Wow.
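
A small sketch of that behaviour, using only the standard URLSearchParams class. Whether normalize-url re-serializes the query through this exact path is an assumption here, not something confirmed in this thread. The parser splits only on &, so the semicolon ends up inside the value, and the serializer then percent-encodes both ; and the second =:

```javascript
const query = 'Action=AgentTicketZoom;TicketID=707128';
const params = new URLSearchParams(query);

// Parsing: ";" is not a separator, so there is a single name/value pair.
const entries = [...params.entries()];
//=> [ [ 'Action', 'AgentTicketZoom;TicketID=707128' ] ]

// Serializing: reserved characters inside the value are percent-encoded.
const serialized = params.toString();
//=> 'Action=AgentTicketZoom%3BTicketID%3D707128'
```

Any step that round-trips the query through URLSearchParams would therefore produce the encoded form from the original report.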