npp-plugins / mimetools


URL Encode doesn't convert some symbols #28

Closed FAIdentekit closed 7 months ago

FAIdentekit commented 7 months ago

URL Encode doesn't convert some symbols

Copied from Issue #10866

Description of the Issue: URL Encode doesn't convert some symbols.

Steps to Reproduce the Issue

  1. Type a URL parameter containing a date in ISO string format: lastUpdateDate<date(2021-10-26T12:15:00.000+05:00)
  2. Go to Plugins -> MIME Tools -> URL Encode

Expected Behavior: converted string lastUpdateDate%3Cdate%282021-10-26T12%3A15%3A00.000%2B05%3A00%29

Actual Behavior: converted string lastUpdateDate%3Cdate(2021-10-26T12%3A15%3A00.000+05%3A00)
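
To pinpoint exactly which characters differ between the two strings above, they can be walked side by side. This is only an illustrative sketch in Python, not part of the plugin; the helper name `unencoded_differences` is hypothetical.

```python
from urllib.parse import unquote

# The two strings quoted in the report above.
expected = "lastUpdateDate%3Cdate%282021-10-26T12%3A15%3A00.000%2B05%3A00%29"
actual = "lastUpdateDate%3Cdate(2021-10-26T12%3A15%3A00.000+05%3A00)"


def unencoded_differences(expected, actual):
    """Return the characters that `expected` percent-encodes but `actual` keeps literal.

    Hypothetical helper: assumes both strings encode the same underlying text.
    """
    diffs = []
    i = j = 0
    while i < len(expected) and j < len(actual):
        if expected[i] == "%" and actual[j] != "%":
            # `expected` has a three-character escape where `actual` has one literal character.
            diffs.append(actual[j])
            i += 3
            j += 1
        else:
            i += 1
            j += 1
    return diffs


print(unencoded_differences(expected, actual))
# Both strings decode to the same text; only the amount of escaping differs.
print(unquote(expected) == unquote(actual))
```

The only characters left literal are the parentheses and the plus sign; everything else in the actual output matches the expected output.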

Debug Information

Notepad++ v8.1.9.2 (32-bit)
Build time : Nov 21 2021 - 04:27:12
Path : C:\Program Files (x86)\Notepad++\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 2009
OS Build : 19043.1348
Current ANSI codepage : 1251
Plugins : ComparePlugin.dll mimeTools.dll NppConverter.dll NppExport.dll NPPJSONViewer.dll

This still seems to be an issue as of Notepad++ v8.6 (64-bit)

donho commented 7 months ago

Summary: "(", ")" and "+" are not converted in the example you provided when you run URL Encode.

According to RFC 1738 (https://datatracker.ietf.org/doc/html/rfc1738):

Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

Reserved:

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

Usually a URL has the same interpretation when an octet is
represented by a character and when it encoded. However, this is not
true for reserved characters: encoding a character reserved for a
particular scheme may change the semantics of a URL.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.

So, according to RFC 1738, the following characters may be left unencoded: $-_.+!*'(),
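
The effect of that character set can be reproduced outside Notepad++. The sketch below uses Python's `urllib.parse.quote`, which is not the plugin's implementation; it only illustrates how the choice of "safe" characters leads to the two outputs in the report. One caveat: Python's `quote` always leaves letters, digits and `_.-~` unencoded, so it does not treat "~" as unsafe the way RFC 1738 does. The constant name `RFC1738_SPECIALS` is an assumption made for this example.

```python
from urllib.parse import quote, unquote_plus

# Special characters that RFC 1738 allows unencoded (name chosen for this sketch).
RFC1738_SPECIALS = "$-_.+!*'(),"

raw = "lastUpdateDate<date(2021-10-26T12:15:00.000+05:00)"

# Lenient: keep the RFC 1738 specials literal, as the plugin output in the report does.
lenient = quote(raw, safe=RFC1738_SPECIALS)

# Strict: escape everything except the characters quote() never escapes.
strict = quote(raw, safe="")

print(lenient)  # lastUpdateDate%3Cdate(2021-10-26T12%3A15%3A00.000+05%3A00)
print(strict)   # lastUpdateDate%3Cdate%282021-10-26T12%3A15%3A00.000%2B05%3A00%29

# A form-style decoder treats a literal "+" as a space, which changes the timezone offset.
print(unquote_plus(lenient))  # lastUpdateDate<date(2021-10-26T12:15:00.000 05:00)
```

Both outputs are permitted by RFC 1738. The strict form is the safer choice when the receiving side decodes "+" as a space, as form-style query string decoders do.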