samhh / fp-ts-std

The missing pseudo-standard library for fp-ts.
https://samhh.github.io/fp-ts-std/
MIT License
207 stars 27 forks source link

`URLPath.fromString` parses its arguments as a relative reference rather than a path. #204

Closed astlouisf closed 6 months ago

astlouisf commented 6 months ago

Context

Here's the grammar specification for URLs (rfc3986):


 URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

 hier-part     = "//" authority path-abempty
               / path-absolute
               / path-rootless
               / path-empty

 URI-reference = URI / relative-ref

 absolute-URI  = scheme ":" hier-part [ "?" query ]

 relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

 relative-part = "//" authority path-abempty
               / path-absolute
               / path-noscheme
               / path-empty

 scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

 authority     = [ userinfo "@" ] host [ ":" port ]

 [...]

 path          = path-abempty    ; begins with "/" or is empty
               / path-absolute   ; begins with "/" but not "//"
               / path-noscheme   ; begins with a non-colon segment
               / path-rootless   ; begins with a segment
               / path-empty      ; zero characters

 path-abempty  = *( "/" segment )
 path-absolute = "/" [ segment-nz *( "/" segment ) ]
 path-noscheme = segment-nz-nc *( "/" segment )
 path-rootless = segment-nz *( "/" segment )
 path-empty    = 0<pchar>

 segment       = *pchar
 segment-nz    = 1*pchar
 segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
               ; non-zero-length segment without any colon ":"

 pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

 query         = *( pchar / "/" / "?" )

 fragment      = *( pchar / "/" / "?" )

 pct-encoded   = "%" HEXDIG HEXDIG

 unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
 reserved      = gen-delims / sub-delims
 gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
 sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

URL's pathname getter

The algorithm used to generate pathname ensures that a pathname can only match either path-abempty or path-empty (path-empty)

URL's constructor

The URL constructor expect its arguments to be URLs and parses them according to the URI-reference grammar rule.

Issue

It is wrong to assume that path can be parsed as a URI-reference.

  1. path-abempty cannot match the URI grammar rule so ends up matching relative-ref
  2. In relative-part, a string of the form //.* can only match the production rule "//" authority path-abempty. This rule expect a non-empty authority which lead to the rejection of "//". This also means that any first segment of a path-abempty would be interpreted as an authority

Considering that the type URLPath represents paths, it seems wrong that its fromString function considers its argument as URI-reference-first.

Proposed solution

  1. Rename fromString to fromURLString
  2. Reimplement fromString to only accept strings of the form path-abempty

The second point can be achieved by the following implementation[^1] which reuses existings URL's parsing.

export const fromString = <E>(
  f: (e: TypeError) => E,
): ((x: string) => Either<E, URLPath>) =>
  flow(
    E.fromPredicate(
    (x) => x === '' || x.startsWith('/'),
    () => f(new TypeError('Invalid path')),
    ),
    E.chain(
      E.tryCatchK(
        (x) =>
          fromURL(
            // See https://github.com/nodejs/node/issues/52494
            // eslint-disable-next-line local/no-banned-constructors
            new globalThis.URL(`http://host.invalid${x}`),
          ),
        (e) => f(e as TypeError),
      ),
    ),
  );

[^1]: code by @mlegenhausen

samhh commented 6 months ago

https://github.com/samhh/fp-ts-std/commit/93e3c8696068de40585da36838760fe813adc705