Open karenetheridge opened 4 years ago
Since the regex used in parse() is based on the "non-validating regular expression" in https://tools.ietf.org/html/rfc3986#appendix-B, I wonder if it might make sense to add a parallel strict parsing method that does validate, or alternatively a validate method that applies the grammars in https://tools.ietf.org/html/rfc3986#appendix-A (where more components than just scheme have tighter constraints than the regex in parse() currently reflects).
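For illustration, here is a minimal sketch of what "non-validating" means in practice. It applies the Appendix B expression directly in plain Perl, without Mojo::URL; the helper name split_uri is hypothetical and is not how parse() is necessarily written. The expression only splits the string at delimiters, so any run of characters before the first colon ends up as the scheme.

```perl
use strict;
use warnings;

# The non-validating regular expression from RFC 3986, Appendix B.
my $APPENDIX_B = qr{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

# Hypothetical helper (not part of Mojo::URL): split a URI reference into
# its five generic components using the Appendix B capture groups.
# Components that are absent from the input come back as undef.
sub split_uri {
    my ($uri) = @_;
    $uri =~ $APPENDIX_B or return;
    return {
        scheme    => $2,
        authority => $4,
        path      => $5,
        query     => $7,
        fragment  => $9,
    };
}

my $parts = split_uri('bar,baz:foo');
printf "scheme=%s path=%s\n", $parts->{scheme}, $parts->{path};
# prints "scheme=bar,baz path=foo"
```

Nothing in the expression restricts the scheme to the characters the Appendix A grammar allows, which is why a separate validation step would be needed.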
This issue has been automatically marked as stale because it has not had recent activity. It may be closed if no further activity occurs. Thank you for your contributions.
I plan to look at this in the next month, tentatively with a new is_valid method added to Mojo::URL. Please advise on any interface suggestions.
I hope to look at this soon, by adding a method (validate()? is_valid()?) to Mojo::URL to check each component, including scheme, for validity against the spec.
This issue has been automatically marked as stale because it has not had recent activity. It may be closed if no further activity occurs. This is not a judgment on the merits of the issue, but an indication that more information may be needed to determine the appropriate course of action, if any. Thank you for your contributions.
ok bot, I haven't forgotten
There's some good info (and test cases) here: https://claroty.com/wp-content/uploads/2022/01/Exploiting-URL-Parsing-Confusion.pdf https://www.grc.com/sn/SN-853-Notes.pdf
Steps to reproduce the behavior
Mojo::URL->new("bar,baz:foo")
Expected behavior
An error in parse, or construction of an empty object, because "bar,baz" is not a valid scheme. https://tools.ietf.org/html/rfc3986#section-3.1 provides the grammar:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
Actual behavior
The above code constructs a uri object with scheme "bar,baz" and path "foo".
If this is accepted as a valid issue I can provide a pull request.
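As a rough sketch of the kind of check being proposed, the scheme grammar from section 3.1 translates into a single anchored regular expression. The helper name is_valid_scheme below is an assumption made for illustration. It is not an existing Mojo::URL method, and a real patch might expose the check differently.

```perl
use strict;
use warnings;

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
my $SCHEME_RE = qr/\A[A-Za-z][A-Za-z0-9+\-.]*\z/;

# Hypothetical helper, not an existing Mojo::URL method.
# Returns 1 if the string matches the scheme grammar, 0 otherwise.
# An undefined scheme (a relative reference) is reported as 0 here;
# a caller would normally skip the check when no scheme is present.
sub is_valid_scheme {
    my ($scheme) = @_;
    return (defined($scheme) && $scheme =~ $SCHEME_RE) ? 1 : 0;
}

for my $scheme ('http', 'svn+ssh', 'bar,baz', '1http') {
    printf "%-8s %s\n", $scheme, is_valid_scheme($scheme) ? 'valid' : 'invalid';
}
```

With this check, "bar,baz" is rejected because of the comma, and "1http" is rejected because a scheme must start with a letter.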