Closed anmonteiro closed 3 years ago
Interesting work. I would like to now that you translated already available regexp to angstorm
parser or you took what is described by the RFC as is?
@dinosaure This is mostly based on what the previous implementation did. I was reading the spec along with the work and noticed that we don't follow it too closely in some places.
My rationale for preserving the same assumptions was that it's already a fairly big change, and I didn't want to introduce more noise than necessary.
I think this sets up a good base to iterate on the parts that need to follow the spec more cloesly. What do you think?
I think this sets up a good base to iterate on the parts that need to follow the spec more cloesly. What do you think?
I agree with you but ocaml-uri
is widely used by many people. So we should have something to test your PR with others packages and see what happens first. Just to avoid any large break :+1: .
Thanks for the excellent work on this PR! I'm just getting back from vacation and can review it more over the next week.
I'm in favour of merging and releasing an Angstrom version that is as close to the behaviour of uri
as currently stands, and then improving the specification compat in subsequent releases. How about setting up an AFL fuzz test to check for behavioural differences between uri-re and uri to spot any places where it differs?
I've pushed avsm/ocaml-uri@4bf0535dea13ed98d92d6dd8fb8402d50bbbaae0 over to https://github.com/avsm/ocaml-uri/tree/angstrom-parser that implements the differential fuzzer. So far, it's only found one difference: the handling of \n
in inputs. The old implementation would terminate the parse upon hitting a newline, but the new one will urlencode the \n and continue consuming input. We could make it terminate in the new implementation to maintain existing behaviour, and subsequently allow them in a later release.
This is great! I can add you to my fork if you wanna push there. I can look into fixing that difference soon too.
pushed :)
Any further thoughts on that difference Antonio? It would be good to get this into a release
Thanks for the ping -- I had forgotten about this but I'm still interested in it. Give me a couple days and I'll look into this.
Much appreciated @anmonteiro! No rush of course :-)
@avsm Just pushed a commit that fixes the differences in newline handling between the old (regex) and new (Angstrom) parsers. Also added a compatibility test.
Thank you for working on this @anmonteiro!
Let's revive this PR.
This diff reworks the current parser based on regexes in favor of one that uses Angstrom.
It also removes the dependency on
ocaml-re
, moving the regular expressions to a new packageuri-re
.The current state of this work aready passes the unit tests, but it might need to further work to get it into a mergeable state. I'm opening a PR to get any and all feedback. Thanks in advance!
This happens to fix https://github.com/mirage/ocaml-uri/issues/21.