ocaml / ocaml.org

The official OCaml website.
https://ocaml.org

(docs) Cookbook "Validate an Email Address" with re #2518

Closed: ggsmith842 closed this 2 weeks ago

ggsmith842 commented 3 weeks ago

Validate an email address using a simple email pattern. Examples included show validation with both the Str and Re libraries.
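For reference, a Str-based check of this kind might look like the sketch below. It is not the PR's code. It assumes the simple pattern discussed in this thread and a program linked against the `str` library, and the names `email_str_re` and `is_valid_email_str` are hypothetical. Str has no `{m,n}` repetition, so the two-or-three-letter top-level domain is spelled out.

```ocaml
(* Str has no {m,n} repetition, so "two or three letters" is written out
   as [a-z][a-z][a-z]?. The trailing $ matches at the end of a line. *)
let email_str_re =
  Str.regexp "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z][a-z][a-z]?$"

(* string_match only anchors the start (position 0), so also require that
   the match reaches the end of the input. This rejects a valid-looking
   address followed by a newline and more text. *)
let is_valid_email_str s =
  Str.string_match email_str_re s 0
  && Str.match_end () = String.length s
```

Checking `Str.match_end ()` against the input length avoids relying on `$` alone, which Str also accepts just before a newline.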

F-Loyer commented 4 days ago

Subdomains don’t seem to be validated.

I guess we shouldn't use Re to validate an email address, but rather a dedicated library like Emile.

ggsmith842 commented 4 days ago

@F-Loyer you could change the regex pattern to account for the special characters. `[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\.[a-z]{2,3}` matches your email on Regex101; I added `-` to the second character set. The example deliberately uses a simple pattern, and part of why I included a second pattern is to show how you can update the regex and reuse it without changing any other code.
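A minimal sketch of that idea with Re, assuming the `re` library is linked: the pattern is just a string passed to one hypothetical helper, `compile_email_pattern`, so swapping patterns needs no other code changes. The `without_hyphen` pattern is an assumption about what the pattern looked like before `-` was added, not the recipe's actual text.

```ocaml
(* Hypothetical helper: compile a Perl-style pattern string, anchored to the
   whole input with Re.bos/Re.eos. Only the pattern string ever changes. *)
let compile_email_pattern pattern =
  Re.compile (Re.seq [ Re.bos; Re.Perl.re pattern; Re.eos ])

(* The validation code stays the same whatever compiled pattern is passed in. *)
let is_valid re email = Re.execp re email

(* Assumed earlier pattern without the hyphen, then the updated one. *)
let without_hyphen =
  compile_email_pattern "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9]+\\.[a-z]{2,3}"

let with_hyphen =
  compile_email_pattern "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z]{2,3}"
```

For example, `is_valid with_hyphen "user@my-site.org"` holds while `is_valid without_hyphen "user@my-site.org"` does not, and `is_valid` itself is untouched.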

I think using Re is still a valid way to verify email patterns. It would be cool to see how Emile can be used as well! You should add a recipe for that.

F-Loyer commented 4 days ago

With a simple regex we can fix one case, but it can still remain broken: a forgotten « - », forgotten subdomains, a three-character maximum for the top-level domain (sponsored TLDs can be longer)…
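To make these failure modes concrete, here is a sketch, assuming the `re` library (`simple_email`, `accepts`, and `rejected_but_valid` are hypothetical names). It runs the simple pattern, anchored to the whole string, against addresses that are syntactically valid under RFC 5322 but still rejected.

```ocaml
(* The simple pattern with '-' in the domain set, anchored to the whole string. *)
let simple_email =
  Re.compile
    (Re.seq
       [ Re.bos; Re.Perl.re "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z]{2,3}"; Re.eos ])

let accepts s = Re.execp simple_email s

(* Syntactically valid addresses that the simple pattern still rejects. *)
let rejected_but_valid =
  [ "user@mail.example.com" (* subdomain: the domain set has no dot *)
  ; "user@example.museum" (* top-level domain longer than 3 letters *)
  ; "user+tag@example.com" (* plus sign allowed in the local part *) ]

let () =
  List.iter
    (fun s -> Printf.printf "%s -> %b\n" s (accepts s))
    ("user@example.com" :: rejected_but_valid)
```

Each fix to the pattern addresses one of these lines, which is the argument for either a much fuller regex or a dedicated parser.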

From http://emailregex.com:

```
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
```

(It follows RFC 5322.) I will try to use it with the OCaml Re library. Note that it is far shorter than a regex derived directly from RFC 822. The emailregex page also proposes shorter, but inaccurate, regexes.

F-Loyer commented 2 days ago

A bit tricky: Re doesn't support `\xNN` escapes, but OCaml string literals do.
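A small sketch of what this means in practice, assuming the `re` library (`a_to_c` and `bracket_to_del` are hypothetical names). OCaml turns `\xNN` into a raw byte before Re ever sees the pattern, which works for most bytes. One case to watch for, based on how Re.Perl parses character classes and worth double-checking against your Re version: `\x5d` becomes a raw `]`, which ends a character class early, so inside a class it should be written as `\\]` instead.

```ocaml
(* OCaml expands \xNN escapes while compiling the string literal,
   so Re only ever receives the raw bytes. *)
let () = assert ("\x41\x42" = "AB")

(* A byte range written with OCaml escapes: Re sees the raw range A-C. *)
let a_to_c = Re.Perl.compile_pat "^[\x41-\x43]+$"

(* Pitfall: "\x5d" is a raw ']' by the time Re parses it, so inside a
   character class it would close the class early. Escaping it for Re
   as "\\]" keeps it as the start of the range ] to DEL. *)
let bracket_to_del = Re.Perl.compile_pat "^[\\]-\x7f]+$"
```

Here `Re.execp a_to_c "ABC"` is true and `Re.execp bracket_to_del "a~]"` is true, while `"A"` (below `]`) is rejected by `bracket_to_del`.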

Then we should use:

```ocaml
(* RFC 5322 regular expression, adapted from http://emailregex.com.
   Re.Perl rejects \xNN escapes, so OCaml string escapes are used instead.
   Exception: \x5d is a raw ']' once OCaml expands it, which would close the
   character class early, so it is written as \\] (an escaped ']' for Re). *)
let validate_email_re =
  Re.Perl.re "(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\\]-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\\])"
  |> Re.no_case
  |> Re.compile
```

EDIT: the regex finds an email address even when it is preceded by garbage, because it is not anchored. Wrapping it in « ^…$ » may be better for validating a whole address.
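A sketch of anchored validation, assuming the `re` library and reusing the simple pattern from earlier in the thread for brevity (the same wrapping applies to the RFC 5322 regex). One caveat: with Re.Perl, `^` and `$` compile to `Re.bol` and `Re.eol`, which also match next to a newline, so `Re.bos`/`Re.eos` are the stricter choice for whole-string validation. `unanchored`, `line_anchored`, `whole_string`, and `is_valid_email` are hypothetical names.

```ocaml
(* The simple pattern, with the dot before the top-level domain escaped. *)
let simple = Re.Perl.re "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z]{2,3}"

(* Unanchored: Re.execp succeeds if the pattern matches anywhere in the input. *)
let unanchored = Re.compile simple

(* Perl-style ^...$ become Re.bol/Re.eol, which also match around a newline. *)
let line_anchored =
  Re.Perl.compile_pat "^[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z]{2,3}$"

(* Re.bos/Re.eos anchor to the start and end of the whole string. *)
let whole_string = Re.compile (Re.seq [ Re.bos; simple; Re.eos ])

let is_valid_email s = Re.execp whole_string s
```

With these definitions, `"garbage user@example.com"` is accepted by `unanchored` but not by `is_valid_email`, and `"junk\nuser@example.com"` slips through `line_anchored` but is rejected by `is_valid_email`.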

F-Loyer commented 2 days ago

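Inside `let () = …`, we should use `Array.iter`, not `Array.map`: `Array.map` returns an array, which does not have the `unit` type that `let () =` expects.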
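A minimal sketch of the difference, assuming the `re` library; `email_re`, `is_valid_email`, `emails`, and the sample addresses are hypothetical. `Array.map` still fits when the results are wanted as a new array, while `Array.iter` is what a `let () =` printing loop needs.

```ocaml
(* Hypothetical validator: the simple pattern, anchored to the whole string. *)
let email_re =
  Re.compile
    (Re.seq
       [ Re.bos; Re.Perl.re "[a-zA-Z0-9.$_!]+@[a-zA-Z0-9-]+\\.[a-z]{2,3}"; Re.eos ])

let is_valid_email s = Re.execp email_re s

let emails = [| "user@example.com"; "not-an-email"; "first.last@my-site.org" |]

(* Array.map builds a new array: fine when the results are needed later. *)
let results = Array.map is_valid_email emails

(* A top-level [let () = ...] must evaluate to unit. Array.map with a printing
   function would return a [unit array] (a type error here), while Array.iter
   returns unit. *)
let () =
  Array.iter
    (fun email -> Printf.printf "%s: %b\n" email (is_valid_email email))
    emails
```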