There are two changes here:

- Include `+` in the local part, and disallow `_` in the domain part. Other characters are also allowed in the local part, but they are less common (https://en.wikipedia.org/wiki/Email_address).
- Optimise the pattern for long contiguous runs of characters from the first character set that contain no `@` (or otherwise fail to match).

Currently, `replaceAll(" ")` on a string of ~100K characters from the set `[-_.0-9A-Za-z]` takes about 1 minute on modern hardware. Adding a negative lookbehind for a character from that set reduces this to a few milliseconds and is functionally equivalent. (Consider the current pattern and a match from position `i` to `k`. If the character at `i-1` is in the character set, there would also be a match from `i-1` to `k`, which would already have been replaced.)
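The effect of the lookbehind can be sketched in Java as follows. This is only an illustration: the class name `EmailMaskSketch`, the `mask` helper, and both regular expressions are assumptions made for the example, not the pattern actually changed here. Without the lookbehind, every start position inside a long run of local-part characters triggers a scan to the end of the run followed by backtracking, which is quadratic in the run length. With the lookbehind, only the first position of the run is attempted.

```java
import java.util.regex.Pattern;

final class EmailMaskSketch {
    // Hypothetical character sets; the real pattern in this change may differ.
    private static final String LOCAL = "[-+_.0-9A-Za-z]"; // '+' allowed in the local part
    private static final String DOMAIN = "[-.0-9A-Za-z]";  // '_' not allowed in the domain part
    private static final String EMAIL = LOCAL + "+@" + DOMAIN + "+\\.[A-Za-z]+";

    static final Pattern WITHOUT_LOOKBEHIND = Pattern.compile(EMAIL);

    // Same pattern, but a match may not start directly after a local-part character.
    static final Pattern WITH_LOOKBEHIND = Pattern.compile("(?<!" + LOCAL + ")" + EMAIL);

    // Replaces every match of the given pattern with a single space.
    static String mask(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] samples = {
            "mail john.doe+tag@example.com now",
            "no address here",
            "x a@b_c.com"
        };
        // Compare both variants on a few sample inputs (not a proof of equivalence).
        for (String s : samples) {
            boolean same = mask(WITHOUT_LOOKBEHIND, s).equals(mask(WITH_LOOKBEHIND, s));
            System.out.println(same + " <- " + s);
        }

        // Long run of local-part characters without '@': the slow case described above.
        String longRun = "a".repeat(100_000);
        long start = System.nanoTime();
        mask(WITH_LOOKBEHIND, longRun);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("with lookbehind: " + elapsedMs + " ms");
    }
}
```

The `main` method times only the lookbehind variant. Running `WITHOUT_LOOKBEHIND` on the same 100K-character input reproduces the slow case and may take on the order of a minute. The comparison loop checks agreement on the listed samples only.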