There was a bug in _normalize_string. It used to be:
text = re.sub(find, replace, text, re.UNICODE)
But the syntax for re.sub is:
re.sub(pattern, repl, string, count=0, flags=0)
So re.UNICODE was being passed as the count parameter rather than as a flag. Since re.UNICODE has the integer value 32, each call stopped after 32 substitutions. This caused problems when testing on long strings containing many addresses separated by newlines: only the first 14 or so addresses were extracted.
I find it best practice to pass flags as keyword arguments when using regex functions, since it is easy to forget or mix up the exact signatures, so I now pass all flags as kwargs.
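As a rough illustration of the difference, here is a minimal sketch. The sample text and the newline pattern are made up for the example and are not the actual find/replace pairs used in _normalize_string.

```python
import re
import warnings

# Hypothetical input: 100 short lines, so 99 newlines to replace.
text = "\n".join(f"addr {i}" for i in range(100))

with warnings.catch_warnings():
    # Newer Python versions warn when count is passed positionally.
    warnings.simplefilter("ignore")
    # Old call: re.UNICODE (integer value 32) lands in the count slot,
    # so only the first 32 matches are replaced.
    buggy = re.sub(r"\n", " ", text, re.UNICODE)

# New call: the flag is passed by keyword, count stays at 0 (replace all).
fixed = re.sub(r"\n", " ", text, flags=re.UNICODE)

print(buggy.count("\n"), fixed.count("\n"))
```

With the flag passed by keyword, every newline is replaced. With the old positional call, everything past the 32nd match is left untouched.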