The regular expression that ebdata.nlp.addresses uses to find addresses is built from a 100-line regex into which a 130-line regex is inserted 11 times, so the final regex is over 1500 lines long.
It is almost impossible to debug, fix, or extend this regex.
We need to re-think the address extraction approach completely.
We are investigating whether there is existing natural-language processing work we can leverage.
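To make the size problem concrete, here is a minimal sketch of the composition pattern described above: a template regex with a placeholder that is filled with the same sub-pattern several times. The names TEMPLATE, STREET_NAME, expand and count_lines are hypothetical, and the patterns are toy stand-ins, not the actual contents of ebdata.nlp.addresses. With the figures from this ticket, the same construction gives roughly 100 + 11 * 130 = 1530 lines, which matches "over 1500".

```python
import re

# Hypothetical stand-in for the 130-line sub-pattern (here: 2 lines).
STREET_NAME = r"""
    (?:[A-Z][a-z]+)        # a capitalized word
    (?:\s[A-Z][a-z]+)*     # optionally followed by more capitalized words
"""

# Hypothetical stand-in for the 100-line outer regex. The placeholder
# %(street)s appears 3 times here; the real regex reportedly has 11 insertions.
TEMPLATE = r"""
    \b\d{1,5}\s            # house number
    %(street)s             # street name
    \s(?:St|Ave|Rd)\b      # street suffix
    |
    \b%(street)s           # intersection: first street
    \s(?:and|&)\s          # connector
    %(street)s             # intersection: second street
"""


def expand(template, fragment):
    """Insert the same fragment at every placeholder in the template."""
    return template % {"street": fragment}


def count_lines(pattern):
    """Count the non-blank lines of a verbose-mode pattern."""
    return len([line for line in pattern.splitlines() if line.strip()])


full = expand(TEMPLATE, STREET_NAME)
address_re = re.compile(full, re.VERBOSE)

print(count_lines(TEMPLATE), "template lines expand to", count_lines(full))
print(address_re.search("Fire at 123 Main St yesterday").group(0))
print(address_re.search("Crash at Main and Oak").group(0))
```

Even in this toy version, every copy of the sub-pattern is compiled into one monolithic expression, so a wrong match cannot easily be traced back to a particular insertion. That is the debugging problem this ticket describes, at a much smaller scale.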