Open cod3licious opened 4 years ago
At the top here are some nice regexes, including this one for phone numbers:
r"""
(?:
(?: # (international)
\+?[01]
[ *\-.\)]*
)?
(?: # (area code)
[\(]?
\d{3}
[ *\-.\)]*
)?
\d{3} # exchange
[ *\-.\)]*
\d{4} # base
)"""
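Note that this pattern relies on free whitespace and inline comments, so it only behaves as intended when compiled with `re.VERBOSE`. A minimal sketch of trying it out with Python's standard `re` module follows; the name `PHONE_RE` and the two sample strings are assumptions for illustration, not part of any library:

```python
import re

# The verbose pattern quoted above. re.VERBOSE makes the engine ignore the
# layout whitespace and the "# ..." comments (spaces inside [...] still count).
PHONE_RE = re.compile(
    r"""
    (?:
        (?:                 # (international)
            \+?[01]
            [ *\-.\)]*
        )?
        (?:                 # (area code)
            [\(]?
            \d{3}
            [ *\-.\)]*
        )?
        \d{3}               # exchange
        [ *\-.\)]*
        \d{4}               # base
    )""",
    re.VERBOSE,
)

us_match = PHONE_RE.search("call +1 (555) 123-4567 today")
de_match = PHONE_RE.search("02404 9099130")

print(us_match.group(0))  # the whole US-style number
print(de_match.group(0))  # only a 3+3+4 digit tail; the leading "02" is left over
```

Because the pattern has no boundary checks, it will happily match in the middle of a longer digit run, which is one reason it fits US-style numbers better than German ones.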
maybe this fixes it?
ok, I think this might work:
r"(?:^|(?<=[^\w)]))(((\+?[01])|(\+\d{2}))[ .-]?)?(\(?\d{3}\)?[ .-]?)?(\d{3}[ .-]?\d{4})(\s?(?:ext\.?|[#x-])\s?\d{2,6})?(?:$|(?=\W))"
phone_numbers = [
"2404 9099130",
"024049099130",
"02404 9099130",
"02404/9099130",
"+492404 9099130",
"+4924049099130",
"+492404/9099130",
"0160 123456789",
"0160/123456789",
"+32160 123456789",
"Tel.: 0160 123456789"
]
for i, number in enumerate(phone_numbers):
    print(f"{i}: {text_cleaner.transform(number)}")
0: 2404 <phone>
1: 024049099130
2: 02404 <phone>
3: 02404/<phone>
4: +492404 <phone>
5: +4924049099130
6: +492404/<phone>
7: 0160 123456789
8: 0160/123456789
9: +32160 123456789
10: tel.: 0160 123456789
:(
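The output above is consistent with what the proposed one-line pattern does on its own: it expects a 3+4 (optionally 3+3+4) digit layout that starts and ends at a non-word boundary, so for some German numbers only the trailing seven-digit block qualifies, and longer blocks do not match at all. A minimal sketch with plain `re.sub`, assuming the cleaner simply substitutes `<phone>` for every match (`text_cleaner` itself is not reproduced here, and `replace_phones` is a hypothetical helper):

```python
import re

# The one-line pattern proposed earlier in this thread, split over two
# string literals only for readability.
PHONE_RE = re.compile(
    r"(?:^|(?<=[^\w)]))(((\+?[01])|(\+\d{2}))[ .-]?)?(\(?\d{3}\)?[ .-]?)?"
    r"(\d{3}[ .-]?\d{4})(\s?(?:ext\.?|[#x-])\s?\d{2,6})?(?:$|(?=\W))"
)


def replace_phones(text: str) -> str:
    """Hypothetical stand-in for the cleaner: replace each match with <phone>."""
    return PHONE_RE.sub("<phone>", text)


samples = ["2404 9099130", "024049099130", "02404/9099130", "0160 123456789"]
for number in samples:
    print(f"{number!r} -> {replace_phones(number)!r}")
```

The area-code group only accepts exactly three digits, so a four- or five-digit German prefix such as `2404` or `02404` can never be absorbed into the match.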
Thanks @cod3licious for providing the regex and thanks @AssassinTee for the test cases. I adapted the regex to make it work with all the provided phone numbers.
The regex doesn't work with phone numbers like these:
001-504-724-7835x2050
001-687-915-1144
001-507-783-9793x4107
This number:
+1 123 1548690
is correctly identified as a phone number, but this one is not:
+49 123 1548690
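One possible explanation, assuming the pattern in use only accepts `0` or `1` as the international prefix (as the verbose pattern quoted at the top of this thread does with `\+?[01]`): `+1` can be consumed as part of the number, while `+49` cannot, so the match only starts at the area code. A small sketch of that behaviour; the one-line pattern below is a condensed copy of that verbose pattern and not necessarily what the library ships, and `INTL_0_OR_1` is a made-up name:

```python
import re

# Condensed form of the verbose pattern from the top of the thread.
# The optional international part is limited to "+0"/"+1"/"0"/"1".
INTL_0_OR_1 = re.compile(
    r"(?:\+?[01][ *\-.\)]*)?(?:[\(]?\d{3}[ *\-.\)]*)?\d{3}[ *\-.\)]*\d{4}"
)

for text in ["+1 123 1548690", "+49 123 1548690"]:
    m = INTL_0_OR_1.search(text)
    print(f"{text!r}: matched {m.group(0)!r}")
```

Allowing a two-digit country code as an alternative, as the `(\+\d{2})` branch in the one-line pattern earlier in the thread does, would be one way to cover `+49`.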