Describe the bug
The E.164 standard states that phone numbers can be written in the format +<CountryCode><City/AreaCode><LocalNumber>;ext=<ext>, for example +19052223333;ext=555. The current clean_phone() function doesn't recognize such numbers because the regex at line 16 of clean_phone.py does not cover this rule.
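To illustrate the rule that is missing, here is a minimal sketch of a pattern that accepts an optional ;ext= suffix. The names E164_WITH_EXT and split_extension are hypothetical and this is not the regex in clean_phone.py; it only assumes a leading "+", at most 15 digits, and a digits-only extension.

```python
import re

# Hypothetical pattern, NOT the one used in clean_phone.py.
# "+" followed by 2 to 15 digits (no leading zero), then an optional ";ext=<digits>".
E164_WITH_EXT = re.compile(r"^\+(?P<number>[1-9]\d{1,14})(?:;ext=(?P<ext>\d+))?$")


def split_extension(phone):
    """Return (number, ext) if phone matches, otherwise None.

    ext is None when the input carries no ";ext=" suffix.
    """
    match = E164_WITH_EXT.match(phone)
    if match is None:
        return None
    return "+" + match.group("number"), match.group("ext")


print(split_extension("+19052223333;ext=555"))
```

A fix inside clean_phone() would presumably add an equivalent optional group to the existing regex rather than introduce a separate helper.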
To Reproduce
Steps to reproduce the behavior:
from dataprep.clean import clean_phone
import pandas as pd
df = pd.DataFrame({
    "phone": ["+19052223333;ext=555"]
})
clean_phone(df, "phone", output_format="e164")
Expected behavior
The correct output should have the form +12345678901 ext. 1234 (for the input above, +19052223333 ext. 555), whereas clean_phone() doesn't recognize this format and outputs np.NaN.
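The expected mapping can be sketched in plain Python without dataprep. The helper format_with_extension is a hypothetical name used only for illustration, and it returns None where clean_phone() would output np.NaN; it is not how the library formats numbers internally.

```python
def format_with_extension(raw):
    """Turn "+<digits>;ext=<digits>" into "+<digits> ext. <digits>".

    Returns None for input that does not fit this shape.
    """
    # Split once on the ";ext=" marker; sep is "" when the marker is absent.
    number, sep, ext = raw.partition(";ext=")
    digits = number[1:]
    if not number.startswith("+") or not (digits.isascii() and digits.isdigit()):
        return None
    if sep and not (ext.isascii() and ext.isdigit()):
        return None
    return f"+{digits} ext. {ext}" if sep else f"+{digits}"


print(format_with_extension("+19052223333;ext=555"))
```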
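Screenshots
![Screen Shot 2022-03-13 at 23 31 58](https://user-images.githubusercontent.com/31804614/158117484-d9aa736d-7b09-4651-b51f-abab1aa015b9.png)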
Desktop (please complete the following information):
OS: macOS Monterey
Browser: Chrome
Platform: Jupyter Notebook
Platform Version: 6.4.8
Python Version: 3.9.9
Dataprep Version: 0.4.2
Additional context
Here's a blog post explaining the E.164 standard, specifically how to specify an extension. Link