safe-refuge / safeway-data

Data mining tools for the Safeway app

Phone numbers sanitization issues #61

Open littlepea opened 2 years ago

littlepea commented 2 years ago

The following values aren't sanitized properly when running main.py with no arguments (spreadsheet data conversion). Format is source -> expected:

  1. (+ 48) 792 568 561, (+48) 22 621 51 65 -> +48792568561
  2. 224813418 -> +48224813418 (should we maybe pass optional country code to sanitise_phone?)
  3. 735174517, 735755200, 731512726 -> +48735174517
  4. 0230 - 564462; 0230 – 564463; Fax: 0230 - 564464 -> +40230564462
  5. Tel.: +40 261 80 77 57, +40 261 80 77 77 interior 20695, 20696, 20697 -> +40261807757
  6. And many more

P.S. When a value contains multiple numbers, the extra ones need to be appended to the description.
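To make the expected behaviour concrete, here is a minimal sketch of splitting a raw value into a primary number plus extras. It is not the project's sanitise_phone: the names normalise_chunk and split_phone_field are hypothetical, and the 9-digit minimum, the "interior"/"ext" extension markers, and stripping a leading trunk 0 are assumptions that happen to fit the Polish and Romanian samples above.

```python
import re
from typing import List, Optional, Tuple

# Assumption: Polish and Romanian national numbers have 9 digits; shorter
# chunks (e.g. extensions such as "20696") are treated as noise.
MIN_DIGITS = 9


def normalise_chunk(chunk: str, calling_code: str) -> Optional[str]:
    """Turn one chunk into +<code><number>, or None if it is not a number."""
    # Drop anything after an extension marker ("interior" in Romanian).
    chunk = re.split(
        r"\b(?:interior|ext)\b", chunk, maxsplit=1, flags=re.IGNORECASE
    )[0]
    digits = re.sub(r"\D", "", chunk)
    if "+" in chunk:
        # Already international, e.g. "(+ 48) 792 568 561".
        return "+" + digits if len(digits) >= MIN_DIGITS else None
    digits = digits.lstrip("0")  # strip trunk prefix, e.g. "0230" -> "230"
    if len(digits) < MIN_DIGITS:
        return None
    return f"+{calling_code}{digits}"


def split_phone_field(
    raw: str, calling_code: str
) -> Tuple[Optional[str], List[str]]:
    """Return (primary number, extra numbers) for a raw spreadsheet value."""
    numbers = []
    for chunk in re.split(r"[,;]", raw):
        number = normalise_chunk(chunk, calling_code)
        if number is not None:
            numbers.append(number)
    if not numbers:
        return None, []
    return numbers[0], numbers[1:]
```

With this sketch, split_phone_field("735174517, 735755200, 731512726", "48") returns +48735174517 as the primary number and the other two as extras that could be appended to the description. Fax numbers are not treated specially here, so they would also end up among the extras.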

idisblueflash commented 2 years ago

@littlepea

running main.py with no arguments (spreadsheet data conversion)

Could you send me the spreadsheet so I can reproduce this locally? Or do I not need it at all?

idisblueflash commented 2 years ago

224813418 -> +48224813418 (should we maybe pass optional country code to sanitise_phone?)

No, since I've already passed it in PolandPhoneNumberExtractorService.

  1. 224813418 -> +48224813418
  2. 735174517, 735755200, 731512726 -> +48735174517

These work for me. I added them to the unit tests and they all passed.

  1. 0230 - 564462; 0230 – 564463; Fax: 0230 - 564464 -> +40230564462
  2. Tel.: +40 261 80 77 57, +40 261 80 77 77 interior 20695, 20696, 20697 -> +40261807757

These aren't Polish numbers (+40 is Romania), so handling them with PolandPhoneNumberExtractorService will be a problem.

littlepea commented 2 years ago

@littlepea

running main.py with no arguments (spreadsheet data conversion)

Could you send me the spreadsheet so I can reproduce this locally? Or do I not need it at all?

I think the spreadsheet ID is hardcoded in the settings, so you can just run it.

littlepea commented 2 years ago

These aren't Polish numbers (+40 is Romania), so handling them with PolandPhoneNumberExtractorService will be a problem.

Those services are related to the spiders, but when we convert the spreadsheet they are not used, so it's a different story...

littlepea commented 2 years ago

We do have the country name in the PoI, and we could get the country code from the pycountry library. Maybe we should add an optional country: str = None parameter to the function for that use case?
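A rough sketch of what that optional parameter could look like, under some loud assumptions: sanitise_phone_sketch and the CALLING_CODES table are hypothetical and only stand in for a real lookup. pycountry resolves country names to ISO 3166 codes, so getting from there to a dialling prefix would need a further mapping, which a package such as phonenumbers can provide. Stripping a leading 0 as a trunk prefix is also an assumption and does not hold for every country.

```python
import re
from typing import Optional

# Hypothetical stand-in table: country name -> calling code.
# A real version would derive this from the PoI country via a library lookup.
CALLING_CODES = {"Poland": "48", "Romania": "40"}


def sanitise_phone_sketch(raw: str, country: Optional[str] = None) -> str:
    """Normalise a single number, prefixing a calling code when one is known."""
    digits = re.sub(r"\D", "", raw)
    if "+" in raw:
        # The value already carries its own calling code.
        return "+" + digits
    code = CALLING_CODES.get(country) if country else None
    if code is None:
        # No country known: return bare digits rather than guess a prefix.
        return digits
    # Assumption: a leading 0 is a trunk prefix and is dropped.
    return f"+{code}{digits.lstrip('0')}"
```

Because country defaults to None, existing callers such as the spiders would keep their current behaviour, while the spreadsheet conversion could pass the PoI's country explicitly.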

We can have a call about this if you prefer.