Open Gasewtag opened 4 months ago
The vanilla phone number recognizer supports only a subset of countries: https://github.com/microsoft/presidio/blob/db8ff8254123a113a0d511a484647734021de612/presidio-analyzer/presidio_analyzer/predefined_recognizers/phone_recognizer.py#L27
Could you please try to add Portugal (if I got the country code right) and check again? Example code:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import PhoneRecognizer
analyzer = AnalyzerEngine()
# Remove default phone recognizer
analyzer.registry.remove_recognizer("PhoneRecognizer")
# Add custom one (which supports numbers starting with +351)
pt_phone_recognizer = PhoneRecognizer(supported_regions=["PT"])
analyzer.registry.add_recognizer(pt_phone_recognizer)
analyzer.analyze(text="my name is John Doe my phone number is +351000000000", language="en")
# Note: this is still not detected as a phone number, because +351000000000 is not a valid Portuguese phone number.
# With a valid Portuguese number, it works:
analyzer.analyze(text="my name is John Doe my phone number is +351210493000", language="en", score_threshold=0.4)
Output:
[type: PERSON, start: 11, end: 19, score: 0.85,
type: PHONE_NUMBER, start: 39, end: 52, score: 0.75]
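To sanity-check the offsets in the output above, the reported start/end pairs can be sliced out of the input string. This is only a minimal sketch using plain Python and no Presidio calls; the helper name spans_to_text is hypothetical, and the spans are copied by hand from the output above.

```python
text = "my name is John Doe my phone number is +351210493000"

# (entity_type, start, end) triples copied by hand from the output above
reported = [("PERSON", 11, 19), ("PHONE_NUMBER", 39, 52)]


def spans_to_text(text, spans):
    """Map each entity type to the substring covered by its span (hypothetical helper)."""
    return {entity: text[start:end] for entity, start, end in spans}


matched = spans_to_text(text, reported)
print(matched)
```

The PHONE_NUMBER span covers the full number including the +351 prefix, so the custom recognizer matched the whole international number rather than a fragment of it.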
Describe the bug: The analyzer identifies a Portuguese phone number as a US bank account number.
To Reproduce: Steps to reproduce the behavior:
Execute the analyzer with the following text: "my name is John Doe my phone number is +351000000000" (please replace the zeros with random digits 0-9).
Execute the anonymizer and retrieve the following result:
text: my name is <PERSON> my phone number is <US_BANK_NUMBER>
items:
[
{'start': 41, 'end': 57, 'entity_type': 'US_BANK_NUMBER', 'text': '', 'operator': 'replace'},
{'start': 11, 'end': 20, 'entity_type': 'PERSON', 'text': '', 'operator': 'replace'}
]
Expected behavior: my name is <PERSON> my phone number is <PHONE_NUMBER>