microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License

Adding a French Date Recognizer #1431

Open cpetresc opened 3 months ago

cpetresc commented 3 months ago

Is your feature request related to a problem? Please describe.

Dear Presidio Team at Microsoft,

When I run Presidio on French text with the spaCy model, French dates are not detected. For example, '3 janvier 2001', 'janvier 2001', and '3 janvier' are not recognized.

Describe the solution you'd like

I would like to be able to detect French dates with a dedicated French date recognizer.

Describe alternatives you've considered

I have written a class called FrDateRecognizer that detects French dates using regular expressions. It is in the file attached to this issue: fr_date_recognizer.zip
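
For readers without the attachment, a regex-based approach along these lines might look like the sketch below. This is not the contents of FrDateRecognizer, which are not shown in this thread; the names `MONTHS`, `FR_DATE_PATTERN`, and `find_french_dates` are hypothetical, and the month list is an assumption that leaves out abbreviations such as "janv." or "déc.". It uses only Python's standard `re` module and targets the three forms mentioned above (day + month + year, month + year, day + month).

```python
import re

# Assumed month list (with accent-free variants); not exhaustive.
MONTHS = (
    "janvier|février|fevrier|mars|avril|mai|juin|juillet|"
    "août|aout|septembre|octobre|novembre|décembre|decembre"
)

# Optional day (1er, 1-31), a required month name, then an optional 4-digit year.
FR_DATE_PATTERN = re.compile(
    rf"\b(?:(?:1er|[12]\d|3[01]|0?[1-9])\s+)?(?:{MONTHS})\b(?:\s+\d{{4}}\b)?",
    re.IGNORECASE,
)


def find_french_dates(text):
    """Return (start, end, matched_text) for each French date found in text."""
    return [(m.start(), m.end(), m.group(0)) for m in FR_DATE_PATTERN.finditer(text)]


sample = "Né le 3 janvier 2001, inscrit en janvier 2001, fête le 3 janvier."
print([matched for _, _, matched in find_french_dates(sample)])
# → ['3 janvier 2001', 'janvier 2001', '3 janvier']
```

The word boundary after the month name keeps "mai" from matching inside "mais". To use such a pattern in Presidio, the same regex string could presumably be wrapped in a pattern-based recognizer registered for French; that wiring is not shown here.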

omri374 commented 3 months ago

Hi, have you tried using a French NER model from either spaCy or Hugging Face? They usually have good support for dates, so there's no need for rule-based logic.