microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License

Presidio Image Redactor for Korean Documents #1441

Closed by khawar-islam 3 months ago

khawar-islam commented 3 months ago

Dear Presidio Team,

Thanks for your great work. I am looking for a way to use the Presidio image redactor with Korean text, and I found that spaCy provides a Korean model. How can I use it for images? Are there any tutorials? I have both images and PDFs.

https://spacy.io/models/ko

Regards, Khawar

omri374 commented 3 months ago

The image capabilities in Presidio (presidio-image-redactor) build on the text capabilities. You can adapt Presidio to handle text in Korean, and then use this adapted version for image redaction. See examples here: https://github.com/microsoft/presidio/issues/1411 (which is for DICOM images, but works similarly for regular images too)
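A minimal sketch of what that adaptation might look like, assuming the spaCy model `ko_core_news_sm` is installed, Tesseract has its Korean language data (`kor`), and the default Tesseract-based OCR is used. The helper names (`build_nlp_configuration`, `build_image_redactor`, `redact_image`) are hypothetical and not part of Presidio; treat this as a starting point rather than a verified recipe:

```python
def build_nlp_configuration(lang_code="ko", model_name="ko_core_news_sm"):
    """Return an NlpEngineProvider-style configuration for a single spaCy model.

    The model name is an assumption; any installed spaCy Korean model could be used.
    """
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": lang_code, "model_name": model_name}],
    }


def build_image_redactor(lang_code="ko", model_name="ko_core_news_sm"):
    """Build an ImageRedactorEngine whose text analyzer supports `lang_code`."""
    # Imported here so the configuration helper above works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider
    from presidio_image_redactor import ImageAnalyzerEngine, ImageRedactorEngine

    # Text side: an analyzer backed by the Korean spaCy model.
    provider = NlpEngineProvider(
        nlp_configuration=build_nlp_configuration(lang_code, model_name)
    )
    analyzer = AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=[lang_code],
    )

    # Image side: reuse the adapted text analyzer for OCR output.
    image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer)
    return ImageRedactorEngine(image_analyzer_engine=image_analyzer)


def redact_image(path, redactor, lang_code="ko", ocr_lang="kor"):
    """Redact detected entities in the image at `path` and return a PIL image."""
    from PIL import Image

    image = Image.open(path)
    return redactor.redact(
        image,
        fill=(0, 0, 0),                # black boxes over detected text
        ocr_kwargs={"lang": ocr_lang},  # forwarded to the OCR engine (assumed Tesseract)
        language=lang_code,             # forwarded to the text analyzer
    )
```

Usage would be along the lines of `redact_image("scan.png", build_image_redactor()).save("scan_redacted.png")`, where `scan.png` is a placeholder file name. Note that the entity labels produced by the Korean spaCy model may differ from the ones Presidio maps by default, so an explicit label mapping or additional Korean-specific recognizers may be needed before anything is detected.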

khawar-islam commented 3 months ago

Thanks for your comment. Do you have an example or procedure where you added a new language, for instance Japanese? I will adapt the same procedure for Korean.