microsoft / presidio

Context aware, pluggable and customizable data protection and de-identification SDK for text and images
https://microsoft.github.io/presidio
MIT License

allow DicomImageRedactorEngine to use different AnalyzerEngine #1411

Closed jenny-hm-lee closed 2 months ago

jenny-hm-lee commented 3 months ago

Is your feature request related to a problem? Please describe. Currently, DicomImageRedactorEngine uses the default spaCy model, and there is no way to pass in a different analyzer engine. I would like to use the Flair recognizer on text detected in DICOM images.

Describe the solution you'd like I can create a PR with a proposed solution.

Describe alternatives you've considered I currently don't see an alternative, but feel free to correct me.

omri374 commented 3 months ago

Hi @jenny-hm-lee, thanks for the issue and the PR. There is already an option to customize the NER model for DICOM and any other image redaction. Here's an example:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine, NerModelConfiguration
from presidio_image_redactor import ImageAnalyzerEngine, DicomImagePiiVerifyEngine

model_config = [{"lang_code": "en", "model_name": {
    "spacy": "en_core_web_sm",  # use a small spaCy model for lemmas, tokens etc.
    "transformers": "obi/deid_roberta_i2b2"
    }
}]

# Map transformers model labels to Presidio's
model_to_presidio_entity_mapping = dict(
    PER="PERSON",
    PERSON="PERSON",
    LOC="LOCATION",
    LOCATION="LOCATION",
    GPE="LOCATION",
    ORG="ORGANIZATION",
    ORGANIZATION="ORGANIZATION",
    NORP="NRP",
    AGE="AGE",
    ID="ID",
    EMAIL="EMAIL",
    PATIENT="PERSON",
    STAFF="PERSON",
    HOSP="ORGANIZATION",
    PATORG="ORGANIZATION",
    DATE="DATE_TIME",
    TIME="DATE_TIME",
    PHONE="PHONE_NUMBER",
    HCW="PERSON",
    HOSPITAL="ORGANIZATION",
    FACILITY="LOCATION",
)

ner_model_configuration = NerModelConfiguration(
    labels_to_ignore=["O"],
    model_to_presidio_entity_mapping=model_to_presidio_entity_mapping,
)

nlp_engine = TransformersNlpEngine(
    models=model_config,
    ner_model_configuration=ner_model_configuration,
)

# Set up the analyzer engine with the transformers-based NLP engine
# (instead of the default spaCy model) and the other PII recognizers
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)

image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer_engine)

dicom_engine = DicomImagePiiVerifyEngine(image_analyzer_engine=image_analyzer)

print(f"Loaded NLP Engine: {dicom_engine.image_analyzer_engine.analyzer_engine.nlp_engine.__class__.__name__}")
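
The example above builds a DicomImagePiiVerifyEngine, while the issue asks about DicomImageRedactorEngine. The same wiring should carry over; here is a minimal sketch, assuming DicomImageRedactorEngine accepts the same image_analyzer_engine constructor argument. The helper name build_dicom_redactor and the file path in the usage comment are hypothetical.

```python
def build_dicom_redactor(analyzer_engine):
    """Wrap an existing AnalyzerEngine in a DicomImageRedactorEngine.

    Assumption: DicomImageRedactorEngine takes `image_analyzer_engine`
    in its constructor, like DicomImagePiiVerifyEngine in the example above.
    """
    # Imported lazily so the helper can be defined before the OCR and
    # imaging dependencies are loaded.
    from presidio_image_redactor import (
        DicomImageRedactorEngine,
        ImageAnalyzerEngine,
    )

    # The image analyzer runs OCR and passes the detected text to the
    # supplied analyzer engine for PII detection.
    image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer_engine)
    return DicomImageRedactorEngine(image_analyzer_engine=image_analyzer)


# Usage (assumes `analyzer_engine` from the example above and a local file;
# the path is a placeholder):
# import pydicom
# engine = build_dicom_redactor(analyzer_engine)
# redacted = engine.redact(pydicom.dcmread("path/to/file.dcm"), fill="contrast")
```

Any AnalyzerEngine can be passed in this way, so one that has a custom recognizer (such as a Flair-based one) registered should work the same way as the transformers-based engine shown here.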

omri374 commented 2 months ago

Closing for now, please re-open if needed, @jenny-hm-lee

jenny-hm-lee commented 2 months ago

@omri374, thank you for the above response. It is helpful.