Closed jenny-hm-lee closed 4 months ago
Hi @jenny-hm-lee, thanks for the issue and the PR. You can customize the NER model used for DICOM redaction, and for any other image redaction. Here's an example:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import TransformersNlpEngine, NerModelConfiguration
from presidio_image_redactor import ImageAnalyzerEngine, DicomImagePiiVerifyEngine

model_config = [
    {
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",  # use a small spaCy model for lemmas, tokens etc.
            "transformers": "obi/deid_roberta_i2b2",
        },
    }
]

# Map transformers model labels to Presidio's
model_to_presidio_entity_mapping = dict(
    PER="PERSON",
    PERSON="PERSON",
    LOC="LOCATION",
    LOCATION="LOCATION",
    GPE="LOCATION",
    ORG="ORGANIZATION",
    ORGANIZATION="ORGANIZATION",
    NORP="NRP",
    AGE="AGE",
    ID="ID",
    EMAIL="EMAIL",
    PATIENT="PERSON",
    STAFF="PERSON",
    HOSP="ORGANIZATION",
    PATORG="ORGANIZATION",
    DATE="DATE_TIME",
    TIME="DATE_TIME",
    PHONE="PHONE_NUMBER",
    HCW="PERSON",
    HOSPITAL="ORGANIZATION",
    FACILITY="LOCATION",
)

ner_model_configuration = NerModelConfiguration(
    labels_to_ignore=["O"],
    model_to_presidio_entity_mapping=model_to_presidio_entity_mapping,
)

nlp_engine = TransformersNlpEngine(
    models=model_config,
    ner_model_configuration=ner_model_configuration,
)

# Set up the analyzer with the transformers-based NLP engine
# (instead of the default spaCy model) and the other PII recognizers
analyzer_engine = AnalyzerEngine(nlp_engine=nlp_engine)

# Pass the custom analyzer down to the image and DICOM engines
image_analyzer = ImageAnalyzerEngine(analyzer_engine=analyzer_engine)
dicom_engine = DicomImagePiiVerifyEngine(image_analyzer_engine=image_analyzer)

# Confirm which NLP engine the DICOM engine ended up with
print(f"Loaded NLP Engine: {dicom_engine.image_analyzer_engine.analyzer_engine.nlp_engine.__class__.__name__}")
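To illustrate what the label mapping and `labels_to_ignore` settings in the snippet above are for, here is a minimal standalone sketch in plain Python. It is not Presidio's implementation: `to_presidio_entities` is a hypothetical helper, the mapping is only a subset of the one above, and passing unmapped labels through unchanged is an assumption made for this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Subset of the mapping from the example (model label -> Presidio entity).
MODEL_TO_PRESIDIO: Dict[str, str] = {
    "PATIENT": "PERSON",
    "STAFF": "PERSON",
    "HOSP": "ORGANIZATION",
    "DATE": "DATE_TIME",
    "PHONE": "PHONE_NUMBER",
}

# Labels that carry no entity information, as in labels_to_ignore=["O"].
LABELS_TO_IGNORE: Set[str] = {"O"}


def to_presidio_entities(
    predictions: Iterable[Tuple[str, str]],
    mapping: Dict[str, str] = MODEL_TO_PRESIDIO,
    ignore: Set[str] = LABELS_TO_IGNORE,
) -> List[Tuple[str, str]]:
    """Translate (text, model_label) pairs into (text, presidio_entity) pairs.

    Hypothetical helper for illustration only:
    - labels listed in `ignore` are dropped;
    - labels missing from `mapping` are kept unchanged (an assumption here).
    """
    results = []
    for text, label in predictions:
        if label in ignore:
            continue
        results.append((text, mapping.get(label, label)))
    return results


if __name__ == "__main__":
    sample = [("John Smith", "PATIENT"), ("visited", "O"), ("Mercy", "HOSP")]
    for text, entity in to_presidio_entities(sample):
        print(f"{text!r} -> {entity}")
```

The point of the mapping is that several model-specific labels (here `PATIENT` and `STAFF`) collapse into one Presidio entity type, so downstream redaction only has to reason about Presidio's entity names.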
Closing for now, please re-open if needed, @jenny-hm-lee
@omri374, thank you for the above response. It is helpful.
Is your feature request related to a problem? Please describe. Currently, when using DicomImageRedactorEngine, it uses the default spaCy model and there is no way to pass in a different analyzer engine. I would like to use the Flair recognizer on text detected in DICOM images.
Describe the solution you'd like I can create a PR with a proposed solution.
Describe alternatives you've considered I currently don't see an alternative, but feel free to correct me.