Closed ardhendu21 closed 8 months ago
Hi, you can pass an AnalyzerEngine instance to your PandasAnalysisBuilder, and use the standard Presidio configuration capabilities to create recognizers and such.
For example:
import pandas as pd
from presidio_structured import StructuredEngine, PandasAnalysisBuilder
from presidio_anonymizer.entities import OperatorConfig
from faker import Faker
from presidio_analyzer import AnalyzerEngine, PatternRecognizer
# Faker instance used by the EMAIL_ADDRESS operator below
fake = Faker()
operators = {
    "DEFAULT": OperatorConfig("replace", {"new_value": "<ANONYMIZED>"}),
    "PERSON": OperatorConfig("replace", {"new_value": "REDACTED"}),
    "EMAIL_ADDRESS": OperatorConfig("custom", {"lambda": lambda x: fake.safe_email()}),
}
# input data
sample_df = pd.DataFrame({"title": ["Mr.", "Ms.", "Mrs."],"name": ["Arthur", "David", "William"], "sign": ["Plus", "Minus", "Minus"]})
# define custom PII detection (in this case with a deny-list)
titles_list = [
    "Sir",
    "Ma'am",
    "Madam",
    "Mr.",
    "Mrs.",
    "Ms.",
    "Miss",
    "Dr.",
    "Professor",
]
titles_recognizer = PatternRecognizer(supported_entity="TITLE", deny_list=titles_list)
analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(titles_recognizer)
# Presidio structured
pandas_engine = StructuredEngine()
analysis_builder = PandasAnalysisBuilder(analyzer=analyzer)
tabular_analysis = analysis_builder.generate_analysis(sample_df)
anonymized_df = pandas_engine.anonymize(sample_df, tabular_analysis, operators=operators)
print(anonymized_df)
To create new custom recognizers or remove existing ones, see the tutorial.
Closing the issue; feel free to reopen it if you have any additional questions.
Hello,
I'm currently working with the Presidio structured package for anonymizing personal information within pandas DataFrames.
However, I'm interested in extending this functionality by adding custom anonymization for entities which are not predefined. I'd also like to know how to remove entities that are already defined.
Here is a snippet of my code, which follows an example from GitHub.
Can someone help me with this?