hitsz-ids / synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.
Apache License 2.0

Feature: Add Email Generator (a new type of sdgx.data_processor) #184

Closed by MooooCat 4 months ago

MooooCat commented 5 months ago

Description

This module adds EmailGenerator, a subclass of PIIGenerator, the class designed to handle the conversion and reversal of personally identifiable information (PII) in a DataFrame.

The EmailGenerator class has three main methods: fit, convert, and reverse_convert.
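To make the fit / convert / reverse_convert flow concrete, here is a minimal, dependency-free sketch. It is not the sdgx implementation. The class name `EmailGeneratorSketch`, the regex-based detection in `fit`, the choice to drop email columns in `convert`, and the `userNNNNNN@example.com` pattern are all assumptions made for illustration, and a plain dict of column lists stands in for the DataFrame.

```python
import random
import re

# Deliberately simple pattern; real email validation is more involved.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


class EmailGeneratorSketch:
    """Hypothetical illustration only, not the sdgx EmailGenerator."""

    def __init__(self, seed=0):
        self.email_columns = []
        self._rng = random.Random(seed)

    def fit(self, table):
        # Assumption: a column counts as an email column when it is
        # non-empty and every value looks like an email address.
        self.email_columns = [
            name
            for name, values in table.items()
            if values
            and all(isinstance(v, str) and EMAIL_RE.match(v) for v in values)
        ]
        return self

    def convert(self, table):
        # Assumption: conversion removes the email columns so that no
        # real address is passed on to later processing.
        return {
            name: list(values)
            for name, values in table.items()
            if name not in self.email_columns
        }

    def reverse_convert(self, table):
        # Re-create each email column with synthetic addresses,
        # one per row of the incoming table.
        n_rows = len(next(iter(table.values()), []))
        restored = {name: list(values) for name, values in table.items()}
        for name in self.email_columns:
            restored[name] = [
                f"user{self._rng.randrange(10**6):06d}@example.com"
                for _ in range(n_rows)
            ]
        return restored


table = {"name": ["Ann", "Bob"], "email": ["ann@corp.io", "bob@corp.io"]}
gen = EmailGeneratorSketch().fit(table)
stripped = gen.convert(table)
restored = gen.reverse_convert(stripped)
```

In this sketch the original addresses never reach `restored`; only the column name and row count carry over.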

Motivation and Context

The motivation is to provide a way to handle email addresses in a DataFrame.

This is particularly useful when dealing with datasets that contain sensitive information, such as email addresses, and need to be anonymized or de-identified.

How has this been tested?

EmailGenerator has been tested using a variety of test cases.

These tests check that the fit method correctly identifies the email columns in the DataFrame, and that the convert and reverse_convert methods handle those columns correctly.
