sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv
Other

Numerical unknowns should not be converted to sdv-pii-???? #2089

Closed: lajohn4747 closed this 5 hours ago

lajohn4747 commented 1 week ago

Resolves #2064 (CU-86b0wh849)

The PII sdtype adds the prefix sdv-pii to generated values. When a column's sdtype is unknown, the transformers auto-assign one. However, the reverse transform for a PII sdtype always produces values of dtype object, because every value carries the prefix. With this change, if a numerical column is detected, the Faker function is switched to numerify and the output is capped at the maximum number of digits.
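A minimal sketch of the idea, assuming a plain list of values stands in for a column. This is not SDV's implementation: the helper names (max_digits, numerify, prefixed_pii, anonymize_unknown) are hypothetical, the numerify function is a standard-library stand-in for Faker's numerify ('#' becomes a random digit 0-9), and converting the result back to int is an assumption made to show that the column stays numeric instead of becoming prefixed strings.

```python
import random
import string


def max_digits(values):
    """Hypothetical helper: largest digit count among the integer parts of the values."""
    return max(len(str(abs(int(v)))) for v in values)


def numerify(pattern, rng):
    """Stdlib stand-in for Faker's numerify: each '#' becomes a random digit 0-9."""
    return "".join(rng.choice(string.digits) if ch == "#" else ch for ch in pattern)


def prefixed_pii(rng, length=4):
    """Hypothetical prefixed value, in the spirit of the sdv-pii-???? values in the title."""
    alphabet = string.ascii_lowercase + string.digits
    return "sdv-pii-" + "".join(rng.choice(alphabet) for _ in range(length))


def anonymize_unknown(values, seed=0):
    """Anonymize a non-empty column whose sdtype is unknown (hypothetical sketch)."""
    rng = random.Random(seed)
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        # Numerical column: random digits, capped at the column's maximum digit count,
        # so the output can stay numeric instead of becoming object-dtype strings.
        pattern = "#" * max_digits(values)
        return [int(numerify(pattern, rng)) for _ in values]
    # Anything else keeps the prefixed, string-valued (object dtype) behavior.
    return [prefixed_pii(rng) for _ in values]
```

For example, anonymize_unknown([12, 345, 6789]) returns three ints below 10**4, while anonymize_unknown(["a", "b"]) returns strings that start with sdv-pii-. Capping the digit count keeps the fake values in roughly the same magnitude range as the real column.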

Adjusted some unit tests to avoid failures caused by mocking.

sdv-team commented 1 week ago

Task linked: CU-86b0wh849 SDV - HMA sampling crashes when unknown sdtype detected for numerical column #2064

lajohn4747 commented 4 days ago

LGTM! Just checking, is it an issue for HMA only? @lajohn4747, @amontanez24

It appears so, based on some local testing.