Open Veeresh1996 opened 1 week ago
Hi @Veeresh1996, SDV is designed to ensure that the synthetic data matches (a) the regex format that you provide and (b) the original data type of the real data. In your case, the two seem to be in conflict with each other: the regex describes 6-digit strings, but it appears to me that the original data type is an integer. The regex may correctly produce strings such as "002690", but when converted to an integer, this becomes 2690 (no longer 6 characters). So the regex is not really compatible with the data type. To fix this issue, you would have to address the root cause of the mismatch.
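The leading-zero loss described above can be reproduced in plain Python. This is only a minimal sketch and does not call SDV: it uses the standard `re` module with the two patterns quoted in this thread, and the helper name `survives_int_round_trip` is hypothetical, introduced here for illustration.

```python
import re

# Pattern from the metadata in this thread: six digits, leading zeros allowed.
ORIGINAL_PATTERN = re.compile(r"^\d{6}$")
# Pattern quoted in this thread: the first digit is 1-9, so no leading zero.
NO_LEADING_ZERO_PATTERN = re.compile(r"[1-9]\d{5}")


def survives_int_round_trip(value: str) -> bool:
    """Return True if the string is unchanged after converting to int and back."""
    return str(int(value)) == value


for candidate in ["002690", "300164"]:
    print(
        candidate,
        "| matches original pattern:",
        ORIGINAL_PATTERN.fullmatch(candidate) is not None,
        "| matches no-leading-zero pattern:",
        NO_LEADING_ZERO_PATTERN.fullmatch(candidate) is not None,
        "| as int:",
        int(candidate),
        "| survives int round trip:",
        survives_int_round_trip(candidate),
    )
```

A value such as "002690" satisfies the original pattern but does not survive the integer conversion, while every value accepted by the second pattern keeps all six digits.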
[1-9]\d{5}
or
Hey Neha, thanks, the solution that you provided works for me.
Environment details
If you are already running SDV, please indicate the following details about the environment in which you are running it:
Problem description
I am using HMASynthesizer for multi-table data. I am able to generate data with the trained model. But for the columns that I have marked as IDs, the length of the generated values does not match the real data, even though I have specified a regex pattern. For example, one of the ID columns contains 6-digit values, but the generated output has values of varying lengths. Real data ID value: 300164. Generated value: 2690.
What I already tried
This is the metadata for that specific field: "patnum": { "sdtype": "id", "regex_format": "^\d{6}$" }. Could you please look into it ASAP? Please let me know if you need any other info.
Thanks in advance