hitsz-ids / synthetic-data-generator

SDG is a specialized framework designed to generate high-quality structured tabular data.
Apache License 2.0

How to regulate the range of Synthetic Data #189

Open GlenLeee opened 3 weeks ago

GlenLeee commented 3 weeks ago

❓Search before asking

I have searched for issues similar to this one.

❓Description

I noticed that my original dataset contains only positive values, but the generated data includes negative values. How can I constrain the range of each column in the generated data?

MooooCat commented 3 weeks ago

@GlenLeee Good question!

In response to your request, we plan to address this with the Rule Manager, which is on our version roadmap (see issue https://github.com/hitsz-ids/synthetic-data-generator/issues/149). The module is under development and will be released in a later version.

In addition, if possible, could you briefly describe your data and say which feature is most likely to cause this issue? That would help us understand the situation from the application side. (If this requirement is common, we could also handle it automatically using the metadata and a data processor, which may be another solution.)

Thank you again for your question and look forward to your reply!
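
The "metadata plus data processor" idea mentioned above can be approximated outside SDG while the Rule Manager is still in development. The following is only a minimal sketch in plain pandas, not SDG's API: the helper names `learn_bounds` and `enforce_bounds` and the soil column names are hypothetical. It records the observed minimum and maximum of each numeric column in the original data, then either clips the generated values to those bounds or drops rows that fall outside them.

```python
import pandas as pd


def learn_bounds(real: pd.DataFrame) -> dict:
    """Record the observed (min, max) of every numeric column (hypothetical helper)."""
    numeric = real.select_dtypes(include="number")
    return {col: (numeric[col].min(), numeric[col].max()) for col in numeric.columns}


def enforce_bounds(synthetic: pd.DataFrame, bounds: dict, mode: str = "clip") -> pd.DataFrame:
    """Clip out-of-range values, or drop rows that contain any (hypothetical helper)."""
    out = synthetic.copy()
    if mode == "clip":
        for col, (lo, hi) in bounds.items():
            out[col] = out[col].clip(lower=lo, upper=hi)
        return out
    if mode == "drop":
        keep = pd.Series(True, index=out.index)
        for col, (lo, hi) in bounds.items():
            keep &= out[col].between(lo, hi)  # inclusive on both ends
        return out[keep].reset_index(drop=True)
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative data only; the column names are assumptions, not from the issue.
real = pd.DataFrame({"porosity": [0.12, 0.35, 0.48], "density": [1.1, 1.4, 1.6]})
synthetic = pd.DataFrame({"porosity": [-0.05, 0.30, 0.60], "density": [1.2, 1.5, 0.9]})

bounds = learn_bounds(real)
clipped = enforce_bounds(synthetic, bounds, mode="clip")
dropped = enforce_bounds(synthetic, bounds, mode="drop")
```

Clipping keeps every row but piles values up at the bounds, while dropping preserves the shape of the distribution inside the range at the cost of fewer rows, so which mode fits depends on how many samples fall outside.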

GlenLeee commented 3 weeks ago

Is this an auto-reply, hhhh? In my dataset, each column represents a physical property of soil. Some columns have relatively small values, within the range 0-1. The generated data includes negative values, which clearly violate physical laws. This is quite troubling for me. It would be nice if I could restrict the columns of the generated dataset to a certain range. ![Uploading 245eec5292e751ea0916e143837aa1d3.png…]()
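
For columns confined to a known interval such as 0-1, one common workaround is to train on a transformed column rather than fix the output afterwards. The sketch below is an assumption-laden illustration, not part of SDG: the function names `to_unbounded` and `to_bounded` are hypothetical. It applies a logit transform before training and a sigmoid after sampling, so any value the model produces maps back inside the interval.

```python
import numpy as np


def to_unbounded(x, lo, hi, eps=1e-6):
    """Map values from [lo, hi] onto the real line with a logit transform."""
    scaled = (np.asarray(x, dtype=float) - lo) / (hi - lo)
    scaled = np.clip(scaled, eps, 1.0 - eps)  # keep the logit finite at the edges
    return np.log(scaled / (1.0 - scaled))


def to_bounded(z, lo, hi):
    """Inverse transform: a sigmoid maps any real number back between lo and hi."""
    return lo + (hi - lo) / (1.0 + np.exp(-np.asarray(z, dtype=float)))


# Illustrative values only. The generator would be trained on `z_real`,
# and its samples (here a made-up `z_fake`) mapped back with `to_bounded`.
real_values = np.array([0.02, 0.40, 0.97])
z_real = to_unbounded(real_values, lo=0.0, hi=1.0)
round_trip = to_bounded(z_real, lo=0.0, hi=1.0)

z_fake = np.array([-8.0, 0.0, 5.0])
fake_values = to_bounded(z_fake, lo=0.0, hi=1.0)
```

The trade-off is that the model learns the distribution in the transformed space, so values very close to the bounds may be reproduced less faithfully than with a model that handles the range natively.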

MooooCat commented 3 weeks ago

Hahahaha I'm a real person, the avatar is my cat. @GlenLeee

GlenLeee commented 3 weeks ago

Can you see the image I've attached? I've taken a portion of my dataset.

MooooCat commented 3 weeks ago

> Can you see the image I've attached? I've taken a portion of my dataset.

I can't see it, all I can see is this ⬇️

Uploading 245eec5292e751ea0916e143837aa1d3.png…

In theory, pictures can be uploaded to an issue. It shows as uploading; is it still uploading?

GlenLeee commented 3 weeks ago

I don't know, I can see the same thing you can see. 245eec5292e751ea0916e143837aa1d3

GlenLeee commented 3 weeks ago

I've posted the screenshot again and as you can see the dataset is all positive, but the data I generated using SDG has negative values, interesting question!

MooooCat commented 3 weeks ago

> I've posted the screenshot again and as you can see the dataset is all positive, but the data I generated using SDG has negative values, interesting question!

I can see the picture now. I'll try to fix this soon, let's keep in touch :)

GlenLeee commented 3 weeks ago

> I can see the picture now. I'll try to fix this soon, let's keep in touch :)

Thank u sooo much :)