sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Would it be possible to synthesize 3D data? #1504

Closed tamersalama closed 1 year ago

tamersalama commented 1 year ago


Problem description

Assume 3D data in which the X axis represents a distance (e.g., miles along a road), the Z axis represents environmental information (e.g., temperature or humidity), and the Y axis is the time component. The data needs to change smoothly from one time step to the next: at a given milestone on the road, humidity does not jump between values but stays close to its previous value. Likewise, one mile cannot differ significantly from the following mile.
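The smoothness requirement above can be stated as a concrete check on a grid of values, where `grid[i][j]` is the reading at mile `i` and time step `j`. This is a minimal sketch in plain Python. The function name `is_smooth` and the tolerances `max_step_time` and `max_step_distance` are hypothetical, since the issue doesn't give numeric limits.

```python
from typing import List


def is_smooth(grid: List[List[float]], max_step_time: float, max_step_distance: float) -> bool:
    """Return True if neighboring cells never differ by more than the given limits.

    Hypothetical helper: grid[i][j] is the environmental value (e.g. humidity)
    at mile i and time step j.
    """
    n_miles = len(grid)
    for i in range(n_miles):
        for j in range(len(grid[i])):
            # Same mile, next time step: the value must not jump over time.
            if j + 1 < len(grid[i]) and abs(grid[i][j + 1] - grid[i][j]) > max_step_time:
                return False
            # Same time step, next mile: neighboring miles must stay close.
            if i + 1 < n_miles and j < len(grid[i + 1]) and abs(grid[i + 1][j] - grid[i][j]) > max_step_distance:
                return False
    return True


# Illustrative humidity readings for 3 miles x 3 time steps (made-up values).
humidity = [
    [40.0, 41.0, 42.5],
    [40.5, 41.5, 43.0],
    [41.0, 42.0, 43.5],
]
print(is_smooth(humidity, max_step_time=2.0, max_step_distance=1.0))  # True
```

A check like this could be run on any synthetic output to see whether it respects the constraints, independent of which synthesizer produced it.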

Here's a 3D plot of example data:

[Screenshot 2023-07-16: 3D plot of the example data]

What I already tried

I'm taking my first steps with the PARSynthesizer, and I'm not sure whether it can synthesize data as described above.

I tried treating distance/miles as "IDs"; however, I'm not sure whether I can constrain the values for a single ID between time steps, or between neighboring IDs.
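Treating each mile as an ID amounts to reshaping the grid into long format: one row per (mile, time) pair, where `mile` would act as the sequence key and `time` as the sequence index. In SDV 1.x those roles are declared on `SingleTableMetadata` via `set_sequence_key` and `set_sequence_index`; the sketch below shows only the reshaping step in plain Python. The helper `grid_to_long` and the column names `mile`, `time`, and `humidity` are hypothetical.

```python
from typing import Dict, List


def grid_to_long(grid: List[List[float]], miles: List[float], times: List[int],
                 value_name: str = "humidity") -> List[Dict[str, float]]:
    """Flatten grid[i][j] (mile i, time step j) into one row per (mile, time) pair.

    Hypothetical helper: 'mile' is meant to play the sequence-key role and
    'time' the sequence-index role.
    """
    rows = []
    for i, mile in enumerate(miles):
        for j, t in enumerate(times):
            rows.append({"mile": mile, "time": t, value_name: grid[i][j]})
    return rows


grid = [[40.0, 41.0], [40.5, 41.5]]  # made-up readings: 2 miles x 2 time steps
rows = grid_to_long(grid, miles=[0.0, 1.0], times=[0, 1])
for row in rows:
    print(row)
```

Note that this layout by itself makes each mile a separate sequence; nothing in it tells a model that mile 0 and mile 1 are neighbors, which is exactly the open question above.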

npatki commented 1 year ago

Hi @tamersalama, nice to meet you!

The PARSynthesizer is suited for multi-sequence data: for example, multiple instruments that each measure distance and environmental conditions across different time steps. From looking at your example, though, it seems as if you have a single sequence of data? Unfortunately, this type of data is not suitable for the model.
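One way to see the multi-sequence vs. single-sequence distinction is to group a table by its sequence key and count the groups. A dataset with several instruments yields several sequences; a single recording yields just one. This is a minimal plain-Python sketch; the helper `split_sequences`, the `instrument` column, and the readings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def split_sequences(rows: List[dict], key: str, index: str) -> Dict[object, List[dict]]:
    """Group rows by sequence key and sort each group by its sequence index (hypothetical helper)."""
    groups: Dict[object, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Each group becomes one time-ordered sequence.
    return {k: sorted(v, key=lambda r: r[index]) for k, v in groups.items()}


# Two instruments, each with its own time series: a multi-sequence dataset.
readings = [
    {"instrument": "A", "time": 1, "humidity": 41.0},
    {"instrument": "A", "time": 0, "humidity": 40.0},
    {"instrument": "B", "time": 0, "humidity": 55.0},
    {"instrument": "B", "time": 1, "humidity": 54.5},
]
sequences = split_sequences(readings, key="instrument", index="time")
print(len(sequences))  # 2
```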

I wonder if you could use a single-table model for your purposes instead? Although these models may not generate the synthetic rows in the correct order, you can reorder them by timestamp afterwards.
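The reordering step suggested here is a plain sort applied after sampling. This sketch assumes the synthetic rows carry hypothetical `time` and `mile` columns and uses a hypothetical helper `reorder_by_time`; with pandas, the equivalent would be `DataFrame.sort_values(["time", "mile"])`.

```python
from typing import List


def reorder_by_time(rows: List[dict], time_col: str = "time", tie_col: str = "mile") -> List[dict]:
    """Return rows sorted by timestamp, breaking ties by position along the road (hypothetical helper)."""
    return sorted(rows, key=lambda r: (r[time_col], r[tie_col]))


# Made-up output of a single-table synthesizer: rows arrive in no particular order.
synthetic = [
    {"mile": 1.0, "time": 1, "humidity": 41.7},
    {"mile": 0.0, "time": 0, "humidity": 40.2},
    {"mile": 1.0, "time": 0, "humidity": 40.9},
    {"mile": 0.0, "time": 1, "humidity": 41.1},
]
for row in reorder_by_time(synthetic):
    print(row)
```

Sorting restores the order but does not by itself make neighboring values close: a single-table model typically treats rows as independent, so the smoothness requirement from the original question would still need to be checked on the result.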

Let me know if that helps!

npatki commented 1 year ago

Hi @tamersalama, do you still have any questions about PAR and synthesizing data?

I'm closing off the issue since it's been inactive for a few weeks. But please feel free to reply if there are any follow-ups. We can always re-open to continue investigating.