sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Support PARSynthesizer learning sequential patterns in categorical columns #2070

Open srinify opened 2 weeks ago

Problem Description

In some sequential datasets, categorical columns follow a reliable pattern that I want my synthetic data to follow as well. One example is event stream data, where a SIGNUP_SUCCESS event always occurs before the ONBOARDING_v1 event.

If this pattern is very strict (i.e., a business rule), it might eventually be addressed by adding support for constraints in PAR: #2044
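
To make the ordering pattern concrete, here is a minimal sketch of how one might check whether real or synthetic event data respects it. It is plain Python and does not use any SDV API. The row layout (sequence key, step index, event type), the sample rows, and the helper name `sequences_violating_order` are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical event-stream rows: (sequence key, step index, event type).
rows = [
    ("user_a", 0, "SIGNUP_SUCCESS"),
    ("user_a", 1, "ONBOARDING_v1"),
    ("user_b", 0, "ONBOARDING_v1"),
    ("user_b", 1, "SIGNUP_SUCCESS"),
    ("user_c", 0, "SIGNUP_SUCCESS"),
]


def sequences_violating_order(rows, first, second):
    """Return sorted keys of sequences where `second` occurs with no earlier `first`."""
    # Group events by sequence key so each sequence is checked independently.
    by_key = defaultdict(list)
    for key, step, event in rows:
        by_key[key].append((step, event))

    violations = []
    for key, events in by_key.items():
        seen_first = False
        # Walk the events in step order.
        for _, event in sorted(events):
            if event == first:
                seen_first = True
            elif event == second and not seen_first:
                violations.append(key)
                break
    return sorted(violations)


violations = sequences_violating_order(rows, "SIGNUP_SUCCESS", "ONBOARDING_v1")
print(violations)  # ['user_b']
```

Running the same check on both the real and the synthetic sequences would show whether the synthesizer has picked up the pattern.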