sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv
Other
2.37k stars 316 forks source link

Support for self-referential tables #735

Open halvorot opened 2 years ago

halvorot commented 2 years ago

Problem Description

There is currently no support for self-referencing tables in SDV. This feature would be very useful, as many database schemas contain this behaviour in real-world implementations.

Expected behavior

See image below for an example where and Employee has a reference to another Employee through the "reportsTo" column which contains an employeeNumber, i.e., a self reference within the Employees table.

image

npatki commented 2 years ago

Thanks for filing @halvorot. You'll see updates on this issue when we prioritize it.

BTW does this happen frequently in your schemas? And I assume there are some implicit rules associated with it --

Anything else?

halvorot commented 2 years ago

I have expeienced it multiple times yes. Although it is hard to put a number on it I would say it is in anywhere from 10-50% of schemas, sorry for the inaccurate answer.

Yes there are, good observation. Another implicit one is that: the ID in reportsTo has to exist as an employeeNumber. e.g. if you delete the "boss" then the reportsTo of lower employees has to be set to null or some other employeeNumber that currently exitst.

Another note is that It may be convinient to define constraints between these relationships (e.g. if salary was a column, employee 1 reports to employee 2. constraint: the salary of employee 1 must be smaller than for employee 2). I don't know if this is relevant at this stage, but figured i would mention it.

emmapass commented 2 years ago

I also require self-referential tables. I am currently modeling/running analytics on a social media platform for doulas. I specifically need self-referential tables to model nested comments (i.e. a comment on another comment). These comments include text and date of comment. Additionally, they are linked to a user table, post, and reactions. Here is an image of my model in MySQL.

Screen Shot 2022-06-27 at 10 08 31 AM
yewei369 commented 2 years ago

Self-referenced tables are one of scenarios we also ran into quite often, and it is of huge value to solve this.

megankatsumi commented 1 year ago

I'm also looking for support for this feature! Hierarchical structures are a very common occurrence in my experience as well.