Open LiFaytheGoblin opened 1 year ago
Thanks for filing @LiFaytheGoblin, we will investigate and report more info here.
For SDV developers: I think it's fine if such a constraint falls back to our reject sampling approach (instead of transform). It's strange that reject sampling is failing though. Perhaps we are doing it too early, before the foreign key is added back in?
Update: Seems like we explicitly do not support any keys (foreign or primary) in constraints at the moment.
I'll turn this into a feature request and update the title to reflect this.
Environment Details
Error Description
I tried to create relational data with 2 tables:
sections, with their id, rank, and number of elements in a section, and
elements, with their id, which section they belong to, rank within the section, and type of element.
For elements, I added a Unique constraint for the combination of the columns section and rank, so that the rank is unique per section.
However, now model.sample() returns the error:
UserWarning: Unique cannot be transformed because columns: ['section'] were not found. Using the reject sampling approach instead.
on the line model.fit(data).
I do not receive any new data.
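For reference, the property the Unique constraint is meant to guarantee can be checked with plain pandas, independently of SDV. This is a minimal sketch: the rows are made up for illustration (they are not taken from the CSV files), and the helper name ranks_unique_per_section is hypothetical.

```python
import pandas as pd

# Hypothetical rows; the column names follow the description of the elements table.
elements = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "section": [0, 0, 1, 1],
    "rank": [0, 1, 0, 1],
    "type": ["a", "b", "a", "c"],
})


def ranks_unique_per_section(df):
    """Return True if no (section, rank) pair occurs more than once."""
    return not df.duplicated(subset=["section", "rank"]).any()


print(ranks_unique_per_section(elements))
```

The same check could be run on sampled data to confirm whether the constraint was respected.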
Steps to reproduce
I use the following code:
An extract from elements-test-2.csv:
An extract from sections-test-2.csv:
My metadata is as follows:
The problematic part seems to be
since the error is not thrown when I remove this part.
Explanation
Neha explained: "This is happening because you have a foreign key column involved in the Unique constraint. SDV treats primary/foreign keys in a separate layer so it is no longer “found” when it gets to the constraint stage."
Workaround
I have found the following workaround:
I duplicated the column that was not found, so that I can use one of the identical columns as a Foreign Key and one for my Unique constraint. SDV still learns that the columns are identical and thus in the end I receive unique ranks per section.
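The duplication step could look roughly like this in pandas. This is only a sketch under assumptions: the rows are invented, and the name section_copy is a hypothetical choice for the duplicated column. The SDV metadata and constraint definitions are deliberately left out.

```python
import pandas as pd

# Hypothetical rows standing in for the elements table.
elements = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "section": [0, 0, 1, 1],
    "rank": [0, 1, 0, 1],
    "type": ["a", "b", "a", "c"],
})

# Duplicate the foreign key column. "section" stays the foreign key,
# while "section_copy" (hypothetical name) is the one used in the Unique constraint.
elements["section_copy"] = elements["section"]

# Sanity checks: the copy matches the original, and rank is unique per copied section.
print(elements["section_copy"].equals(elements["section"]))
print(not elements.duplicated(subset=["section_copy", "rank"]).any())
```

The idea is that the Unique constraint then refers to section_copy and rank, so it no longer touches a key column.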
Extract of my new elements table:
My new metadata:
Suggestion