Wrong number of child rows in some multi-parent scenarios

Environment Details

Please indicate the following details about the environment in which you found the bug:

SDV version: 0.11.0
Python version: 3.8
Operating System: Pop!_OS 20.04

Error Description

In a multi-parent scenario, the first parent table gets the correct number of child rows, but the remaining parent tables may end up with a different number of child rows.

Example

In this example we expect both parents to have the same number of child rows, but that is not the case.

import pandas as pd
import sdv

parent_a = pd.DataFrame({
    'parent_id': range(5),
    'value': range(5)
})

parent_b = pd.DataFrame({
    'parent_id': range(5),
    'value': range(5)
})

child = pd.DataFrame({
    'parent_a': range(5),
    'parent_b': range(5),
    'value': range(5)
})

tables = {
    'parent_a': parent_a,
    'parent_b': parent_b,
    'child': child
}

metadata = sdv.Metadata()
metadata.add_table('parent_a', parent_a, primary_key='parent_id')
metadata.add_table('parent_b', parent_b, primary_key='parent_id')
metadata.add_table('child', child)
metadata.add_relationship('parent_a', 'child', 'parent_a')
metadata.add_relationship('parent_b', 'child', 'parent_b')

model = sdv.SDV()
model.fit(metadata, tables)

sampled = model.sample(num_rows=10)

# Count the distinct parent ids referenced by each foreign key column.
print(len(sampled['child']['parent_a'].unique()))  # 10, as expected
print(len(sampled['child']['parent_b'].unique()))  # fewer than 10
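
To make the mismatch easier to see, the check above can be extended to report, for each foreign key, how many distinct parents are referenced and how many child rows exist in total. This is only a rough sketch: the helper names `children_per_parent` and `summarize_foreign_keys` are hypothetical and not part of SDV, and the `example_child` values are made up to mimic the symptom rather than taken from a real sample.

```python
from collections import Counter


def children_per_parent(foreign_key_values):
    """Count how many child rows reference each parent id."""
    return Counter(foreign_key_values)


def summarize_foreign_keys(child_columns):
    """Return {foreign_key: (distinct_parents, total_child_rows)}."""
    summary = {}
    for name, values in child_columns.items():
        counts = children_per_parent(values)
        summary[name] = (len(counts), sum(counts.values()))
    return summary


# Made-up foreign key values illustrating the reported symptom:
# parent_a references 10 distinct parents, parent_b repeats some ids.
example_child = {
    'parent_a': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'parent_b': [0, 1, 1, 3, 4, 4, 6, 7, 7, 9],
}

summary = summarize_foreign_keys(example_child)
print(summary)
```

With the sampled data from the example, the same helper could presumably be called as `summarize_foreign_keys({fk: sampled['child'][fk].tolist() for fk in ('parent_a', 'parent_b')})`.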

sdv-dev / SDV, issue #535