sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Primary key and sequential key cannot be the same #2096

Open npatki opened 5 days ago


Environment Details

Error Description

For sequential data, the primary key column (or any alternate key column) must not be the same as the sequence key column. However, the metadata object currently accepts this configuration as valid.

from sdv.metadata import SingleTableMetadata

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'id' },
        'B': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' },
        'C': { 'sdtype': 'numerical' },
        'D': { 'sdtype': 'categorical' }
    },
    'primary_key': 'A',
    'sequence_key': 'A'
})

metadata.validate()  # currently passes; it should raise an error

Expected Behavior

The code above should throw an error because the primary key cannot be the same as the sequence key. (The same error should also be thrown if an alternate key is the same as the sequence key.)

The same error should also be thrown when these keys are added programmatically. For example:

from sdv.metadata import SingleTableMetadata

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'id' },
        'B': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' },
        'C': { 'sdtype': 'numerical' },
        'D': { 'sdtype': 'categorical' }
    },
})

metadata.set_sequence_key('A')
metadata.set_primary_key('A') # this should throw an error
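
To make the requested rule concrete, here is a minimal sketch of the check written against a plain metadata dict. It is not SDV's implementation: `check_sequence_key` is a hypothetical helper, the choice of `ValueError` is an assumption, and the sketch assumes each key is a single column name and that alternate keys are stored as a list under `'alternate_keys'`.

```python
def check_sequence_key(metadata_dict):
    """Raise ValueError if the sequence key is also a primary or alternate key.

    Hypothetical helper for illustration only; not part of the SDV API.
    """
    sequence_key = metadata_dict.get('sequence_key')
    if sequence_key is None:
        return  # no sequence key set, so nothing can conflict

    if sequence_key == metadata_dict.get('primary_key'):
        raise ValueError(
            f"The column '{sequence_key}' cannot be both the primary key "
            'and the sequence key.'
        )

    # Assumption: alternate keys are stored as a list of column names.
    if sequence_key in (metadata_dict.get('alternate_keys') or []):
        raise ValueError(
            f"The column '{sequence_key}' cannot be both an alternate key "
            'and the sequence key.'
        )


# The metadata from the report above: 'A' is both primary and sequence key.
conflicting = {
    'columns': {
        'A': {'sdtype': 'id'},
        'B': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
        'C': {'sdtype': 'numerical'},
        'D': {'sdtype': 'categorical'},
    },
    'primary_key': 'A',
    'sequence_key': 'A',
}

try:
    check_sequence_key(conflicting)
except ValueError as error:
    print(error)
```

A check like this could run both during validation and whenever a key is set, so that the dict-based and programmatic paths reject the same configurations.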