sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Inequality fails with None and datetime #1471

Closed amontanez24 closed 1 year ago

amontanez24 commented 1 year ago

Error Description

The Inequality constraint fails on datetime columns that contain None values.

Note that fitting works if (a) I don't add the constraint, or (b) I keep the constraint and convert the None values to pd.NaT.

InvalidDataError: The provided data does not match the metadata: 'NoneType' object has no attribute 'to_datetime64'
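Workaround (b) from the note above could look roughly like the sketch below. The helper name `none_to_nat` is hypothetical and the exact conversion used when testing the workaround is not stated in this issue, so treat this as one possible way to swap None for pd.NaT while leaving the date strings untouched.

```python
import pandas as pd

data = pd.DataFrame(data={
    'A': [None, None, '2020-01-02', '2020-03-04'] * 2,
    'B': [None, '2021-03-04', '2021-12-31', None] * 2,
})


def none_to_nat(df, columns):
    """Return a copy of df where None in the given columns becomes pd.NaT.

    Hypothetical helper for illustration; it is not part of SDV.
    """
    out = df.copy()
    for column in columns:
        # Build the column as object dtype so the date strings are kept as-is
        # and only the None entries are replaced.
        replaced = [pd.NaT if value is None else value for value in df[column]]
        out[column] = pd.Series(replaced, index=df.index, dtype=object)
    return out


cleaned = none_to_nat(data, ['A', 'B'])
```

The `cleaned` frame can then be passed to `synth.fit` in place of `data` in the reproduction steps.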

Steps to reproduce

import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

data = pd.DataFrame(data={
    'A': [None, None, '2020-01-02', '2020-03-04']*2,
    'B': [None, '2021-03-04', '2021-12-31', None]*2
})

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'A': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' },
        'B': { 'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d' }
    }
})

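metadata.validate()
synth = GaussianCopulaSynthesizer(metadata)

# Validation passes while no constraint has been added.
synth.validate(data)

synth.add_constraints([{
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'high_column_name': 'A',
        'low_column_name': 'B'
    }
}])

# With the Inequality constraint in place, fitting raises InvalidDataError.
synth.fit(data)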

InvalidDataError: The provided data does not match the metadata:
'NoneType' object has no attribute 'to_datetime64'

amontanez24 commented 1 year ago

This can probably be resolved by updating the following function in the same way as was done in SDV-Enterprise: https://github.com/sdv-dev/SDV/blob/766d7fc7acb2a10a160ccf8563b1e219d0c83821/sdv/constraints/utils.py#L10-L31
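As a rough illustration of what a None-tolerant cast could look like: `pd.to_datetime(None)` returns None, so calling `.to_datetime64()` on that result produces exactly the 'NoneType' error above. The sketch below guards against missing values before converting. The function name `cast_to_datetime64_or_nat` is hypothetical, and this is neither the linked SDV function nor the SDV-Enterprise fix, only a minimal sketch of the idea.

```python
import numpy as np
import pandas as pd


def cast_to_datetime64_or_nat(values):
    """Convert an iterable of date-like values to a datetime64[ns] array.

    Missing entries (None, NaN, NaT) become NaT instead of raising.
    Hypothetical helper for illustration; it is not part of SDV.
    """
    converted = []
    for item in values:
        if pd.isna(item):
            # pd.to_datetime(None) would return None here, which has no
            # to_datetime64 method, so map missing values to NaT directly.
            converted.append(np.datetime64('NaT'))
        else:
            converted.append(pd.to_datetime(item).to_datetime64())
    return np.array(converted, dtype='datetime64[ns]')


result = cast_to_datetime64_or_nat([None, '2021-03-04', '2021-12-31', None])
```

Returning NaT keeps the array in a datetime dtype, so later comparisons between the low and high columns can treat missing entries uniformly.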