sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Error when using a datetime column as a context column with PAR Synthesizer #2187

Open MichaelG-Uke opened 1 month ago

MichaelG-Uke commented 1 month ago

Environment Details

SDV version: 1.15.0 (as installed in the steps to reproduce below).

Error Description

Using a datetime column as a context column with PARSynthesizer results in the following error:

ValueError: Error: Sampling terminated. No results were saved due to unspecified "output_file_path".
could not convert string to float: '2006-01-01'
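The second line of the message is the text Python's built-in `float()` raises when it is handed a date string, which suggests the context value reaches a numeric conversion while still a string. The sketch below reproduces that message in isolation and shows one possible numeric encoding of such dates. The helper names `date_to_float`, `float_to_date`, and `try_float` are hypothetical and not part of SDV; this illustrates the failure mode and is not a confirmed fix for the synthesizer.

```python
from datetime import datetime, timezone


def date_to_float(text, fmt="%Y-%m-%d"):
    """Hypothetical helper: encode a date string as seconds since the Unix epoch (UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def float_to_date(seconds, fmt="%Y-%m-%d"):
    """Hypothetical helper: decode epoch seconds (UTC) back into a date string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


def try_float(text):
    """Return float(text), or the ValueError message if the conversion fails."""
    try:
        return float(text)
    except ValueError as exc:
        return str(exc)


# Passing the raw date string to float() yields the same message as in the report.
message = try_float("2006-01-01")

# Encoding the date numerically first avoids the failed conversion and is reversible.
encoded = date_to_float("2006-01-01")
decoded = float_to_date(encoded)
```

Whether encoding the context column this way before `fit` sidesteps the error in SDV itself is an assumption that would need to be checked against the library.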

Steps to reproduce

!pip install sdv==1.15.0

import pandas as pd
import random
from datetime import datetime, timedelta
from sdv.sequential import PARSynthesizer
from sdv.metadata import SingleTableMetadata

event_start_date = datetime(2024, 1, 1)
event_end_date = datetime(2024, 7, 1)
n = 10

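# Every row shares the same date, so "Date" is constant within each sequence.
start_dates = [datetime(2023, 9, 1).strftime('%Y-%m-%d') for _ in range(n)]
# Random dates between event_start_date and event_end_date, used as context when sampling.
span_days = (event_end_date - event_start_date).days
context_dates = [
    (event_start_date + timedelta(days=random.randint(0, span_days))).strftime('%Y-%m-%d')
    for _ in range(n)
]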

s_key = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
val = [51, 53, 54, 55, 56, 12, 13, 14, 15, 16]

df = pd.DataFrame(
    {
        "Date": start_dates,
        "s_key": s_key,
        "val": val
    }
)

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)
metadata.update_column(column_name='s_key', sdtype='id')
metadata.set_sequence_key(column_name="s_key")

synthesizer = PARSynthesizer(metadata, verbose=True, epochs=5, context_columns=["Date"])

event_context = pd.DataFrame(data={
    "Date": context_dates
})

synthesizer.fit(df)
synthesizer.sample_sequential_columns(context_columns=event_context)

srinify commented 1 month ago

Thanks for raising this, @MichaelG-Uke. I ran into an error during the synthesizer.fit(df) step itself:

[Screenshot: error raised during synthesizer.fit(df)]

Did you run into your error during fit or during sampling?
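One way to answer this is to run each stage separately and record which one raises first. The following is a minimal sketch; `first_failing_stage`, `fake_fit`, and `fake_sample` are hypothetical names, and the two stand-in functions only simulate the stages so the example runs without SDV installed.

```python
def first_failing_stage(stages):
    """Run (name, callable) pairs in order.

    Returns (name, exception) for the first stage that raises,
    or (None, None) if every stage completes.
    """
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            return name, exc
    return None, None


def fake_fit():
    """Stand-in for the fit step; succeeds without doing anything."""


def fake_sample():
    """Stand-in for the sampling step; fails like the reported error."""
    float("2006-01-01")


failed, error = first_failing_stage([("fit", fake_fit), ("sample", fake_sample)])
print(failed, error)
```

With the real objects from the report, the stand-ins would be replaced by `lambda: synthesizer.fit(df)` and `lambda: synthesizer.sample_sequential_columns(context_columns=event_context)`.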

srinify commented 3 weeks ago

I reproduced the error internally in this Colab Notebook: https://colab.research.google.com/drive/1SW5WxJgU5Y2ykmP0t793a5OE-LxKsw5H?authuser=1#scrollTo=sHSODwrsjwZ9