Closed Sanchita333 closed 5 months ago
Thanks for filing! I can replicate this on the latest SDMetrics version (0.9.0), and I investigated further.
It appears this issue only occurs when you have 2 or more context columns in the dataset. Until we fix this, the workaround is to remove the additional context columns, keeping only one, from the real data, the synthetic data and the metadata, as shown below.
import copy

from sdmetrics.timeseries import LSTMDetection

# Remove the 'MarketCap' and 'Sector' context columns.
# The only remaining context column will be 'Industry'.

# Remove from metadata
metadata_copy = copy.deepcopy(metadata.to_dict())
del metadata_copy['fields']['MarketCap']
del metadata_copy['fields']['Sector']

# Remove from real and synthetic data
real_copy = real_data.drop(['MarketCap', 'Sector'], axis=1)
synthetic_copy = synthetic_data.drop(['MarketCap', 'Sector'], axis=1)

LSTMDetection.compute(
    real_data=real_copy,
    synthetic_data=synthetic_copy,
    metadata=metadata_copy
)
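If your own dataset has a different set of context columns, the same workaround can be written generically. This is only a minimal sketch: `keep_single_context_column` is a hypothetical helper (not part of SDMetrics), and it assumes the dict-style metadata from this issue, with `fields` and `context_columns` keys. It also trims the `context_columns` list itself, which the snippet above leaves untouched, so treat that extra step as an assumption that may not be needed.

```python
import copy


def keep_single_context_column(metadata_dict, keep):
    """Return (trimmed_metadata, dropped_columns), keeping one context column.

    Hypothetical helper, not part of SDMetrics. Assumes dict-style metadata
    with 'fields' and 'context_columns' keys, as shown in this issue.
    """
    context_columns = list(metadata_dict.get('context_columns', []))
    if keep not in context_columns:
        raise ValueError(f"{keep!r} is not a context column: {context_columns}")

    # Every context column except the one we keep gets dropped.
    dropped = [column for column in context_columns if column != keep]

    # Work on a deep copy so the caller's metadata is left untouched.
    trimmed = copy.deepcopy(metadata_dict)
    for column in dropped:
        trimmed['fields'].pop(column, None)

    # Assumption: also shrink the list so the metadata stays self-consistent.
    trimmed['context_columns'] = [keep]
    return trimmed, dropped


# Abbreviated version of the metadata from this issue.
metadata_dict = {
    'fields': {
        'Date': {'type': 'datetime'},
        'Symbol': {'type': 'id'},
        'Open': {'type': 'numerical'},
        'Sector': {'type': 'categorical'},
        'Industry': {'type': 'categorical'},
        'MarketCap': {'type': 'numerical'},
    },
    'entity_columns': 'Symbol',
    'sequence_index': 'Date',
    'context_columns': ['MarketCap', 'Sector', 'Industry'],
}

trimmed, dropped = keep_single_context_column(metadata_dict, keep='Industry')
print(dropped)  # ['MarketCap', 'Sector']
```

You would then drop the same columns from both tables, for example `real_data.drop(columns=dropped)` and `synthetic_data.drop(columns=dropped)`, and pass the results together with `trimmed` to `LSTMDetection.compute` as above.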
BTW @Sanchita333 I notice your example is using the SDV demo dataset. I am curious if you are planning to apply this to your own (private) dataset. If so, does this dataset have multiple context columns?
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
metadata1 = {
    'fields': {
        'Date': {'type': 'datetime'},
        'Symbol': {'type': 'id'},
        'Open': {'type': 'numerical'},
        'Close': {'type': 'numerical'},
        'Volume': {'type': 'numerical'},
        'Sector': {'type': 'categorical'},
        'Industry': {'type': 'categorical'},
        'MarketCap': {'type': 'numerical'}
    },
    'entity_columns': 'Symbol',
    'sequence_index': 'Date',
    'context_columns': ["MarketCap", "Sector", "Industry"]
}

I am using the same example that was mentioned in Time series data generation using PAR models (https://sdv.dev/SDV/user_guides/timeseries/par.html). I am unable to evaluate the synthetic data generated using LSTM detection.
Steps to reproduce