Closed martinjurkovic closed 6 months ago
The problems are the following:
For the datetime column Date, the format in the metadata is wrong. Right now it is %d/%m/%y but it should be %Y-%m-%d.
For the Promo2 column, the problem is that SingleTableMetadata does not support numerical values when reading a boolean-type column:
https://github.com/sdv-dev/SDV/blob/74baae90eb64abf52a5ea3e55b2017ef849fec6d/sdv/metadata/single_table.py#L903-L906
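To make the two problems concrete, here is a minimal standard-library sketch. It does not use SDV and is not the library's validation code. The sample date, the `matches_format` helper, and the `is_strict_boolean` helper are all hypothetical, chosen only to illustrate why a date written as year-month-day fails under %d/%m/%y, and why numeric 0/1 flags would fail a check that expects real boolean values.

```python
from datetime import datetime

# Illustrative value only; not copied from the Rossmann data.
sample_date = "2015-07-31"


def matches_format(value, fmt):
    """Return True if `value` parses under the strptime format `fmt`."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# The format currently in the metadata vs. the one proposed in this issue.
old_format_ok = matches_format(sample_date, "%d/%m/%y")
new_format_ok = matches_format(sample_date, "%Y-%m-%d")


def is_strict_boolean(values):
    """Hypothetical strict check: every value must be an actual bool."""
    return all(isinstance(v, bool) for v in values)


# Numeric flags, as described for the Promo2 column.
promo2_like = [0, 1, 1, 0]
numeric_flags_ok = is_strict_boolean(promo2_like)

print(old_format_ok, new_format_ok, numeric_flags_ok)
```

Under these assumptions, the sample date parses only with %Y-%m-%d, and the numeric flags do not pass the strict boolean check, which mirrors the two mismatches reported above.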
Hi @martinjurkovic thanks for letting us know and filing this issue. We can keep this open and update the issue once we update the S3 bucket.
FYI for a quick way to determine whether the metadata matches the data, you can use the following command:
metadata.validate_data(data)
In the meantime, please feel free to update the invalid columns locally to continue on with this dataset. The following should work:
metadata.update_column(
    table_name='store',
    column_name='Promo2',
    sdtype='categorical'
)
metadata.update_column(
    table_name='historical',
    column_name='Date',
    sdtype='datetime',
    datetime_format='%Y-%m-%d'
)
metadata.validate()
metadata.validate_data(data)
BTW there are a few other datasets that are running into issues due to the metadata. See
Hi @martinjurkovic, this issue has now been fixed. No need to update your SDV version -- it should now work if you re-run download_demo. Let us know if you are still having problems with this. Thanks.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
Can't fit HMA for the Rossmann multitable demo dataset. Error message:
Steps to reproduce