sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

TypeError when fitting synthesizer #1391

Closed rawinan-soma closed 1 year ago

rawinan-soma commented 1 year ago

Environment Details

Please indicate the following details about the environment in which you found the bug:

Error Description

Fitting the synthesizer model fails with the error below; it might be a datetime issue.

TypeError: Argument 'values' has incorrect type (expected numpy.ndarray, got list)

Steps to reproduce

----> 1 synthesizer.fit(data_sample)

File c:\Users\Rawinan\anaconda3\lib\site-packages\sdv\single_table\base.py:456, in BaseSynthesizer.fit(self, data)
    454 self._data_processor.reset_sampling()
    455 self._random_state_set = False
--> 456 processed_data = self._preprocess(data)
    457 self.fit_processed_data(processed_data)

File c:\Users\Rawinan\anaconda3\lib\site-packages\sdv\single_table\base.py:402, in BaseSynthesizer._preprocess(self, data)
    401 def _preprocess(self, data):
--> 402     self.validate(data)
    403     self._data_processor.fit(data)
    404     return self._data_processor.transform(data)

File c:\Users\Rawinan\anaconda3\lib\site-packages\sdv\single_table\base.py:227, in BaseSynthesizer.validate(self, data)
    225 # Every column must satisfy the properties of their sdtypes
    226 for column in data:
--> 227     errors += self._validate_column(data[column])
    229 if errors:
    230     raise InvalidDataError(errors)

File c:\Users\Rawinan\anaconda3\lib\site-packages\sdv\single_table\base.py:179, in BaseSynthesizer._validate_column(self, column)
...
    137 guessed_format = guess_datetime_format(
    138     first_non_nan_element, dayfirst=dayfirst
    139 )

npatki commented 1 year ago

Hi @bearberror, thanks for filing this issue. The error is raised during the validate step, which indicates a mismatch between the metadata and the actual data being passed in.

Unfortunately, I'm not able to fully understand what's going on from this stack trace. Is there more, expanded information available? (I see ... printed out towards the end. Perhaps that can be expanded?)
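If the notebook output is being truncated, one way to capture the complete stack trace is Python's standard traceback module. The following is only a minimal sketch: fit_with_full_traceback and failing_fit are hypothetical names, and failing_fit is a stand-in for synthesizer.fit that simply raises the same message.

```python
import traceback


def fit_with_full_traceback(fit, data):
    """Call fit(data); if it raises, return the complete traceback as text."""
    try:
        fit(data)
    except Exception:
        # format_exc() returns the whole traceback of the active exception.
        return traceback.format_exc()
    return None


# Hypothetical stand-in for synthesizer.fit, used only to keep the sketch runnable.
def failing_fit(data):
    raise TypeError(
        "Argument 'values' has incorrect type (expected numpy.ndarray, got list)"
    )


report = fit_with_full_traceback(failing_fit, data=[])
print(report)
```

In a real session, synthesizer.fit and data_sample would take the place of failing_fit and the empty list, and the printed text can be pasted into the issue in full.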

It would also be helpful if you are able to share any relevant parts of your metadata and code. I think the error is occurring for a column with sdtype datetime, so that would be a place to start.
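As a starting point for that check, here is a rough sketch that compares values against the format declared for each datetime column. It uses only the standard library; the metadata dict mimics the general shape of an SDV single-table metadata dict, but the column names, formats, and values are made up for illustration, and plain lists stand in for DataFrame columns. It is not the validation SDV itself performs.

```python
from datetime import datetime

# Hypothetical metadata; column names and formats are illustrative assumptions.
metadata = {
    "columns": {
        "visit_date": {"sdtype": "datetime", "datetime_format": "%Y-%m-%d"},
        "age": {"sdtype": "numerical"},
    }
}

# Hypothetical data; plain lists stand in for DataFrame columns.
data = {
    "visit_date": ["2023-01-05", "05/01/2023", None, 20230107],
    "age": [34, 51, 28, 40],
}


def find_bad_datetimes(metadata, data):
    """Return {column: [(row_index, value), ...]} for suspicious datetime values."""
    problems = {}
    for name, spec in metadata["columns"].items():
        if spec.get("sdtype") != "datetime":
            continue
        fmt = spec.get("datetime_format")
        bad = []
        for i, value in enumerate(data.get(name, [])):
            if value is None or isinstance(value, datetime):
                continue  # missing values and real datetime objects are accepted
            if not isinstance(value, str) or fmt is None:
                bad.append((i, value))  # cannot be checked against a string format
                continue
            try:
                datetime.strptime(value, fmt)
            except ValueError:
                bad.append((i, value))  # string does not match the declared format
        if bad:
            problems[name] = bad
    return problems


problems = find_bad_datetimes(metadata, data)
print(problems)
```

Any column that shows up in the result is a candidate for the mismatch, and sharing its metadata entry together with a few of the flagged values would help narrow things down.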

npatki commented 1 year ago

Hi @bearberror, were you able to get to the bottom of this error? As it has been over 2 weeks since our last response, I'll close the issue as solved.

If you are still observing this error, feel free to reply and I will reopen the issue to investigate.