sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

SDV 1.14: PAR Synthesizer can't fit if metadata has a `sequence_index` set #2103

Closed srinify closed 2 days ago

srinify commented 3 days ago

Environment Details

Steps to reproduce

The following code raises an error in SDV 1.14 but not in 1.13. Commenting out the line that sets the sequence_index lets PAR fit without error:

from sdv.sequential import PARSynthesizer
from sdv.metadata import SingleTableMetadata
from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='sequential',
    dataset_name='ArticularyWordRecognition')

metadata2 = SingleTableMetadata()
metadata2.detect_from_dataframe(data)

metadata2.update_column("e_id", sdtype="id")
metadata2.set_sequence_key("e_id")
metadata2.set_sequence_index("s_index")

synthesizer = PARSynthesizer(
    metadata2,
    verbose=True
)

synthesizer.fit(data)

Error Description

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-87e16aa7c75b> in <cell line: 21>()
     19 )
     20 
---> 21 synthesizer.fit(data)

3 frames
/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in fit(self, data)
    458         self._data_processor.reset_sampling()
    459         self._random_state_set = False
--> 460         processed_data = self.preprocess(data)
    461         self.fit_processed_data(processed_data)
    462 

/usr/local/lib/python3.10/dist-packages/sdv/single_table/base.py in preprocess(self, data)
    394         is_converted = self._store_and_convert_original_cols(data)
    395 
--> 396         preprocess_data = self._preprocess(data)
    397 
    398         if is_converted:

/usr/local/lib/python3.10/dist-packages/sdv/sequential/par.py in _preprocess(self, data)
    276         sequence_key_transformers = {sequence_key: None for sequence_key in self._sequence_key}
    277         if not self._data_processor._prepared_for_fitting:
--> 278             self.auto_assign_transformers(data)
    279 
    280         self.update_transformers(sequence_key_transformers)

/usr/local/lib/python3.10/dist-packages/sdv/sequential/par.py in auto_assign_transformers(self, data)
    257         if self._sequence_index:
    258             sequence_index_transformer = self.get_transformers()[self._sequence_index]
--> 259             if sequence_index_transformer.enforce_min_max_values:
    260                 sequence_index_transformer.enforce_min_max_values = False
    261 

AttributeError: 'NoneType' object has no attribute 'enforce_min_max_values'

npatki commented 3 days ago

Is the sequence index column (s_index) numerical? If so, this is a known issue described in #2079. That issue has already been closed, so the fix should be included in the next SDV release.
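
The failure in the traceback can be illustrated without SDV: the transformer lookup for the sequence index column returns None, and reading an attribute on None raises. Below is a minimal sketch, assuming a plain dict stands in for whatever get_transformers() returns. FakeTransformer, relax_min_max, relax_min_max_guarded, and the "value" column are hypothetical names for illustration, not SDV code, and the guard shown is not necessarily how #2079 was fixed.

```python
class FakeTransformer:
    """Hypothetical stand-in for a transformer carrying the flag from the traceback."""

    def __init__(self, enforce_min_max_values=True):
        self.enforce_min_max_values = enforce_min_max_values


def relax_min_max(transformers, sequence_index):
    """Unguarded pattern, mirroring the lines shown in the traceback."""
    transformer = transformers[sequence_index]
    # Raises AttributeError when the column has no transformer (None).
    if transformer.enforce_min_max_values:
        transformer.enforce_min_max_values = False


def relax_min_max_guarded(transformers, sequence_index):
    """Same logic, but skips columns whose transformer is None."""
    transformer = transformers.get(sequence_index)
    if transformer is not None and getattr(transformer, "enforce_min_max_values", False):
        transformer.enforce_min_max_values = False


# Assumption: the sequence index column maps to None, as the error message implies.
transformers = {"s_index": None, "value": FakeTransformer()}

try:
    relax_min_max(transformers, "s_index")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The guarded version tolerates the None entry and still updates real transformers.
relax_min_max_guarded(transformers, "s_index")
relax_min_max_guarded(transformers, "value")
```

The guard is only one possible way to avoid the crash. Until a release containing the fix is available, the report above suggests two options: stay on 1.13, or leave the sequence_index unset.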

srinify commented 2 days ago

@npatki yup it's numerical -- I'll close this issue out then. Thanks!