Environment Details
Please indicate the following details about the environment in which you found the bug:
SDMetrics version: 0.10.1
Python version: Any
Operating System: Any
Error Description
When running DiagnosticReport, or NewRowSynthesis by itself, the following error is raised if a categorical column in real_data has the category dtype, or any dtype other than object:
pandas.errors.UndefinedVariableError: name <value> is not defined
Steps to reproduce
To reproduce this, cast a categorical column in real_data to the category dtype. Here is a short example:
from sdmetrics.demos import load_single_table_demo
from sdmetrics.single_table import NewRowSynthesis
real_data, synthetic_data, metadata = load_single_table_demo()
real_data['gender'] = real_data['gender'].astype('category')
NewRowSynthesis.compute_breakdown(real_data, synthetic_data, metadata)
.................
File ~/.virtualenvs/SDMetrics/lib/python3.8/site-packages/pandas/core/computation/scope.py:246, in Scope.resolve(self, key, is_local)
244 return self.temps[key]
245 except KeyError as err:
--> 246 raise UndefinedVariableError(key, is_local) from err
UndefinedVariableError: name 'F' is not defined
Additional context
This bug occurs because of the following if/else statement:
https://github.com/sdv-dev/SDMetrics/blob/585290fc829db32645c1231d5b0385b9e90a0a4c/sdmetrics/single_table/new_row_synthesis.py#L120-L123
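The failure mode can be illustrated with pandas alone. This is a minimal sketch, assuming (as the traceback through pandas' expression scope suggests) that the metric builds a DataFrame.query string and only quotes values for object columns; the data frame and variable names here are made up for illustration and are not taken from SDMetrics.

```python
import pandas as pd

# Illustrative data only, not the SDMetrics demo table.
synthetic = pd.DataFrame({'gender': ['F', 'M', 'F']})

# Quoted: 'F' is parsed as a string literal, so the filter works.
quoted_matches = synthetic.query("gender == 'F'")

# Unquoted: F is parsed as a variable name that pandas cannot resolve.
try:
    synthetic.query('gender == F')
    error_name = None
except Exception as err:  # pandas raises UndefinedVariableError here
    error_name = type(err).__name__

print(len(quoted_matches), error_name)
```

If the dtype check treats a category column as non-string, the value ends up unquoted and the second case is what happens.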
To fix this, we have to detect the column's data type accurately and use the proper representation of each value.
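One possible shape for such a fix, as a sketch rather than the actual patch: the helper build_filter_clause below is hypothetical and not part of SDMetrics. It assumes the filter is a DataFrame.query string, treats only numeric columns as bare literals, and quotes everything else (object, category, string). Missing values and categoricals with numeric categories are left out and would need separate handling.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def build_filter_clause(column, value, series):
    """Hypothetical helper: build one query clause for `column == value`."""
    if is_numeric_dtype(series):
        # Numeric literals can be embedded as-is.
        return f'{column} == {value}'
    # Any other dtype (object, category, string, ...) is quoted so that
    # pandas parses the value as a string literal, not a variable name.
    return f'{column} == {str(value)!r}'


# Illustrative data only.
real = pd.DataFrame({
    'gender': pd.Series(['F', 'M', 'F'], dtype='category'),
    'age': [30, 41, 30],
})
synthetic = pd.DataFrame({'gender': ['F', 'M'], 'age': [30, 25]})

row = real.iloc[0]
clauses = [build_filter_clause(col, row[col], real[col]) for col in real.columns]
matches = synthetic.query(' and '.join(clauses))
print(clauses, len(matches))
```

Checking for the numeric case and quoting by default, instead of checking for object and leaving everything else unquoted, means a new string-like dtype fails safe.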