sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License
210 stars, 45 forks

Got ValueError on NewRowSynthesis #305

Closed. rawinan-soma closed this issue 1 year ago

rawinan-soma commented 1 year ago

Environment details

Problem description


ValueError                                Traceback (most recent call last)
/var/folders/2l/lfnt8r350xz54773zh760tqh0000gn/T/ipykernel_2254/3186979206.py in <module>
      1 from sdmetrics.single_table import NewRowSynthesis
      2 
----> 3 NewRowSynthesis.compute(real_data= data_sample,
      4                         synthetic_data= new_data_sample,
      5                         metadata= metadata)

/opt/anaconda3/lib/python3.9/site-packages/sdmetrics/single_table/new_row_synthesis.py in compute(cls, real_data, synthetic_data, metadata, numerical_match_tolerance, synthetic_sample_size)
    156             The new row synthesis score.
    157         """
--> 158         return cls.compute_breakdown(
    159             real_data,
    160             synthetic_data,

/opt/anaconda3/lib/python3.9/site-packages/sdmetrics/single_table/new_row_synthesis.py in compute_breakdown(cls, real_data, synthetic_data, metadata, numerical_match_tolerance, synthetic_sample_size)
    110 
    111         try:
--> 112             matches = real_data.query(' and '.join(row_filter))
    113         except TypeError:
    114             if len(real_data) > 10000:

/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
   4469         kwargs["level"] = kwargs.pop("level", 0) + 2
   4470         kwargs["target"] = None
-> 4471         res = self.eval(expr, **kwargs)
   4472 
   4473         try:

/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   4607         kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
   4608 
-> 4609         return _eval(expr, inplace=inplace, **kwargs)
   4610 
   4611     def select_dtypes(self, include=None, exclude=None) -> DataFrame:

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    356         eng = ENGINES[engine]
    357         eng_inst = eng(parsed_expr)
--> 358         ret = eng_inst.evaluate()
    359 
    360         if parsed_expr.assigner is None:

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in evaluate(self)
     79 
     80         # make sure no names in resolvers and locals/globals clash
---> 81         res = self._evaluate()
     82         return reconstruct_object(
     83             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in _evaluate(self)
    120         scope = env.full_scope
    121         _check_ne_builtin_clash(self.expr)
--> 122         return ne.evaluate(s, local_dict=scope)
    123 
    124 

/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    833     _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
    834     with evaluate_lock:
--> 835         return compiled_ex(*arguments, **kwargs)
    836 
    837 

ValueError: too many inputs

from sdmetrics.single_table import NewRowSynthesis

NewRowSynthesis.compute(
    real_data=data_sample,
    synthetic_data=new_data_sample,
    metadata=metadata,
)

npatki commented 1 year ago

Hi @bearberror, thanks for filing this question and including the detailed stack trace. I am able to replicate the problem. Let's use bug #307 to track a fix for this. I have provided a workaround that you can use in the meantime.

I'll mark this issue as a duplicate of #307, so feel free to reply there if you have any more feedback about this.
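
For readers who land here before checking #307: the traceback shows `compute_breakdown` joining a list of filters with `' and '` and passing the result to `DataFrame.query`, and the error is raised inside numexpr. One plausible reading, not confirmed in this thread, is that a table with many columns produces more operands than numexpr accepts. The sketch below is not the SDMetrics implementation and not necessarily the workaround mentioned above. It is a plain-Python illustration of the idea behind the metric (the share of synthetic rows that match no real row) that compares rows directly instead of building a query string. The function names and the relative-tolerance rule are assumptions made for illustration; only the parameter name `numerical_match_tolerance` comes from the traceback.

```python
def values_match(real_value, synthetic_value, numerical_match_tolerance):
    """Compare two cell values (hypothetical helper, not part of SDMetrics)."""
    numeric = (int, float)
    both_numeric = (
        isinstance(real_value, numeric)
        and isinstance(synthetic_value, numeric)
        and not isinstance(real_value, bool)
        and not isinstance(synthetic_value, bool)
    )
    if both_numeric:
        # Assumption: tolerance is relative to the synthetic value.
        allowed = abs(synthetic_value) * numerical_match_tolerance
        return abs(real_value - synthetic_value) <= allowed
    # Non-numeric values must be exactly equal.
    return real_value == synthetic_value


def new_row_fraction(real_rows, synthetic_rows, numerical_match_tolerance=0.01):
    """Return the fraction of synthetic rows that match no real row.

    Rows are dicts keyed by column name. No query expression is built,
    so the number of columns is not limited by an expression engine.
    """
    if not synthetic_rows:
        raise ValueError("synthetic_rows must not be empty")

    new_rows = 0
    for synthetic in synthetic_rows:
        matched = any(
            all(
                values_match(real[column], synthetic[column], numerical_match_tolerance)
                for column in synthetic
            )
            for real in real_rows
        )
        if not matched:
            new_rows += 1
    return new_rows / len(synthetic_rows)


# Toy data with 40 columns, wide enough to make a joined filter very long.
columns = [f"col_{i}" for i in range(40)]
real_rows = [{column: float(i) for column in columns} for i in range(1, 4)]
synthetic_rows = [
    {column: 1.0 for column in columns},    # exact copy of a real row
    {column: 2.001 for column in columns},  # within 1% of a real row
    {column: 10.0 for column in columns},   # matches no real row
    {column: 20.0 for column in columns},   # matches no real row
]

score = new_row_fraction(real_rows, synthetic_rows)
print(score)  # 0.5
```

This loop is quadratic in the number of rows, so it is only practical on small samples; it is meant to show what is being computed, not to replace the library call.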