guillermo-navas-palencia / optbinning

Optimal binning: monotonic binning with constraints. Support batch & stream optimal binning. Scorecard modelling and counterfactual explanations.
http://gnpalencia.org/optbinning/
Apache License 2.0

Trouble with serializing binning table to JSON #317

Open tomcortenRQ opened 6 months ago

tomcortenRQ commented 6 months ago

Hello,

I am trying to use the to_json functionality of optbinning.

While trying out the function, I came across the following error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <command-4158204396917011>, line 26
      4 binning_table = binning_process.get_binned_variable(col).binning_table
      5 # if not isinstance(binning_table.categories, list):
      6 #     binning_table.categories = binning_table.categories.tolist()
      7 #     if isinstance(binning_table.categories[0], np.ndarray):
   (...)
     24 # if isinstance(binning_table.max_x, np.int64):
     25 #     binning_table.max_x = float(binning_table.max_x)
---> 26 series = binning_process.get_binned_variable(col).to_json(f'/Workspace/Repos/tom.corten@bridgefund.nl/credit-risk-model/credit_risk_model/src/binning_results/{col}.json')

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-4c8d073b-82ec-44d5-98d1-ef0b0cf814b8/lib/python3.10/site-packages/optbinning/binning/binning.py:1214, in OptimalBinning.to_json(self, path)
   1211 opt_bin_dict['user_splits'] = table.user_splits
   1213 with open(path, "w") as write_file:
-> 1214     json.dump(opt_bin_dict, write_file)

File /usr/lib/python3.10/json/__init__.py:179, in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    173     iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    174         check_circular=check_circular, allow_nan=allow_nan, indent=indent,
    175         separators=separators,
    176         default=default, sort_keys=sort_keys, **kw).iterencode(obj)
    177 # could accelerate with writelines in some versions of Python, at
    178 # a debuggability cost
--> 179 for chunk in iterable:
    180     fp.write(chunk)

File /usr/lib/python3.10/json/encoder.py:431, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    429     yield from _iterencode_list(o, _current_indent_level)
    430 elif isinstance(o, dict):
--> 431     yield from _iterencode_dict(o, _current_indent_level)
    432 else:
    433     if markers is not None:

File /usr/lib/python3.10/json/encoder.py:405, in _make_iterencode.<locals>._iterencode_dict(dct, _current_indent_level)
    403         else:
    404             chunks = _iterencode(value, _current_indent_level)
--> 405         yield from chunks
    406 if newline_indent is not None:
    407     _current_indent_level -= 1

File /usr/lib/python3.10/json/encoder.py:438, in _make_iterencode.<locals>._iterencode(o, _current_indent_level)
    436         raise ValueError("Circular reference detected")
    437     markers[markerid] = o
--> 438 o = _default(o)
    439 yield from _iterencode(o, _current_indent_level)
    440 if markers is not None:

File /usr/lib/python3.10/json/encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type ndarray is not JSON serializable

Some of the binning_table attributes are NumPy types (e.g. ndarray, int64), which the standard json module does not support for serialization.

For now, I have worked around this by casting the binning_table attributes to built-in types (e.g. ndarray to list, int64 to int, etc.). Am I overlooking something? I could open a PR for this. What do you think?
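The casting workaround described above could also be expressed as a custom JSON encoder, so the attributes don't need to be mutated in place. This is only a sketch, not part of optbinning's API; NumpyJSONEncoder is a hypothetical name:

```python
import json
import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """Fall-back encoder that casts common NumPy types to built-ins."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()       # ndarray -> list
        if isinstance(o, np.integer):
            return int(o)           # int64, int32, ... -> int
        if isinstance(o, np.floating):
            return float(o)         # float64, ... -> float
        return super().default(o)   # anything else: raise TypeError


# Example: a dict mixing NumPy and built-in types serializes cleanly.
payload = {"splits": np.array([1.5, 2.5]), "n_bins": np.int64(2)}
print(json.dumps(payload, cls=NumpyJSONEncoder))
```

If to_json accepted an encoder class (or used one like this internally before calling json.dump), the ndarray attributes produced for categorical variables would serialize without any manual casting.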

tomcortenRQ commented 6 months ago

This only seems to happen when serializing categorical variables.