openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Bug: AttributeError: 'MultiIndex' object has no attribute '_data' #640

Open DRMPN opened 1 month ago

DRMPN commented 1 month ago

Hello.

Thank you for your work!

I'm using the reports/old/reports.ipynb notebook. It produced all tables and some plots as intended. However, running the strip plot cells in the Visualizations section results in a runtime error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[37], line 2
      1 if 'binary' in problem_types:
----> 2     fig = draw_score_stripplot('result', 
      3                                results=all_res.sort_values(by=['framework']),
      4                                type_filter='binary', 
      5                                metadata=metadata,
      6                                xlabel=binary_result_label,
      7                                y_sort_by=tasks_sort_by,
      8                                hue_sort_by=frameworks_sort_key,
      9                                title=f"Results ({binary_result_label}) on {results_group} binary classification problems{title_extra}",
     10                                legend_labels=frameworks_labels,
     11                               );
     12     savefig(fig, create_file(output_dir, "visualizations", "binary_result_stripplot.png"))

File c:\Users\nnikitin-user\Desktop\automlbenchmark\amlb_report\visualizations\stripplot.py:72, in draw_score_stripplot(col, results, type_filter, metadata, y_sort_by, hue_sort_by, filename, **kwargs)
     69 hue = 'framework'
     70 hues = sorted(df[hue].unique(), key=hue_sort_by)
---> 72 fig = draw_stripplot(
     73     df,
     74     x=col,
     75     y=df.index,
     76     hue=hue,
     77     # ylabel='Task',
     78     y_labels=task_labels(df.index.unique()),
     79     hue_order=hues,
     80     legend_title="Framework",
     81     **kwargs
     82 )
     83 if filename:
     84     savefig(fig, create_file("graphics", config.results_group, filename))

File c:\Users\nnikitin-user\Desktop\automlbenchmark\amlb_report\visualizations\stripplot.py:27, in draw_stripplot(df, x, y, hue, xscale, xbound, hue_order, xlabel, ylabel, y_labels, title, legend_title, legend_loc, legend_labels, colormap, size)
     24 sb.despine(bottom=True, left=True)
     26 # Show each observation with a scatterplot
---> 27 sb.stripplot(data=df,
     28              x=x, y=y, hue=hue,
     29              hue_order=hue_order,
     30              palette=colormap,
     31              dodge=True, jitter=True,
     32              alpha=.25, zorder=1)
     34 # Show the conditional means
     35 sb.pointplot(data=df,
     36              x=x, y=y, hue=hue,
     37              hue_order=hue_order,
     38              palette=colormap,
     39              dodge=.5, join=False,
     40              markers='d', scale=.75, ci=None)

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\categorical.py:2082, in stripplot(data, x, y, hue, order, hue_order, jitter, dodge, orient, color, palette, size, edgecolor, linewidth, hue_norm, log_scale, native_scale, formatter, legend, ax, **kwargs)
   2074 def stripplot(
   2075     data=None, *, x=None, y=None, hue=None, order=None, hue_order=None,
   2076     jitter=True, dodge=False, orient=None, color=None, palette=None,
   (...)
   2079     ax=None, **kwargs
   2080 ):
-> 2082     p = _CategoricalPlotter(
   2083         data=data,
   2084         variables=dict(x=x, y=y, hue=hue),
   2085         order=order,
   2086         orient=orient,
   2087         color=color,
   2088         legend=legend,
   2089     )
   2091     if ax is None:
   2092         ax = plt.gca()

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\categorical.py:67, in _CategoricalPlotter.__init__(self, data, variables, order, orient, require_numeric, color, legend)
     56 def __init__(
     57     self,
     58     data=None,
   (...)
     64     legend="auto",
     65 ):
---> 67     super().__init__(data=data, variables=variables)
     69     # This method takes care of some bookkeeping that is necessary because the
     70     # original categorical plots (prior to the 2021 refactor) had some rules that
     71     # don't fit exactly into VectorPlotter logic. It may be wise to have a second
   (...)
     76     # default VectorPlotter rules. If we do decide to make orient part of the
     77     # _base variable assignment, we'll want to figure out how to express that.
     78     if self.input_format == "wide" and orient in ["h", "y"]:

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\_base.py:634, in VectorPlotter.__init__(self, data, variables)
    629 # var_ordered is relevant only for categorical axis variables, and may
    630 # be better handled by an internal axis information object that tracks
    631 # such information and is set up by the scale_* methods. The analogous
    632 # information for numeric axes would be information about log scales.
    633 self._var_ordered = {"x": False, "y": False}  # alt., used DefaultDict
--> 634 self.assign_variables(data, variables)
    636 # TODO Lots of tests assume that these are called to initialize the
    637 # mappings to default values on class initialization. I'd prefer to
    638 # move away from that and only have a mapping when explicitly called.
    639 for var in ["hue", "size", "style"]:

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\_base.py:679, in VectorPlotter.assign_variables(self, data, variables)
    674 else:
    675     # When dealing with long-form input, use the newer PlotData
    676     # object (internal but introduced for the objects interface)
    677     # to centralize / standardize data consumption logic.
    678     self.input_format = "long"
--> 679     plot_data = PlotData(data, variables)
    680     frame = plot_data.frame
    681     names = plot_data.names

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\_core\data.py:58, in PlotData.__init__(self, data, variables)
     51 def __init__(
     52     self,
     53     data: DataSource,
     54     variables: dict[str, VariableSpec],
     55 ):
     57     data = handle_data_source(data)
---> 58     frame, names, ids = self._assign_variables(data, variables)
     60     self.frame = frame
     61     self.names = names

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\seaborn\_core\data.py:265, in PlotData._assign_variables(self, data, variables)
    260             ids[key] = id(val)
    262 # Construct a tidy plot DataFrame. This will convert a number of
    263 # types automatically, aligning on index in case of pandas objects
    264 # TODO Note: this fails when variable specs *only* have scalars!
--> 265 frame = pd.DataFrame(plot_data)
    267 return frame, names, ids

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:664, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    658     mgr = self._init_mgr(
    659         data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
    660     )
    662 elif isinstance(data, dict):
    663     # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664     mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
    665 elif isinstance(data, ma.MaskedArray):
    666     import numpy.ma.mrecords as mrecords

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py:482, in dict_to_mgr(data, index, columns, dtype, typ, copy)
    480     columns = Index(keys)
    481     arrays = [com.maybe_iterable_to_list(data[k]) for k in keys]
--> 482     arrays = [arr if not isinstance(arr, Index) else arr._data for arr in arrays]
    484 if copy:
    485     if typ == "block":
    486         # We only need to copy arrays that will not get consolidated, i.e.
    487         #  only EA arrays

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\internals\construction.py:482, in (.0)
    480     columns = Index(keys)
    481     arrays = [com.maybe_iterable_to_list(data[k]) for k in keys]
--> 482     arrays = [arr if not isinstance(arr, Index) else arr._data for arr in arrays]
    484 if copy:
    485     if typ == "block":
    486         # We only need to copy arrays that will not get consolidated, i.e.
    487         #  only EA arrays

AttributeError: 'MultiIndex' object has no attribute '_data'
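
Reading the traceback, `draw_score_stripplot` passes `y=df.index` through to `sb.stripplot`, and the last frame fails because that index is a `MultiIndex`. One possible workaround, sketched below under stated assumptions, is to collapse the index levels into a single string label column before plotting. The helper `flatten_index_labels`, the column name `task_label`, and the toy index levels are all hypothetical and not part of `amlb_report`; this has not been checked against the actual notebook data.

```python
import pandas as pd


def flatten_index_labels(df: pd.DataFrame, sep: str = "/") -> list:
    """Return one string label per row, joining the levels of a MultiIndex.

    Hypothetical helper for illustration; not part of amlb_report.
    """
    if isinstance(df.index, pd.MultiIndex):
        # Iterating a MultiIndex yields one tuple of level values per row.
        return [sep.join(str(level) for level in key) for key in df.index]
    return [str(key) for key in df.index]


# Toy data: the level names and values are made up for this sketch.
index = pd.MultiIndex.from_tuples(
    [("binary", "adult"), ("binary", "adult"), ("binary", "credit-g")],
    names=["type", "task"],
)
df = pd.DataFrame(
    {"framework": ["A", "B", "A"], "result": [0.91, 0.93, 0.78]},
    index=index,
)

# Replace the MultiIndex with a plain RangeIndex and keep the labels
# as an ordinary column that can be referenced by name.
plot_df = df.reset_index(drop=True).assign(task_label=flatten_index_labels(df))
```

With a frame shaped like `plot_df`, the y variable can be given by column name, for example `sb.stripplot(data=plot_df, x="result", y="task_label", hue="framework")`, so no `Index` object is handed to seaborn.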

PGijsbers commented 1 month ago

The notebooks are in reports/old because they are intended to work with old (version ~1) result files, and newer versions are not completely backwards compatible. When working with newer versions, please try the notebooks from https://github.com/pgijsbers/amlb-results for now; those are the ones used for the JMLR paper. Sorry about the confusion.

PGijsbers commented 1 month ago

@Innixma you also have some visualization code, where can people find that?

Innixma commented 1 month ago

@PGijsbers The visualization code is here: https://github.com/Innixma/autogluon-benchmark

Example: https://github.com/Innixma/autogluon-benchmark/blob/master/v1_results/run_eval_tabrepo_v1.py

Running the above code generates the tables and figures shown here (roughly): https://github.com/Innixma/automl-arena

I plan to clean this up and make it more easily available as part of TabRepo 2.0.

DRMPN commented 1 month ago

That looks good, I will try that instead. Thank you both ❤