This issue proposes that we (1) load datasets as pandas DataFrames by default and (2) rewrite the dataset loader to work with pandas internally, converting to a numpy array only if the user requests one.
The reasons for this proposal are:
pandas is much more stable than it was a few years ago when we started this project, and it can now also handle strings properly (see #1107).
pandas can properly encode categorical columns, which can make it easier for projects building on OpenML-Python to handle these categories.
We will use parquet in the background to store files anyway, and that already requires interfacing with pandas.
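
To illustrate the categorical point and the "convert to numpy on request" idea, here is a minimal sketch. It is not the OpenML-Python loader: the toy data and the `to_numpy` helper are hypothetical and only use standard pandas and numpy calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a loaded dataset.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3],
        # The categorical dtype keeps the full set of allowed categories,
        # even those that do not occur in the data ("versicolor" here).
        "species": pd.Categorical(
            ["setosa", "setosa", "virginica"],
            categories=["setosa", "versicolor", "virginica"],
        ),
    }
)


def to_numpy(frame: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: replace categorical columns with their integer
    codes and return the whole frame as a float64 array."""
    encoded = frame.copy()
    for col in encoded.select_dtypes(include="category").columns:
        encoded[col] = encoded[col].cat.codes
    return encoded.to_numpy(dtype=np.float64)


X = to_numpy(df)
```

With pandas as the default, the category names stay available through `df["species"].cat.categories`, and the numeric array is derived only when a user asks for it.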