Closed pplonski closed 4 years ago
config from the examples/aws/config.yaml, setting the project repository address in the config to point to my repo.
I'm really impressed with the aws mode. I like the feature that copies results from aws to my local folder.
Good to hear you managed to overcome the issues :) looking forward to the PR.
Everything worked well when I was testing with simple baseline algorithms. When I added LightGBM and Xgboost, I started to get strange errors:
ERROR:frameworks.shared.callee:DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: f0, f1, f2, f3
Traceback (most recent call last):
File "/home/piotr/sandbox/automlbenchmark/frameworks/shared/callee.py", line 121, in call_run
result = run_fn(ds, config)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/exec.py", line 66, in run
print(automl.predict(X_train))
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/supervised/automl.py", line 722, in predict
predictions = self._best_model.predict(X)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/supervised/model_framework.py", line 208, in predict
y_p = learner.predict(X_data)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/supervised/algorithms/lightgbm.py", line 71, in predict
return self.model.predict(X)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/lightgbm/basic.py", line 2415, in predict
data_has_header, is_reshape)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/lightgbm/basic.py", line 504, in predict
data = _data_from_pandas(data, None, None, self.pandas_categorical)[0]
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/venv/lib/python3.7/site-packages/lightgbm/basic.py", line 344, in _data_from_pandas
+ ', '.join(data.columns[bad_indices]))
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: f0, f1, f2, f3
62bb0f1c-8545-11ea-a8e1-2c56dc4a8211
{"error_message":"DataFrame.dtypes for data must be int, float or bool.\nDid not expect the data types in the following fields: f0, f1, f2, f3","models_count":0}
DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: f0, f1, f2, f3
Traceback (most recent call last):
File "/home/piotr/sandbox/automlbenchmark/amlb/benchmark.py", line 409, in run
meta_result = framework.run(self._dataset, task_config)
File "/home/piotr/sandbox/automlbenchmark/frameworks/supervised/__init__.py", line 29, in run
input_data=data, dataset=dataset, config=config)
File "/home/piotr/sandbox/automlbenchmark/frameworks/shared/caller.py", line 71, in run_in_venv
raise NoResultError(res.error_message)
amlb.results.NoResultError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in the following fields: f0, f1, f2, f3
Metric scores: { 'acc': nan,
'duration': nan,
'fold': 0,
'framework': 'supervised',
'id': 'openml.org/t/59',
'info': 'NoResultError: DataFrame.dtypes for data must be int, float or '
'bool.\n'
'Did not expect the data types in the following fields: f0, f1, f2, '
'f3',
'logloss': nan,
'mode': 'local',
'models': nan,
'params': '',
'result': nan,
'seed': 2911488517,
'tag': None,
'task': 'iris',
'utc': '2020-04-23T09:33:57',
'version': '0.2.8'}
I get errors when running predict. The fit method works, but predict raises this error even when called on the X_train data. Have you ever seen such errors?
Edit: looks like something on my side ...
Hi @pplonski, I've never seen this error before.
DataFrame.dtypes for data must be int, float or bool
It looks like with lightgbm, your predict requires categorical columns to be encoded. Which is weird if it was not required for training.
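One way to narrow this down before calling predict is to list the columns whose dtype is not numeric. The sketch below is only a diagnostic under stated assumptions: `non_numeric_columns` is a hypothetical helper (not part of lightgbm, mljar-supervised or automlbenchmark), and it approximates the int/float/bool check from the error message with pandas' `is_numeric_dtype` rather than reproducing lightgbm's own validation.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns whose dtype is not numeric.

    is_numeric_dtype is True for int, float and bool columns (and also for
    complex, which this sketch does not treat specially).
    """
    return [name for name, dtype in df.dtypes.items() if not is_numeric_dtype(dtype)]


# Numeric values stored with dtype=object, similar to the f0, f1, ... fields in the log.
X = pd.DataFrame({"f0": [5.1, 4.9], "f1": [3.5, 3.0]}, dtype=object)

print(non_numeric_columns(X))                # the object columns are reported
print(non_numeric_columns(X.astype(float)))  # nothing is reported after casting
```

If the helper reports every feature column even though the values look numeric, the problem is likely the column dtype and not the values themselves.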
I've found the reason :)
The input data has dtype object (for all columns) if at least one column is not numeric. Even in the case of the Iris dataset, the input data was object because the target column is not numeric. (I assume that X and y data are read from one file, and that's why the type.)
input data was object because target column is not numeric
Yes, I noticed this recently too and I'm thinking about "fixing" this for arff files, as we know the column types (at least numeric vs categoricals). It's possible that other libraries rely on the ndarray dtypes, although I haven't seen any issue until now.
I think there might be no issue because most of the frameworks work on encoded data, which is all numeric.
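The dtype behaviour discussed above can be reproduced with numpy and pandas alone. This is a minimal sketch, assuming the features and the string target are held in one object-typed array; the two sample rows and the names `data`, `X` and `X_fixed` are made up for illustration and are not taken from the benchmark code.

```python
import numpy as np
import pandas as pd

# Assumption: features and a non-numeric target are loaded together,
# so the whole array ends up with dtype=object.
rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
]
data = np.array(rows, dtype=object)

# Slicing off the target does not change the dtype: the features stay object.
X = pd.DataFrame(data[:, :4], columns=["f0", "f1", "f2", "f3"])
print(X.dtypes)

# Casting the feature columns explicitly gives dtypes that are plainly numeric.
X_fixed = X.astype(float)
print(X_fixed.dtypes)
```

Casting with `astype(float)` is only appropriate when all feature columns are known to be numeric; categorical columns would need encoding first.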
I created PR https://github.com/openml/automlbenchmark/pull/105 with mljar-supervised.
The mljar-supervised version is 0.2.8. It is still in development. I hope to open a PR with a newer version soon.
The framework should work in all modes: local, docker and aws.
I would like to add mljar-supervised benchmark results to the reports or the website. Are there steps describing how to do that?
I'm working on an AutoML Python package: mljar-supervised. I would like to add it to automlbenchmark. I successfully ran through points 1-9 from HOWTO::Add default framework.
I'm stuck at 10 (docker integration) and 11 (aws).
For docker, I suspect I had a problem with privileges:
For aws, I got a lot of logs and then errors:
My code is here: https://github.com/pplonski/automlbenchmark/tree/master/frameworks/supervised
I would appreciate some tips: what could be the problem with the aws setup? (I think I can handle the docker problem myself.)