yes | python runbenchmark.py autoxgboost:latest example test -m docker -s force
# and
yes | python runbenchmark.py tpot example test -m docker -s force
Both worked and both seem to use the same input data from:
...
Parse with reader=readr : /input/test_data/differentiate_cancer_train.csv
Error in parseHeader(path) :
Invalid column specification line found in ARFF header:
f_1,f_2,f_3,f_4,f_5,f_6,f_7,f_8,f_9,f_10,f_11,f_12,f_13,f_14,f_15,f_16,f_17,f_18,f_19,f_20,f_21,f_22,f_23,f_24,f_25,f_26,f_27,f_28,f_29,f_30,f_31,f_32,f_33,f_34,f_35,f_36,f_37,f_38,f_39,f_40,f_41,f_42,f_43,f_44,f_45,f_46,f_47,f_48,f_49,f_50,...
The article I found (linked further down) is about Weka, but it makes me wonder whether my data needs to be converted anyway. If so, how can I do that?
BTW, the ranger and mlr3automl frameworks failed in the same way.
My input data is a CSV table with a header like f_1,f_2,...,f_4096,target: 4097 columns and 50 rows of floats.
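For reference, a conversion from CSV to ARFF could look roughly like the sketch below. It assumes every feature column is numeric, the target is a class label, and column names are simple identifiers such as f_1 that need no quoting. The function name `csv_to_arff` and the relation name are made up for illustration. Whether autoxgboost, ranger, or mlr3automl accept the resulting file in the benchmark has not been verified here.

```python
import csv


def csv_to_arff(csv_path, arff_path, target="target", relation="data"):
    """Write a minimal ARFF file: numeric features plus a nominal target.

    Hypothetical helper; assumes plain column names and no missing values.
    """
    with open(csv_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines

    target_idx = header.index(target)
    # A nominal attribute lists its distinct values, e.g. {0,1}.
    classes = sorted({row[target_idx] for row in rows})
    nominal = "{" + ",".join(classes) + "}"

    with open(arff_path, "w", newline="") as dst:
        dst.write(f"@RELATION {relation}\n\n")
        for i, name in enumerate(header):
            kind = nominal if i == target_idx else "NUMERIC"
            dst.write(f"@ATTRIBUTE {name} {kind}\n")
        dst.write("\n@DATA\n")
        # The data section is plain comma-separated rows, same as the CSV body.
        csv.writer(dst, lineterminator="\n").writerows(rows)
```

It would be called as `csv_to_arff("differentiate_cancer_train.csv", "differentiate_cancer_train.arff")`, where the output file name is only an example, and the same would be needed for the test file.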
automl_config_docker:
---
#for doc purpose using <placeholder:default_value> syntax when it applies.
#FORMAT: global defaults are defined in config.yaml
- name: __dummy-task
  enabled: false # actual default is `true` of course...
  openml_task_id: 0
  metric: # the first metric in the task list will be optimized against and used for the main result, the other ones are optional and purely informative. Only the metrics annotated with (*) can be used as a performance metric.
    - # classification
    - acc # (*) accuracy
    - auc # (*) area under curve
    - logloss # (*) log loss
    - f1 # F1 score
    - # regression
    - mae # (*) mean absolute error
    - mse # (*) mean squared error
    - rmse # root mean squared error
    - rmsle # root mean squared log error
    - r2 # R^2 score
  folds: 1
  max_runtime_seconds: 1200
  cores: 1
  max_mem_size_mb: -1
  ec2_instance_type: m5.large
# local defaults (applying only to tasks defined in this file) can be defined in a task named "__defaults__"
- name: __defaults__
  folds: 1
  cores: 4
  max_runtime_seconds: 400
- name: teddata
  dataset:
    train: /input/test_data/differentiate_cancer_train.csv
    test: /input/test_data/differentiate_cancer_test.csv
    type: binary
    target: target
  folds: 1
and the only change in resources/config.yaml is to use python: 3.8.
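To rule out a mismatch between the CSV files and the task definition, a quick check along the following lines might help. It is only a sketch: `summarize_csv` is a hypothetical helper, not part of automlbenchmark, and it just confirms that the `target` column exists, that every row has as many fields as the header, and which class labels occur.

```python
import csv


def summarize_csv(csv_path, target="target"):
    """Return (n_rows, n_cols, sorted_labels) for a CSV file with a header row.

    Raises ValueError if the target column is missing or a row is ragged.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if target not in header:
            raise ValueError(f"target column {target!r} not found in header")
        idx = header.index(target)
        labels = set()
        n_rows = 0
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if len(row) != len(header):
                raise ValueError(
                    f"data row {n_rows + 1} has {len(row)} fields, "
                    f"expected {len(header)}"
                )
            labels.add(row[idx])
            n_rows += 1
    return n_rows, len(header), sorted(labels)
```

For the data described above, this should report 4097 columns, and with `type: binary` exactly two distinct labels would be expected in the target column.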
Derived from discussion in https://github.com/openml/automlbenchmark/pull/450
I started with the example test, using the two commands at the top of this post.
My own input data is CSV only, and I was able to run it successfully against 13 out of 19 frameworks.
With my usual command, it fails for autoxgboost with the ARFF header error shown in the log above.
Searching around, I found this article: https://machinelearningmastery.com/load-csv-machine-learning-data-weka/