Forcing an update to 0.14.1 doesn't work either due to renamed variables.
OpenML is experiencing some server issues, and while they last openml==0.13.1 will not be able to download new datasets. I applied a patch to the benchmark (#579) that makes the benchmark compatible with newer versions of openml, which can download data despite the server issues. This makes openml fall back on loading ARFF files, which in some cases differ and thus may lead to different results. Be deliberate about whether upgrading or waiting is the right choice for you! If you do decide to upgrade, simply update your openml Python installation:

python -m pip install --upgrade openml

You should then be able to run the benchmark again on new datasets.
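To double-check that the upgrade took effect in the environment you run the benchmark from, you can print the installed version, e.g.:

import openml

print(openml.__version__)  # expect 0.14.1 or newer after the upgrade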
Thanks for working on the fix! I'm currently testing the merged fix, both on my dev machine and in a docker container I built that depends on AMLB. It works fine on my dev machine, but when running in the docker container I encountered a series of git errors after the benchmark run started, which I hadn't seen before. Is there anything that might have led to the error?
Running benchmark `AutoGluon:stable` on `small` framework in `local` mode.
Loading frameworks definitions from ['/app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/benchmarks/small.yaml.
fatal: not a git repository (or any of the parent directories): .git
Does the script stop after that? If not, then I think this is expected behaviour (though something we might want to change): when building the docker image, files are copied into the image, but the .git directory is not copied along. This means that the copy of the benchmark inside the container is not a git repository. When the benchmark runs inside docker, it behaves as if it is doing a local run, so it tries to record information about the git environment - in this case it can't find any and reports that. In the end, the git information of the code present on the host machine should be recorded for the experiment.
If this deviates from the behaviour you observe, then it is likely a bug and I would appreciate it if you open a new issue (as it is unlikely to be introduced by the changes for openml-python 0.14.1 compatibility).
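For context, the kind of git-info lookup described above might look roughly like this (an illustrative sketch, not AMLB's actual code); inside a container without a .git directory every query fails, producing the 'NA' values visible in the TaskConfig output later in this thread:

import subprocess

def _git(*args: str) -> str:
    # Run a git query; fall back to "NA" when there is no repository
    # (or git itself is missing), mirroring the behaviour described above.
    try:
        out = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
        return out.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "NA"

git_info = {
    "repo": _git("remote", "get-url", "origin"),
    "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
    "commit": _git("rev-parse", "HEAD"),
}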
Thanks for the quick response. It doesn't stop after that, but the dataset download wasn't successful. Specifically, it ran fine in the container when I set up the framework only:
python3 automlbenchmark/runbenchmark.py AutoGluon:stable -s only
However, it errored out as shown below and continued with the dataset download (I had run git init before this run):
python3 automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip
(.venv) root@572ab6cbad95:/app/ag_bench_runs/tabular/ag_bench_zs# python3 automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip
Running benchmark `AutoGluon:stable` on `small` framework in `local` mode.
Loading frameworks definitions from ['/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/benchmarks/small.yaml.
fatal: No such remote 'origin'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
error: malformed object name 'HEAD'
[MONITORING] [local.small.test.vehicle.0.AutoGluon] CPU Utilization: 15.0%
--------------------------------------------------
Starting job local.small.test.vehicle.0.AutoGluon.
Assigning 4 cores (total=8) for new task vehicle.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Memory Usage: 9.5%
Assigning 26594 MB (total=31641 MB) for new vehicle task.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Disk Usage: 89.3%
Running task vehicle on framework AutoGluon with config:
TaskConfig({'framework': 'AutoGluon', 'framework_params': {}, 'framework_version': '0.8.2', 'type': 'classification', 'name': 'vehicle', 'openml_task_id': 53, 'test_server': False, 'fold': 0, 'metric': 'logloss', 'metrics': ['logloss', 'acc', 'balacc'], 'seed': 407703350, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 4, 'max_mem_size_mb': 26594, 'min_vol_size_mb': -1, 'input_dir': '/root/.cache/openml', 'output_dir': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716', 'output_predictions_file': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/predictions.csv', 'tag': None, 'command': 'automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip', 'git_info': {'repo': 'NA', 'branch': 'NA', 'commit': 'NA', 'tags': [], 'status': ['## No commits yet on master', '?? __init__.py', '?? __pycache__/', '?? ag_bench_runs/', '?? autogluon-bench/', '?? aws/', '?? entrypoint.sh', '?? gpu_utilization.sh', '?? setup.sh']}, 'measure_inference_time': False, 'ext': {}, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'type_': 'multiclass', 'output_metadata_file': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/metadata.json'})
File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
Traceback (most recent call last):
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 489, in _cache_compressed_file_from_file
data = pd.read_parquet(data_file)
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 503, in read_parquet
return impl.read(
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 251, in read
result = self.api.parquet.read_table(
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2926, in read_table
dataset = _ParquetDatasetV2(
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2466, in __init__
[fragment], schema=schema or fragment.physical_schema,
File "pyarrow/_dataset.pyx", line 1004, in pyarrow._dataset.Fragment.physical_schema.__get__
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/benchmark.py", line 578, in run
meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/frameworks/AutoGluon/__init__.py", line 16, in run
return run_autogluon_tabular(dataset, config)
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/frameworks/AutoGluon/__init__.py", line 22, in run_autogluon_tabular
train=dict(path=dataset.train.data_path('parquet')),
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 264, in data_path
return self._get_data(format)
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 278, in _get_data
self.dataset._load_data(fmt)
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 235, in _load_data
train, test = splitter.split()
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/utils/process.py", line 744, in profiler
return fn(*args, **kwargs)
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 415, in split
X = self.ds._load_full_data('dataframe')
File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 240, in _load_full_data
X, *_ = self._oml_dataset.get_data(dataset_format=fmt)
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 704, in get_data
data, categorical, attribute_names = self._load_data()
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 529, in _load_data
return self._cache_compressed_file_from_file(file_to_load)
File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 491, in _cache_compressed_file_from_file
raise Exception(f"File: {data_file}") from e
Exception: File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
Loading metadata from `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/metadata.json`.
Metric scores: { 'acc': nan,
'app_version': 'dev [NA, NA, NA]',
'balacc': nan,
'constraint': 'test',
'duration': nan,
'fold': 0,
'framework': 'AutoGluon',
'id': 'openml.org/t/53',
'info': 'Exception: File: '
'/root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq',
'logloss': nan,
'metric': 'neg_logloss',
'mode': 'local',
'models_count': nan,
'params': '',
'predict_duration': nan,
'result': nan,
'seed': 407703350,
'task': 'vehicle',
'training_duration': nan,
'type': 'multiclass',
'utc': '2023-07-25T18:37:16',
'version': '0.8.2'}
Job `local.small.test.vehicle.0.AutoGluon` executed in 0.021 seconds.
Scores saved to `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/scores/results.csv`.
Scores saved to `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/results.csv`.
All jobs executed in 0.045 seconds.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] CPU Utilization: 14.3%
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Memory Usage: 9.5%
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Disk Usage: 89.3%
Processing results for autogluon.small.test.local.20230725T183716
Summing up scores for current run:
id task fold framework constraint metric duration seed info
openml.org/t/53 vehicle 0 AutoGluon test neg_logloss 0.02 407703350 Exception: File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
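Side note for anyone debugging the Parquet error above: a valid Parquet file starts and ends with the 4-byte magic PAR1, so the cached file from the traceback can be inspected directly (path taken from the log; deleting a corrupted cache entry should make openml re-download it):

from pathlib import Path

# Path copied from the traceback above.
cache_file = Path("/root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq")
data = cache_file.read_bytes()
if data[:4] == b"PAR1" and data[-4:] == b"PAR1":
    print("looks like a valid parquet file")
else:
    print("not a parquet file - likely a saved error page")
    # cache_file.unlink()  # uncomment to remove the corrupted cache entry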
After investigating, I opened a new issue (https://github.com/openml/automlbenchmark/issues/580) since I believe it's not related to this one.
Closing this issue; all files should be available again on the server. Feel free to re-open if you find a dataset that wasn't recovered correctly (or, even better, open an issue on https://github.com/openml/openml-data).
I'm getting 503 errors from the OpenML servers when using the pinned requirement of 0.13.1, where it can't download the tasks from OpenML.