openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

503 Error when downloading dataset #573

Closed. eddiebergman closed this issue 11 months ago.

eddiebergman commented 11 months ago

I'm getting 503 errors from the OpenML servers when using the pinned requirement of openml==0.13.1; it can't download the tasks from OpenML.

eddiebergman commented 11 months ago

Forcing an update to 0.14.1 doesn't work either due to renamed variables.

PGijsbers commented 11 months ago

OpenML is experiencing some server issues, and while they last openml==0.13.1 will not be able to download new datasets. I applied a patch to the benchmark (#579) that makes the benchmark compatible with newer versions of openml, which can download data despite the server issues. This makes openml fall back on loading arff files, which in some cases differ and may therefore lead to different results. Be deliberate about whether upgrading or waiting is the right choice for you!

If you do decide to upgrade, simply update your openml python installation: `python -m pip install --upgrade openml`. You should then be able to run the benchmark again on new datasets.
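If you want to sanity-check the upgrade outside the benchmark, something along these lines should work (a minimal sketch, not part of the patch; it uses task 53, the `vehicle` task, purely as an example):

```python
# Minimal check that the upgraded openml client can download task data again.
import openml

print(openml.__version__)  # should report >= 0.14.1 after the upgrade

task = openml.tasks.get_task(53)          # task 53 ("vehicle"), used as an example
dataset = task.get_dataset()
X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
print(X.shape, y.shape)                   # if this prints, the download path works
```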

suzhoum commented 11 months ago

Thanks for working on the fix! I'm currently testing the merged fix on both my dev machine and in a Docker container that I built that depends on AMLB. It works fine on my dev machine, but when running in the Docker container I encountered a series of git errors after the benchmark run started, which I hadn't seen before. Is there anything that might have led to the error?

Running benchmark `AutoGluon:stable` on `small` framework in `local` mode.
Loading frameworks definitions from ['/app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /app/ag_bench_runs/tabular/ag_bench_test/automlbenchmark/resources/benchmarks/small.yaml.
fatal: not a git repository (or any of the parent directories): .git

PGijsbers commented 11 months ago

Does the script stop after that? If not, then I think this is expected behaviour (though something we might want to change): when the Docker image is built, files are copied into the image, but the .git files are not copied along with them. This means that the copy of the benchmark inside a container is not a git repository. When the benchmark runs inside Docker it behaves as if it were doing a local run, so it tries to record information about the git environment; in this case it can't find any and reports that. In the end, the git information of the benchmark as present on the host machine should be recorded for the experiment. If this deviates from the behaviour you observe, then it is likely a bug and I would appreciate it if you open a new issue (it is unlikely to have been introduced by the changes for openml-python 0.14.1 compatibility).
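Conceptually, the git-info collection works roughly like the sketch below (hypothetical code, not the actual amlb implementation): each git query is attempted and falls back to 'NA' when it fails, which is why the fatal messages show up in the log while the run continues and `git_info` ends up filled with 'NA' values.

```python
# Hypothetical sketch: collect git metadata, falling back to 'NA' on failure.
# The real benchmark code may differ; the effect matches the logged git_info fields.
import subprocess

def git_info(cwd="."):
    def run(*args):
        try:
            return subprocess.check_output(
                ["git", *args], cwd=cwd, stderr=subprocess.DEVNULL, text=True
            ).strip()
        except (subprocess.CalledProcessError, FileNotFoundError):
            return "NA"  # not a git repo, no remote 'origin', or no commits yet
    return {
        "repo": run("config", "--get", "remote.origin.url"),
        "branch": run("rev-parse", "--abbrev-ref", "HEAD"),
        "commit": run("rev-parse", "HEAD"),
    }
```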

suzhoum commented 11 months ago

Thanks for the quick response. It doesn't stop after that, but the dataset download wasn't successful. Specifically, it ran fine in the container when I set up the framework only:

python3 automlbenchmark/runbenchmark.py AutoGluon:stable -s only

However, it errored out as shown below and then continued with the dataset download (I had run `git init` before this run):

python3 automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip
(.venv) root@572ab6cbad95:/app/ag_bench_runs/tabular/ag_bench_zs# python3 automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip
Running benchmark `AutoGluon:stable` on `small` framework in `local` mode.
Loading frameworks definitions from ['/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/resources/benchmarks/small.yaml.
fatal: No such remote 'origin'

fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

error: malformed object name 'HEAD'

[MONITORING] [local.small.test.vehicle.0.AutoGluon] CPU Utilization: 15.0%

--------------------------------------------------
Starting job local.small.test.vehicle.0.AutoGluon.
Assigning 4 cores (total=8) for new task vehicle.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Memory Usage: 9.5%
Assigning 26594 MB (total=31641 MB) for new vehicle task.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Disk Usage: 89.3%
Running task vehicle on framework AutoGluon with config:
TaskConfig({'framework': 'AutoGluon', 'framework_params': {}, 'framework_version': '0.8.2', 'type': 'classification', 'name': 'vehicle', 'openml_task_id': 53, 'test_server': False, 'fold': 0, 'metric': 'logloss', 'metrics': ['logloss', 'acc', 'balacc'], 'seed': 407703350, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 4, 'max_mem_size_mb': 26594, 'min_vol_size_mb': -1, 'input_dir': '/root/.cache/openml', 'output_dir': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716', 'output_predictions_file': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/predictions.csv', 'tag': None, 'command': 'automlbenchmark/runbenchmark.py AutoGluon:stable small test -t vehicle -f 0 -s skip', 'git_info': {'repo': 'NA', 'branch': 'NA', 'commit': 'NA', 'tags': [], 'status': ['## No commits yet on master', '?? __init__.py', '?? __pycache__/', '?? ag_bench_runs/', '?? autogluon-bench/', '?? aws/', '?? entrypoint.sh', '?? gpu_utilization.sh', '?? setup.sh']}, 'measure_inference_time': False, 'ext': {}, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'type_': 'multiclass', 'output_metadata_file': '/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/metadata.json'})
File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
Traceback (most recent call last):
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 489, in _cache_compressed_file_from_file
    data = pd.read_parquet(data_file)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 503, in read_parquet
    return impl.read(
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 251, in read
    result = self.api.parquet.read_table(
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2926, in read_table
    dataset = _ParquetDatasetV2(
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2466, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1004, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/benchmark.py", line 578, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/frameworks/AutoGluon/__init__.py", line 16, in run
    return run_autogluon_tabular(dataset, config)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/frameworks/AutoGluon/__init__.py", line 22, in run_autogluon_tabular
    train=dict(path=dataset.train.data_path('parquet')),
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 264, in data_path
    return self._get_data(format)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 278, in _get_data
    self.dataset._load_data(fmt)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 235, in _load_data
    train, test = splitter.split()
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/utils/process.py", line 744, in profiler
    return fn(*args, **kwargs)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 415, in split
    X = self.ds._load_full_data('dataframe')
  File "/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/amlb/datasets/openml.py", line 240, in _load_full_data
    X, *_ = self._oml_dataset.get_data(dataset_format=fmt)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 704, in get_data
    data, categorical, attribute_names = self._load_data()
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 529, in _load_data
    return self._cache_compressed_file_from_file(file_to_load)
  File "/app/ag_bench_runs/tabular/ag_bench_zs/.venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 491, in _cache_compressed_file_from_file
    raise Exception(f"File: {data_file}") from e
Exception: File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
Loading metadata from `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/predictions/vehicle/0/metadata.json`.
Metric scores: { 'acc': nan,
  'app_version': 'dev [NA, NA, NA]',
  'balacc': nan,
  'constraint': 'test',
  'duration': nan,
  'fold': 0,
  'framework': 'AutoGluon',
  'id': 'openml.org/t/53',
  'info': 'Exception: File: '
          '/root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq',
  'logloss': nan,
  'metric': 'neg_logloss',
  'mode': 'local',
  'models_count': nan,
  'params': '',
  'predict_duration': nan,
  'result': nan,
  'seed': 407703350,
  'task': 'vehicle',
  'training_duration': nan,
  'type': 'multiclass',
  'utc': '2023-07-25T18:37:16',
  'version': '0.8.2'}
Job `local.small.test.vehicle.0.AutoGluon` executed in 0.021 seconds.
Scores saved to `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/autogluon.small.test.local.20230725T183716/scores/results.csv`.
Scores saved to `/app/ag_bench_runs/tabular/ag_bench_zs/automlbenchmark/results/results.csv`.
All jobs executed in 0.045 seconds.
[MONITORING] [local.small.test.vehicle.0.AutoGluon] CPU Utilization: 14.3%
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Memory Usage: 9.5%
[MONITORING] [local.small.test.vehicle.0.AutoGluon] Disk Usage: 89.3%
Processing results for autogluon.small.test.local.20230725T183716
Summing up scores for current run:
             id    task  fold framework constraint      metric  duration      seed                                                                          info
openml.org/t/53 vehicle     0 AutoGluon       test neg_logloss      0.02 407703350 Exception: File: /root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq
suzhoum commented 11 months ago

I opened a new issue, https://github.com/openml/automlbenchmark/issues/580, since after investigation I believe it's not related to this one.

PGijsbers commented 11 months ago

Closing this issue: all files should be available again on the server. Feel free to re-open if you find a dataset that wasn't recovered correctly (or, even better, open an issue on https://github.com/openml/openml-data).
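If a dataset was cached while the server issues were ongoing, a corrupted copy (like the `dataset_54.pq` file in the traceback above) may still be sitting in the local cache. A minimal workaround sketch, assuming the cached file is indeed the problem, is to delete it so openml re-downloads the recovered copy on the next run:

```python
# Workaround sketch: remove a corrupted cached copy so openml re-downloads it.
# The path is the one reported in the traceback above; adjust it for other datasets.
from pathlib import Path

cached = Path("/root/.cache/openml/org/openml/www/datasets/54/dataset_54.pq")
if cached.exists():
    cached.unlink()
    print(f"removed {cached}; it will be re-downloaded on the next run")
```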