openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

MLNet fails on iris #353

Closed PGijsbers closed 3 years ago

PGijsbers commented 3 years ago

I ran `python runbenchmark.py MLNet test -t iris -f 0 -m docker -s force`, which failed with the following output:

train dataset: /input/org/openml/www/datasets/61/dataset_train_0.csv
test dataset: /input/org/openml/www/datasets/61/dataset_test_0.csv
Running cmd `/bench/frameworks/MLNet/lib/mlnet classification --dataset /input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmplq81msbb --name 0 --verbosity q --log-file-path /tmp/tmplq81msbb/0/log.txt`
Running cmd `/bench/frameworks/MLNet/lib/mlnet predict --task-type classification --model /tmp/tmplq81msbb/0/0.zip --dataset /input/org/openml/www/datasets/61/dataset_test_0.csv > /tmp/tmplq81msbb/0/prediction.txt`
Error tokenizing data. C error: Expected 1 fields in line 3, saw 3
Traceback (most recent call last):
  File "/bench/amlb/benchmark.py", line 511, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/bench/frameworks/MLNet/__init__.py", line 10, in run
    return run(dataset, config)
  File "/bench/frameworks/MLNet/exec.py", line 82, in run
    prediction_df = pd.read_csv(output_prediction_path, dtype={'PredictedLabel': 'object'})
  File "/bench/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/bench/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 468, in _read
    return parser.read(nrows)
  File "/bench/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/bench/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 2061, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 827, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1951, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 3

Upon closer inspection, it looks like the output file produced by MLNet in this case is actually an error message:

"System.ArgumentOutOfRangeException: Column 'label' not found (Parameter 'name')\n",
 '   at Microsoft.ML.DataViewSchema.get_Item(String name)\n',
 '   at Microsoft.ML.ModelBuilder.AutoMLService.ModelInferenceEngine.GetKeyValuesFromColumn(ITransformer model, DataViewSchema schema, String labelColumnName) in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/ModelInferenceEngine.cs:line 202\n',
 '   at Microsoft.ML.CLI.Program.<>c.<<Main>b__3_5>d.MoveNext() in /_/src/mlnet/Program.cs:line 180\n',
 '--- End of stack trace from previous location where exception was thrown ---\n', 
 '   at System.CommandLine.Invocation.CommandHandler.GetExitCodeAsync(Object value, InvocationContext context)\n', 
 '   at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)\n', 
 '   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()\n', 
 '--- End of stack trace from previous location where exception was thrown ---\n', 
 '   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<<UseParseErrorReporting>b__0>d.MoveNext()\n', 
 '--- End of stack trace from previous location where exception was thrown ---\n', 
 '   at Microsoft.ML.CLI.Program.<>c__DisplayClass3_0.<<Main>b__8>d.MoveNext() in /_/src/mlnet/Program.cs:line 250\n', 
 '--- End of stack trace from previous location where exception was thrown ---\n', 
 '   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<<UseExceptionHandler>b__0>d.MoveNext()\n', 
'Check out log file for more information: /root/.mlnet/log.txt\n', 
'Exiting ...\n'

which understandably cannot be read as predictions.
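
A defensive check along these lines would make the failure surface as the tool's own error message instead of a pandas `ParserError`. This is only a minimal sketch, not the benchmark's actual code: `read_predictions` is a hypothetical helper, and the assumption that a valid prediction file starts with a header line containing `PredictedLabel` is inferred solely from the `dtype` argument visible in the traceback.

```python
import io

import pandas as pd


def read_predictions(path, required_column="PredictedLabel"):
    """Parse a prediction file, or raise with its contents if it is not one.

    Hypothetical helper: assumes a valid file starts with a header line
    that contains `required_column`.
    """
    with open(path, "r") as f:
        content = f.read()
    lines = content.splitlines()
    header = lines[0] if lines else ""
    if required_column not in header:
        # Show the start of the tool's output rather than a tokenizer error.
        raise ValueError(
            f"{path} does not look like a prediction file; it starts with:\n"
            + "\n".join(lines[:5])
        )
    return pd.read_csv(io.StringIO(content), dtype={required_column: "object"})
```

With a check like this, the run above would have reported the `System.ArgumentOutOfRangeException` text directly, which points at the `predict` step rather than at CSV parsing.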

I haven't had much time to look into this issue yet, but since I have a 3-day weekend I figured I'd open it anyway for later reference. Perhaps @LittleLittleCloud could have a look.

Note: I'm trying to reproduce it on AWS, but I am running into some installation issues there (work in progress).

PGijsbers commented 3 years ago

I don't seem to be able to debug this issue. It is internal to the tool, and when searching for similar issues I only found solutions that did not work in this scenario. @LittleLittleCloud I'd appreciate it if you could (get someone to) look into this.

PGijsbers commented 3 years ago

I have (so far) also been unable to fix the AWS issue. On the off chance that you are familiar with it, I'll post the trace:

[  361.111154] cloud-init[1494]: Running cmd `/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt`
[  361.705643] cloud-init[1494]: Unhandled exception. System.ArgumentNullException: Value cannot be null. (Parameter 'path1')
[  361.707696] cloud-init[1494]:    at System.IO.Path.Combine(String path1, String path2)
[  361.709225] cloud-init[1494]:    at Microsoft.ML.CLI.Commands.MLCommand..ctor() in /_/src/mlnet/Commands/MLCommand.cs:line 39
[  361.709358] cloud-init[1494]:    at Microsoft.ML.CLI.Commands.AutoMLCommand..ctor() in /_/src/mlnet/Commands/AutoMLCommand.cs:line 35
[  361.709445] cloud-init[1494]:    at Microsoft.ML.CLI.Commands.ClassificationCommand..ctor() in /_/src/mlnet/Commands/ClassificationCommand.cs:line 25
[  361.709520] cloud-init[1494]:    at Microsoft.ML.CLI.Program.Main(String[] args) in /_/src/mlnet/Program.cs:line 217
[  361.709600] cloud-init[1494]: Aborted (core dumped)
[  361.709767] cloud-init[1494]: Command '/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt' returned non-zero exit status 134.
[  361.709864] cloud-init[1494]: Traceback (most recent call last):
[  361.709938] cloud-init[1494]:   File "/repo/amlb/benchmark.py", line 511, in run
[  361.710003] cloud-init[1494]:     meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
[  361.710072] cloud-init[1494]:   File "/repo/frameworks/MLNet/__init__.py", line 10, in run
[  361.710140] cloud-init[1494]:     return run(dataset, config)
[  361.710214] cloud-init[1494]:   File "/repo/frameworks/MLNet/exec.py", line 64, in run
[  361.710285] cloud-init[1494]:     run_cmd(cmd)
[  361.710362] cloud-init[1494]:   File "/repo/amlb/utils/process.py", line 245, in run_cmd
[  361.710433] cloud-init[1494]:     raise e
[  361.710504] cloud-init[1494]:   File "/repo/amlb/utils/process.py", line 232, in run_cmd
[  361.710576] cloud-init[1494]:     preexec_fn=params.preexec_fn)
[  361.710655] cloud-init[1494]:   File "/repo/amlb/utils/process.py", line 77, in run_subprocess
[  361.710727] cloud-init[1494]:     raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
[  361.710796] cloud-init[1494]: subprocess.CalledProcessError: Command '/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt' returned non-zero exit status 134.
[  361.710907] cloud-init[1494]: Loading metadata from `/s3bucket/output/predictions/iris/0/metadata.json`.
[  361.814558] cloud-init[1494]: Metric scores: { 'acc': nan,
[  361.815679] cloud-init[1494]:   'app_version': 'dev [fix_mlnet, 89e4408]',
[  361.817104] cloud-init[1494]:   'balacc': nan,
[  361.818068] cloud-init[1494]:   'constraint': 'test',
[  361.818190] cloud-init[1494]:   'duration': nan,
[  361.818266] cloud-init[1494]:   'fold': 0,
[  361.818360] cloud-init[1494]:   'framework': 'MLNet',
[  361.818443] cloud-init[1494]:   'id': 'openml.org/t/59',
[  361.818510] cloud-init[1494]:   'info': "CalledProcessError: Command '/repo/frameworks/MLNet/lib/mlnet "
[  361.818582] cloud-init[1494]:           'classification --dataset '
[  361.818654] cloud-init[1494]:           '/s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv '
[  361.818738] cloud-init[1494]:           '--test-dataset /s3bucket/input/org/openml/www/dat…',
[  361.818813] cloud-init[1494]:   'logloss': nan,
[  361.818885] cloud-init[1494]:   'metric': 'neg_logloss',
[  361.818955] cloud-init[1494]:   'mode': 'aws',
[  361.819026] cloud-init[1494]:   'models_count': nan,
[  361.819093] cloud-init[1494]:   'params': '',
[  361.819157] cloud-init[1494]:   'predict_duration': nan,
[  361.819217] cloud-init[1494]:   'result': nan,
[  361.819340] cloud-init[1494]:   'seed': 396520499,
[  361.819411] cloud-init[1494]:   'task': 'iris',
[  361.819476] cloud-init[1494]:   'training_duration': nan,
[  361.819539] cloud-init[1494]:   'type': 'multiclass',
[  361.819600] cloud-init[1494]:   'utc': '2021-07-16T11:25:25',
[  361.819664] cloud-init[1494]:   'version': 'latest'}
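
The trace above does not say which path is null. One unverified guess is that the CLI builds a path from the user's home directory (the Docker run points to a log under `/root/.mlnet`) and that `HOME` is not set when the command runs under cloud-init. The sketch below is only a way to test that guess: `env_with_home` and `run_with_home` are hypothetical helpers, and `/root` as the fallback directory is an assumption.

```python
import os
import subprocess


def env_with_home(fallback="/root", base_env=None):
    """Return a copy of the environment in which HOME is guaranteed to be set."""
    env = dict(os.environ if base_env is None else base_env)
    if not env.get("HOME"):
        # Assumption: the tool needs HOME; use the fallback only when it is missing.
        env["HOME"] = fallback
    return env


def run_with_home(cmd, fallback="/root"):
    """Run `cmd` (a list of arguments) with HOME defined; raises on non-zero exit."""
    return subprocess.run(
        cmd,
        env=env_with_home(fallback),
        check=True,
        capture_output=True,
        text=True,
    )
```

If the same `mlnet classification` command succeeds when launched this way on the AWS instance, the missing variable is the likely culprit; if it still aborts, the guess is wrong and the null path comes from somewhere else.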

LittleLittleCloud commented 3 years ago

@PGijsbers I'll take a look