Closed — PGijsbers closed this issue 3 years ago.
I don't seem to be able to debug this issue. It is internal to the tool, and when searching for similar issues I found only solutions that did not work in this scenario. @LittleLittleCloud I'd appreciate it if you could (get someone to) look into this.
I have (so far) also been unable to fix the AWS issue; on the off chance you are familiar with it, I'll post the trace:
[ 361.111154] cloud-init[1494]: Running cmd `/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt`
[ 361.705643] cloud-init[1494]: Unhandled exception. System.ArgumentNullException: Value cannot be null. (Parameter 'path1')
[ 361.707696] cloud-init[1494]: at System.IO.Path.Combine(String path1, String path2)
[ 361.709225] cloud-init[1494]: at Microsoft.ML.CLI.Commands.MLCommand..ctor() in /_/src/mlnet/Commands/MLCommand.cs:line 39
[ 361.709358] cloud-init[1494]: at Microsoft.ML.CLI.Commands.AutoMLCommand..ctor() in /_/src/mlnet/Commands/AutoMLCommand.cs:line 35
[ 361.709445] cloud-init[1494]: at Microsoft.ML.CLI.Commands.ClassificationCommand..ctor() in /_/src/mlnet/Commands/ClassificationCommand.cs:line 25
[ 361.709520] cloud-init[1494]: at Microsoft.ML.CLI.Program.Main(String[] args) in /_/src/mlnet/Program.cs:line 217
[ 361.709600] cloud-init[1494]: Aborted (core dumped)
[ 361.709767] cloud-init[1494]: Command '/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt' returned non-zero exit status 134.
[ 361.709864] cloud-init[1494]: Traceback (most recent call last):
[ 361.709938] cloud-init[1494]: File "/repo/amlb/benchmark.py", line 511, in run
[ 361.710003] cloud-init[1494]: meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
[ 361.710072] cloud-init[1494]: File "/repo/frameworks/MLNet/__init__.py", line 10, in run
[ 361.710140] cloud-init[1494]: return run(dataset, config)
[ 361.710214] cloud-init[1494]: File "/repo/frameworks/MLNet/exec.py", line 64, in run
[ 361.710285] cloud-init[1494]: run_cmd(cmd)
[ 361.710362] cloud-init[1494]: File "/repo/amlb/utils/process.py", line 245, in run_cmd
[ 361.710433] cloud-init[1494]: raise e
[ 361.710504] cloud-init[1494]: File "/repo/amlb/utils/process.py", line 232, in run_cmd
[ 361.710576] cloud-init[1494]: preexec_fn=params.preexec_fn)
[ 361.710655] cloud-init[1494]: File "/repo/amlb/utils/process.py", line 77, in run_subprocess
[ 361.710727] cloud-init[1494]: raise subprocess.CalledProcessError(retcode, process.args, output=stdout, stderr=stderr)
[ 361.710796] cloud-init[1494]: subprocess.CalledProcessError: Command '/repo/frameworks/MLNet/lib/mlnet classification --dataset /s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv --test-dataset /s3bucket/input/org/openml/www/datasets/61/dataset_test_0.csv --train-time 60 --label-col 4 --output /tmp/tmpssjbguqy --name 0 --verbosity q --log-file-path /tmp/tmpssjbguqy/0/log.txt' returned non-zero exit status 134.
[ 361.710907] cloud-init[1494]: Loading metadata from `/s3bucket/output/predictions/iris/0/metadata.json`.
[ 361.814558] cloud-init[1494]: Metric scores: { 'acc': nan,
[ 361.815679] cloud-init[1494]: 'app_version': 'dev [fix_mlnet, 89e4408]',
[ 361.817104] cloud-init[1494]: 'balacc': nan,
[ 361.818068] cloud-init[1494]: 'constraint': 'test',
[ 361.818190] cloud-init[1494]: 'duration': nan,
[ 361.818266] cloud-init[1494]: 'fold': 0,
[ 361.818360] cloud-init[1494]: 'framework': 'MLNet',
[ 361.818443] cloud-init[1494]: 'id': 'openml.org/t/59',
[ 361.818510] cloud-init[1494]: 'info': "CalledProcessError: Command '/repo/frameworks/MLNet/lib/mlnet "
[ 361.818582] cloud-init[1494]: 'classification --dataset '
[ 361.818654] cloud-init[1494]: '/s3bucket/input/org/openml/www/datasets/61/dataset_train_0.csv '
[ 361.818738] cloud-init[1494]: '--test-dataset /s3bucket/input/org/openml/www/dat…',
[ 361.818813] cloud-init[1494]: 'logloss': nan,
[ 361.818885] cloud-init[1494]: 'metric': 'neg_logloss',
[ 361.818955] cloud-init[1494]: 'mode': 'aws',
[ 361.819026] cloud-init[1494]: 'models_count': nan,
[ 361.819093] cloud-init[1494]: 'params': '',
[ 361.819157] cloud-init[1494]: 'predict_duration': nan,
[ 361.819217] cloud-init[1494]: 'result': nan,
[ 361.819340] cloud-init[1494]: 'seed': 396520499,
[ 361.819411] cloud-init[1494]: 'task': 'iris',
[ 361.819476] cloud-init[1494]: 'training_duration': nan,
[ 361.819539] cloud-init[1494]: 'type': 'multiclass',
[ 361.819600] cloud-init[1494]: 'utc': '2021-07-16T11:25:25',
[ 361.819664] cloud-init[1494]: 'version': 'latest'}
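For context on the exit code in the trace: status 134 is how the shell reports a process killed by SIGABRT (128 + 6), which matches the `Aborted (core dumped)` line after the unhandled `ArgumentNullException`. The harness's `run_cmd` then surfaces this as a `CalledProcessError`. A minimal sketch of that plumbing (not amlb's actual code):

```python
import subprocess

# 134 = 128 + SIGABRT(6): the mlnet child aborted after the unhandled
# ArgumentNullException and dumped core; check=True turns the non-zero
# exit status into CalledProcessError, mirroring what amlb's run_cmd does.
try:
    subprocess.run(["sh", "-c", "exit 134"], check=True)
    status = 0
except subprocess.CalledProcessError as err:
    status = err.returncode

print(status)  # 134
```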
@PGijsbers I'll take a look
I ran
python runbenchmark.py MLNet test -t iris -f 0 -m docker -s force
and it produced the following failure:

Upon closer inspection, it looks like the output file produced by MLNet in this case is actually an error message:

which understandably cannot be read as predictions.
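To illustrate why an error message in the output file breaks the scoring step, here is a hypothetical sketch: the column names (`predictions`, `truth`) are illustrative only, not amlb's real schema, and the parsing shown is a stand-in for whatever loader amlb actually uses.

```python
import csv
import io

# Hypothetical sketch: the file amlb tries to read as predictions actually
# holds mlnet's error message, so the expected prediction columns are absent
# and the file cannot be interpreted as a predictions table.
output_text = "Unhandled exception. System.ArgumentNullException: Value cannot be null."

header = csv.DictReader(io.StringIO(output_text)).fieldnames or []
looks_like_predictions = {"predictions", "truth"}.issubset(header)
print(looks_like_predictions)  # False
```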
I haven't had much time to look into this issue yet, but since I have a 3-day weekend I figured I'd open it anyway for later reference, and perhaps @LittleLittleCloud could have a look.
Note: I'm trying to reproduce it on AWS, but I am running into some installation issues there (work in progress).