naszilla / tabzilla


issue with dataset "openml__poker-hand__9890" #64

Closed duncanmcelfresh closed 1 year ago

duncanmcelfresh commented 1 year ago

From the log file (below), this seems to be a memory error:

ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
failed to run experiment during attempt 2... (exit code: 255)
trying again in 30 seconds...
Warning: Permanently added 'compute.3401565641500405904' (ECDSA) to the list of known hosts.
ENV_NAME: sklearn
MODEL_NAME: LinearModel
DATASET_NAME: openml__poker-hand__9890
EXPERIMENT_NAME: all-datasets-b
CONFIG_FILE: /home/shared/tabzilla/TabSurvey/tabzilla_experiment_config.yml
no change     /opt/conda/condabin/conda
no change     /opt/conda/bin/conda
no change     /opt/conda/bin/conda-env
no change     /opt/conda/bin/activate
no change     /opt/conda/bin/deactivate
no change     /opt/conda/etc/profile.d/conda.sh
no change     /opt/conda/etc/fish/conf.d/conda.fish
no change     /opt/conda/shell/condabin/Conda.psm1
no change     /opt/conda/shell/condabin/conda-hook.ps1
no change     /opt/conda/lib/python3.7/site-packages/xontrib/conda.xsh
no change     /opt/conda/etc/profile.d/conda.csh
no change     /home/duncan/.bashrc
No action taken.
running experiment with model LinearModel on dataset openml__poker-hand__9890 in env sklearn

ARGS: Namespace(experiment_config='/home/shared/tabzilla/TabSurvey/tabzilla_experiment_config.yml', dataset_dir='./datasets/openml__poker-hand__9890', model_name='LinearModel')
EXPERIMENT ARGS: Namespace(experiment_config='/home/shared/tabzilla/TabSurvey/tabzilla_experiment_config.yml', output_dir='./results/', use_gpu=False, gpu_ids=[0], data_parallel=True, n_random_trials=30, hparam_seed=0, n_opt_trials=0, batch_size=128, val_batch_size=256, early_stopping_rounds=20, epochs=500, logging_period=100, experiment_time_limit=36000, trial_time_limit=7200)
evaluating 30 random hyperparameter samples...
[I 2022-11-03 09:53:17,458] A new study created in memory with name: no-name-1060b2be-9b01-4219-99bf-1afa1360df41
/opt/conda/envs/sklearn/lib/python3.10/site-packages/optuna/study/study.py:393: FutureWarning: `n_jobs` argument has been deprecated in v2.7.0. This feature will be removed in v4.0.0. See https://github.com/optuna/optuna/releases/tag/v2.7.0.
  warnings.warn(
/opt/conda/envs/sklearn/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/opt/conda/envs/sklearn/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Trial 0 failed because of the following error: MemoryError()
Traceback (most recent call last):
  File "/opt/conda/envs/sklearn/lib/python3.10/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/home/shared/tabzilla/TabSurvey/tabzilla_experiment.py", line 163, in __call__
    result.write(result_file_base, compress=False)
  File "/home/shared/tabzilla/TabSurvey/tabzilla_utils.py", line 145, in write
    if not is_jsonable(v, cls=NpEncoder):
  File "/home/shared/tabzilla/TabSurvey/tabzilla_utils.py", line 38, in is_jsonable
    json.dumps(x, cls=cls)
  File "/opt/conda/envs/sklearn/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/opt/conda/envs/sklearn/lib/python3.10/json/encoder.py", line 202, in encode
    return ''.join(chunks)
MemoryError
[W 2022-11-03 10:07:15,754] Trial 0 failed because of the following error: MemoryError()
xmalloc failed to allocate 131072 bytes of memory
/home/shared/tabzilla/scripts/run_experiment_on_instance.sh: line 74:  3956 Killed                  python tabzilla_experiment.py --experiment_config ${CONFIG_FILE} --dataset_dir ${DATSET_DIR} --model_name ${MODEL_NAME}
failed to run experiment during attempt 3... (exit code: 137)
too many SSH attempts. giving up and deleting instance.
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the `--keep-disks` flag is given and specifies them for keeping. Deleting a disk is irreversible and any data on the disk will be lost.
 - [all-datasets-b-0-62] in [us-central1-a]

Do you want to continue (Y/n)?  
Deleted [https://www.googleapis.com/compute/v1/projects/research-collab-naszilla/zones/us-central1-a/instances/all-datasets-b-0-62].
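For context on where the MemoryError is raised: the traceback shows `is_jsonable` calling `json.dumps`, which materializes the entire serialized result as one string (the `''.join(chunks)` frame is where the allocation fails). Below is a minimal sketch of a lower-memory serializability check that streams the encoded chunks to a throwaway writer instead of joining them. This is not the repo's implementation; `is_jsonable_streaming` and `_NullWriter` are hypothetical names, and the exceptions caught are an assumption about what `tabzilla_utils.is_jsonable` treats as "not jsonable".

```python
import io
import json


def is_jsonable_streaming(x, cls=None):
    """Hypothetical lower-memory variant of tabzilla_utils.is_jsonable.

    json.dumps builds the full serialized string via ''.join(chunks);
    json.dump instead writes each encoded chunk to a file-like object,
    so discarding the chunks keeps peak memory roughly bounded by the
    largest single chunk rather than the whole document.
    """

    class _NullWriter(io.TextIOBase):
        def write(self, s):
            return len(s)  # report the chunk as written, then drop it

    try:
        json.dump(x, _NullWriter(), cls=cls)
        return True
    except (TypeError, OverflowError):
        return False
```

This does not reduce the memory needed to encode any single huge value, but it avoids holding the entire result JSON in memory just to decide whether it is serializable.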
duncanmcelfresh commented 1 year ago

Closing, no longer relevant.