h2oai / h2o-tutorials

Tutorials and training material for the H2O Machine Learning Platform
http://h2o.ai

[GBM lr_annealing] failed: water.exceptions.H2OIllegalArgumentException: Can only convert jobs producing a single Model or ModelContainer. #167

Open algomaschine opened 1 year ago

algomaschine commented 1 year ago

Dear Developers,

I have 28 data sets in exactly the same format: numerical columns plus some categorical ones. Training has been running continuously (30 min max time per model) and has generated various types of models; as you can see, there are DeepLearning, GBM and StackedEnsemble variations. All good. Again, each model corresponds to a different data set, but the format is EXACTLY the same. [image]

Now, in one instance it shows this error. Unfortunately, it doesn't give me any more details. Could you please advise how I can find out what's happening under the hood and perhaps fix it? [image]

The interesting thing is that it happens after model generation is done, just before the run finishes.

AutoML progress: |█████████████████████████████████████████████████████████████▌ | 97%
08:19:54.521: GBM_lr_annealing_selection_AutoML_26_20221108_75037 [GBM lr_annealing] failed: water.exceptions.H2OIllegalArgumentException: Can only convert jobs producing a single Model or ModelContainer.

AutoML progress: |███████████████████████████████████████████████████████████████ (done)| 100%
generated file C:\Users\Administrator\Desktop\snp ephem\h2o models\StackedEnsemble_BestOfFamily_6_AutoML_26_20221108_75037

The interesting part is that it did eventually write the model file. But what's the difference, and is anything missing from it?
generated file C:\Users\Administrator\Desktop\h2o models\StackedEnsemble_BestOfFamily_6_AutoML_26_20221108_75037

Thank you!

PS: I've also noticed a bug. If the previous H2O server instance has not been killed and I start generating models from a different directory, that instance is reused, and the models are written to the directory associated with the previous H2O instance rather than the new directory the program was started from.
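One way to avoid silently attaching to a leftover server might be to give each run its own port. This is only a sketch under assumptions, not a confirmed fix: `find_free_port` and `init_fresh_h2o` are hypothetical helper names, and whether a dedicated port resolves the directory mix-up is untested. `h2o.init(port=...)` itself is an existing H2O Python call.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is currently unused on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def init_fresh_h2o():
    """Hypothetical helper: start H2O on a port no older instance is using."""
    import h2o  # imported lazily so the port helper works without H2O installed

    port = find_free_port()
    # Assumption: with nothing listening on this port, h2o.init() cannot
    # attach to a leftover server and starts a new local one instead
    # (its default start_h2o=True behaviour).
    h2o.init(port=port)
    return port
```

The port could in principle be taken by another process between the check and the `h2o.init` call, so this narrows the problem rather than eliminating it. Calling `h2o.cluster().shutdown()` at the end of each run and passing absolute paths when saving models may also help.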

algomaschine commented 1 year ago

I also sometimes get an 'array out of bounds exception', but the run continues and eventually generates a model. [image]

algomaschine commented 1 year ago

AutoML progress: |████ | 6%
10:00:16.97: GLM_1_AutoML_4_20221109_95959 [GLM def_1] failed: java.lang.ArrayIndexOutOfBoundsException: 324

AutoML progress: |██████████████████████████████████████████████████████████████▎| 98%
10:29:38.439: GBM_lr_annealing_selection_AutoML_4_20221109_95959 [GBM lr_annealing] failed: water.exceptions.H2OIllegalArgumentException: Can only convert jobs producing a single Model or ModelContainer.

AutoML progress: |███████████████████████████████████████████████████████████████ (done)| 100%
Traceback (most recent call last):
  File ".\auto_model_trainer.py", line 298, in <module>
    train_bydata(os.path.dirname(sys.argv[0])+"\train-test\","trainper_2022-11-.csv",os.path.dirname(sys.argv[0])+"\h2o models\", "AutoMLper_2022-11-*")
  File ".\auto_model_trainer.py", line 110, in train_by_data
    aml.train(y = y, training_frame = train, leaderboard_frame = test)
  File "C:\Program Files\Python37\lib\site-packages\h2o\automl_estimator.py", line 683, in train
    self._fetch()
  File "C:\Program Files\Python37\lib\site-packages\h2o\automl_estimator.py", line 712, in _fetch
    state = _fetch_state(self.key)
  File "C:\Program Files\Python37\lib\site-packages\h2o\automl_base.py", line 354, in _fetch_state
    event_log = _fetch_table(state_json['event_log_table'], key=project_name+"_eventlog", progress_bar=False)
  File "C:\Program Files\Python37\lib\site-packages\h2o\automl_base.py", line 327, in _fetch_table
    fr = h2o.H2OFrame(table.cell_values, destination_frame=key, column_names=table.col_header, column_types=table.col_types)
  File "C:\Program Files\Python37\lib\site-packages\h2o\frame.py", line 114, in __init__
    column_names, column_types, na_strings, skipped_columns)
  File "C:\Program Files\Python37\lib\site-packages\h2o\frame.py", line 155, in _upload_python_object
    os.remove(tmp_path) # delete the tmp file
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\ADMINI~1\AppData\Local\Temp\2\tmphburp1m8.csv'
Closing connection _sid_9f5e at exit
H2O session _sid_9f5e closed.
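To see which steps fail and how often across runs, the console output could be scanned for the `failed:` lines. This is a minimal sketch, assuming the output was captured to a string. The function names and the regular expression are illustrative and derived only from the log lines quoted above, not from any H2O API.

```python
import re
from collections import Counter

# Matches e.g. "10:00:16.97: GLM_1_AutoML_4_20221109_95959 [GLM def_1] failed: <error>"
FAILED_STEP = re.compile(
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\.\d+): "
    r"(?P<model>\S+) \[(?P<step>[^\]]+)\] failed: (?P<error>.+)"
)


def find_failed_steps(console_text):
    """Return one dict (time, model, step, error) per failed AutoML step."""
    failures = []
    for line in console_text.splitlines():
        match = FAILED_STEP.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


def count_by_step(failures):
    """Count how many times each step name appears among the failures."""
    return Counter(failure["step"] for failure in failures)
```

Running this over the output of all 28 runs would show whether the `GBM lr_annealing` step fails on every data set or only on some.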

And I'm starting to get this often when generating model after model in the same console. The commands below kind of help, but I still have to watch for every crash. What might be the specific root cause? Resources are fine: there is enough memory and everything else.

taskkill /F /IM "python.exe"
taskkill /F /IM "java.exe"
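Until the root cause is known, one workaround might be to run each training job in its own child process, so a crash in one job does not take down the whole batch and each job starts from a clean Python state. This is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper, and it does not by itself stop a leftover `java.exe`.

```python
import subprocess
import sys
import time


def run_isolated(command, attempts=3, pause_seconds=5.0):
    """Run `command` in a child process; retry if it exits with an error.

    Returns the exit code of the last attempt (0 means success).
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(command).returncode
        if returncode == 0:
            break
        print(f"attempt {attempt} failed with exit code {returncode}",
              file=sys.stderr)
        if attempt < attempts:
            time.sleep(pause_seconds)  # give the OS time to release temp files
    return returncode
```

A batch driver could then call something like `run_isolated([sys.executable, "auto_model_trainer.py", dataset_path])` once per data set. This assumes the training script accepts the data set as a command-line argument, which the original script may not do.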