kathrinse / TabSurvey

Experiments on Tabular Data Models
MIT License

KeyError: 'HIGGS' #6

parsifal9 closed this issue 1 year ago

parsifal9 commented 1 year ago

Hi kathrinse,

I get the following error for the Higgs data set for all the models that I have tried. The same models work perfectly well on the other included data sets.

> python train.py --config config/higgs.yml --model_name RandomForest --use_gpu

Traceback (most recent call last):
  File "TabSurvey/train.py", line 154, in <module>
    main_once(arguments)
  File "TabSurvey/train.py", line 133, in main_once
    print(args.parameters[args.dataset])
KeyError: 'HIGGS'

The only change I have made to the code is to download HIGGS.csv.gz and read it locally, i.e. in load_data.py:

 path = "/scratch1/dun280/TabSurvey/data/HIGGS.csv.gz"

The error is particularly concerning because I get the same one when I try to add my own data sets. I have added a config file and a section in load_data.py, but it falls over with a KeyError.

Bye R

kathrinse commented 1 year ago

Hey R!

When you call train.py like this, it tries to load the best hyperparameters for the given dataset-model combination from the config/best_params.yml file. Thanks for reminding me that I forgot to add the best hyperparameters for the HIGGS dataset.

So, if you know which hyperparameters you want to use, add them to this file (or to your own file and set the --best_params_file option with the corresponding path).

Or, if you want to start the hyperparameter search, add the --optimize_hyperparameters flag. Then it should work :)
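To illustrate why the lookup fails, here is a minimal sketch. It uses a plain dict in place of the loaded config/best_params.yml. The nesting (dataset, then model, then hyperparameters), the "Adult" entry, the hyperparameter values, and the helper name lookup_best_params are all assumptions for illustration and are not taken from the repository.

```python
# Hypothetical stand-in for the loaded contents of config/best_params.yml.
# The nesting (dataset -> model -> hyperparameters) and all values are assumed.
best_params = {
    "Adult": {"RandomForest": {"n_estimators": 100, "max_depth": 8}},
}


def lookup_best_params(parameters, dataset, model_name):
    """Return the stored hyperparameters, or raise a KeyError that names the fix."""
    try:
        return parameters[dataset][model_name]
    except KeyError:
        raise KeyError(
            f"No best hyperparameters stored for dataset {dataset!r} "
            f"and model {model_name!r}. Add them to the best-params file "
            "or run with --optimize_hyperparameters."
        ) from None


# Same failure as in the traceback: there is no 'HIGGS' entry yet.
try:
    lookup_best_params(best_params, "HIGGS", "RandomForest")
except KeyError as err:
    print("lookup failed:", err)

# Adding an entry (values are placeholders) makes the lookup succeed.
best_params["HIGGS"] = {"RandomForest": {"n_estimators": 200, "max_depth": 12}}
print(lookup_best_params(best_params, "HIGGS", "RandomForest"))
```

The same applies to a custom data set: a config file and a loader are not enough on their own, because the dataset name also needs an entry in the best-params file unless the hyperparameter search is enabled.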

Greetings, Kathrin

parsifal9 commented 1 year ago

Thanks Kathrin, yes, that fixes the problem. Bye