kathrinse / TabSurvey

Experiments on Tabular Data Models
MIT License

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object #5

Closed sonnguyen129 closed 1 year ago

sonnguyen129 commented 1 year ago

Hi @kathrinse. When training TabNet on the Adult dataset, I get this error. Here is the output:

Namespace(batch_size=128, cat_dims=[9, 16, 7, 15, 6, 5, 2, 42], cat_idx=[1, 3, 5, 6, 7, 8, 9, 13], config='config/adult.yml', data_parallel=True, dataset='Adult', direction='maximize', early_stopping_rounds=20, epochs=1000, gpu_ids=[0], logging_period=100, model_name='TabNet', n_trials=10, num_classes=1, num_features=14, num_splits=5, objective='binary', one_hot_encode=False, optimize_hyperparameters=True, scale=True, seed=221, shuffle=True, target_encode=True, use_gpu=True, val_batch_size=256)
Start hyperparameter optimization
Loading dataset Adult...
Dataset loaded!
(32561, 14)
Scaling the data...
[I 2022-10-05 19:09:12,970] A new study created in RDB with name: TabNet_Adult
A new study created in RDB with name: TabNet_Adult
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py:75: UserWarning: Device used : cuda
  warnings.warn(f"Device used : {self.device}")
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py:75: UserWarning: Device used : cuda
  warnings.warn(f"Device used : {self.device}")
epoch 0  | loss: 0.68711 | eval_logloss: 1.5043  |  0:00:20s
epoch 1  | loss: 0.39683 | eval_logloss: 0.4977  |  0:00:41s
epoch 2  | loss: 0.38015 | eval_logloss: 0.43938 |  0:01:01s
epoch 3  | loss: 0.36482 | eval_logloss: 0.43644 |  0:01:22s
epoch 4  | loss: 0.34721 | eval_logloss: 0.38523 |  0:01:43s
epoch 5  | loss: 0.34573 | eval_logloss: 0.35584 |  0:02:03s
epoch 6  | loss: 0.34037 | eval_logloss: 0.38542 |  0:02:23s
epoch 7  | loss: 0.33787 | eval_logloss: 0.35565 |  0:02:44s
epoch 8  | loss: 0.32982 | eval_logloss: 0.35525 |  0:03:04s
epoch 9  | loss: 0.32862 | eval_logloss: 0.33821 |  0:03:24s
epoch 10 | loss: 0.32244 | eval_logloss: 0.33319 |  0:03:45s
epoch 11 | loss: 0.32608 | eval_logloss: 0.34302 |  0:04:06s
epoch 12 | loss: 0.3276  | eval_logloss: 0.36721 |  0:04:26s
epoch 13 | loss: 0.32269 | eval_logloss: 0.3386  |  0:04:47s
epoch 14 | loss: 0.32002 | eval_logloss: 0.33012 |  0:05:08s
epoch 15 | loss: 0.31808 | eval_logloss: 0.33689 |  0:05:28s
epoch 16 | loss: 0.31916 | eval_logloss: 0.32849 |  0:05:49s
epoch 17 | loss: 0.31616 | eval_logloss: 0.34039 |  0:06:10s
epoch 18 | loss: 0.31717 | eval_logloss: 0.34637 |  0:06:30s
epoch 19 | loss: 0.31554 | eval_logloss: 0.33508 |  0:06:50s
epoch 20 | loss: 0.318   | eval_logloss: 0.43872 |  0:07:11s
epoch 21 | loss: 0.32983 | eval_logloss: 0.49745 |  0:07:31s
epoch 22 | loss: 0.31808 | eval_logloss: 0.33653 |  0:07:52s
epoch 23 | loss: 0.31731 | eval_logloss: 0.32934 |  0:08:12s
epoch 24 | loss: 0.31352 | eval_logloss: 0.33776 |  0:08:32s
epoch 25 | loss: 0.31438 | eval_logloss: 0.34476 |  0:08:53s
epoch 26 | loss: 0.31483 | eval_logloss: 0.3282  |  0:09:13s
epoch 27 | loss: 0.30911 | eval_logloss: 0.32267 |  0:09:33s
epoch 28 | loss: 0.31008 | eval_logloss: 0.34737 |  0:09:53s
epoch 29 | loss: 0.30756 | eval_logloss: 0.32561 |  0:10:13s
epoch 30 | loss: 0.30834 | eval_logloss: 0.32646 |  0:10:33s
epoch 31 | loss: 0.30615 | eval_logloss: 0.32435 |  0:10:53s
epoch 32 | loss: 0.30466 | eval_logloss: 0.33857 |  0:11:13s
epoch 33 | loss: 0.30495 | eval_logloss: 0.33067 |  0:11:33s
epoch 34 | loss: 0.30485 | eval_logloss: 0.33315 |  0:11:53s
epoch 35 | loss: 0.30466 | eval_logloss: 0.33724 |  0:12:13s
epoch 36 | loss: 0.30336 | eval_logloss: 0.33496 |  0:12:33s
epoch 37 | loss: 0.29928 | eval_logloss: 0.35852 |  0:12:53s
epoch 38 | loss: 0.29941 | eval_logloss: 0.33168 |  0:13:13s
epoch 39 | loss: 0.30065 | eval_logloss: 0.34095 |  0:13:33s
epoch 40 | loss: 0.29873 | eval_logloss: 0.35759 |  0:13:53s
epoch 41 | loss: 0.30008 | eval_logloss: 0.35994 |  0:14:13s
epoch 42 | loss: 0.29637 | eval_logloss: 0.33748 |  0:14:33s
epoch 43 | loss: 0.29404 | eval_logloss: 0.33582 |  0:14:54s
epoch 44 | loss: 0.29512 | eval_logloss: 0.33685 |  0:15:13s
epoch 45 | loss: 0.29254 | eval_logloss: 0.34174 |  0:15:33s
epoch 46 | loss: 0.29284 | eval_logloss: 0.35136 |  0:15:53s
epoch 47 | loss: 0.2898  | eval_logloss: 0.35115 |  0:16:13s

Early stopping occurred at epoch 47 with best_epoch = 27 and best_eval_logloss = 0.32267
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/callbacks.py:172: UserWarning: Best weights from best epoch are automatically used!
  warnings.warn(wrn_msg)
[W 2022-10-05 19:25:28,854] Trial 0 failed because of the following error: TypeError('default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "train.py", line 95, in __call__
    sc, time = cross_validation(model, self.X, self.y, self.args)
  File "train.py", line 41, in cross_validation
    loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
  File "/content/drive/MyDrive/Predict Student Results/Code/TabSurvey-main/models/tabnet.py", line 40, in fit
    batch_size=self.args.batch_size)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 260, in fit
    self.feature_importances_ = self._compute_feature_importances(X_train)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 723, in _compute_feature_importances
    M_explain, _ = self.explain(X, normalize=False)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 320, in explain
    for batch_nb, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 147, in default_collate
    raise TypeError(default_collate_err_msg_format.format(elem.dtype))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object
Trial 0 failed because of the following error: TypeError('default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "train.py", line 95, in __call__
    sc, time = cross_validation(model, self.X, self.y, self.args)
  File "train.py", line 41, in cross_validation
    loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
  File "/content/drive/MyDrive/Predict Student Results/Code/TabSurvey-main/models/tabnet.py", line 40, in fit
    batch_size=self.args.batch_size)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 260, in fit
    self.feature_importances_ = self._compute_feature_importances(X_train)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 723, in _compute_feature_importances
    M_explain, _ = self.explain(X, normalize=False)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 320, in explain
    for batch_nb, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 147, in default_collate
    raise TypeError(default_collate_err_msg_format.format(elem.dtype))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main(arguments)
  File "train.py", line 116, in main
    study.optimize(Objective(args, model_name, X, y), n_trials=args.n_trials)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/study.py", line 428, in optimize
    show_progress_bar=show_progress_bar,
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 160, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 234, in _run_trial
    raise func_err
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "train.py", line 95, in __call__
    sc, time = cross_validation(model, self.X, self.y, self.args)
  File "train.py", line 41, in cross_validation
    loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
  File "/content/drive/MyDrive/Predict Student Results/Code/TabSurvey-main/models/tabnet.py", line 40, in fit
    batch_size=self.args.batch_size)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 260, in fit
    self.feature_importances_ = self._compute_feature_importances(X_train)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 723, in _compute_feature_importances
    M_explain, _ = self.explain(X, normalize=False)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 320, in explain
    for batch_nb, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 147, in default_collate
    raise TypeError(default_collate_err_msg_format.format(elem.dtype))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object

My config is here:

# General parameters
dataset: Adult
model_name: TabNet # LinearModel, KNN, SVM, DecisionTree, RandomForest, XGBoost, CatBoost, LightGBM, ModelTree
                # MLP, TabNet, VIME, TabTransformer, RLN, DNFNet, STG, NAM, DeepFM, SAINT
objective: binary # Don't change
# optimize_hyperparameters: True

# GPU parameters
use_gpu: True
gpu_ids: 0
data_parallel: True

# Optuna parameters - https://optuna.org/
n_trials: 10
direction: maximize

# Cross validation parameters
num_splits: 5
shuffle: True
seed: 221 # Don't change

# Preprocessing parameters
scale: True
target_encode: True
one_hot_encode: False

# Training parameters
batch_size: 128
val_batch_size: 256
early_stopping_rounds: 20
epochs: 1000
logging_period: 100

# About the data
num_classes: 1  # for classification
num_features: 14
cat_idx: [1,3,5,6,7,8,9,13]
# cat_dims: will be automatically set.
cat_dims: [9, 16, 7, 15, 6, 5, 2, 42]

Hope to hear from you soon. Thank you so much!

parsifal9 commented 1 year ago

Hi Kathrin, I can confirm that I get the same error:

python train.py  --config config/adult.yml --model_name TabNet
.
Namespace(config='config/adult.yml', model_name='TabNet', dataset='Adult', objective='binary', use_gpu=False, 
gpu_ids=[0, 1], data_parallel=True, optimize_hyperparameters=False, n_trials=50, direction='maximize', 
num_splits=5, shuffle=True, seed=221, scale=True, target_encode=True, one_hot_encode=False, 
batch_size=128, val_batch_size=256, early_stopping_rounds=20, epochs=1000, logging_period=100, 
num_features=14, num_classes=1, cat_idx=[1, 3, 5, 6, 7, 8, 9, 13], cat_dims=[9, 16, 7, 15, 6, 5, 2, 42])
.
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object

Bye R

kathrinse commented 1 year ago

Hey,

yes, I found out it's caused by an update of the TabNet implementation.

It can be easily fixed by adding the line X = X.astype(np.float32) in the models/tabnet.py file, in the fit method (right before self.model.fit(X, ...) is called). I also updated the code.
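For reference, a minimal sketch of where the cast goes. The surrounding fit method is paraphrased here, so the exact keyword arguments may differ from the repository code:

# models/tabnet.py -- sketch of the fix; surrounding code paraphrased
import numpy as np

def fit(self, X, y, X_val=None, y_val=None):
    # pytorch-tabnet builds a DataLoader directly from X when computing
    # feature importances via explain(), so an object-dtype numpy array
    # (e.g. after the categorical/numerical preprocessing) breaks
    # default_collate. Casting to float32 avoids the TypeError.
    X = X.astype(np.float32)
    if X_val is not None:
        X_val = X_val.astype(np.float32)

    self.model.fit(
        X, y,
        eval_set=[(X_val, y_val)],
        eval_metric=["logloss"],
        max_epochs=self.args.epochs,
        patience=self.args.early_stopping_rounds,
        batch_size=self.args.batch_size,
    )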

parsifal9 commented 1 year ago

Thanks, yes, it is all working now. Bye

sonnguyen129 commented 1 year ago

Hi all, since this problem is solved now, I will close this issue. Thanks all.