yandex-research / rtdl-revisiting-models

(NeurIPS 2021) Revisiting Deep Learning Models for Tabular Data
https://arxiv.org/abs/2106.11959
Apache License 2.0

(FT-Transformer) None if X_num is None else X_num[part][idx], IndexError: index 595 is out of bounds for dimension 0 with size 595 #49

Closed GiovanniAVS closed 1 month ago

GiovanniAVS commented 2 months ago

Hi,

Following your instructions here, I managed to split the files into N and y. My file has 9 numeric features and the output has only two classes, 0 or 1, so it is a binary classification problem. With the suggested split I obtained 2855 rows for training, 952 for testing and 952 for validation (I have already accounted for removing the label line). It is very similar to the higgs_small data.

I modified the path in the 'reproduced.toml' file so that it points to the new folder containing the data.

The new 'info.json' file was as follows:

{ "name": "teste___0", "basename": "teste", "split": 0, "task_type": "binclass", "n_num_features": 9, "n_cat_features": 0, "train_size": 2855, "val_size": 952, "test_size": 952 }

("teste" is the name of the folder containing the data)

Ok, everything was ready, so I ran the command "python bin/tune.py output/teste/ft_transformer/tuning/reproduced.toml".

However, I received the following error:

None if X_num is None else X_num[part][idx], IndexError: index 595 is out of bounds for dimension 0 with size 595

The problem is that I have no idea where this 595 came from, since all my data has values much higher than that. I have recalculated the data several times, but the same problem persists. Do you happen to know how to help me with this?

(screenshot of the error output attached)
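
For reference, the message itself is just an out-of-range index into the array of numerical features; a tiny reproduction of the same error pattern (illustrative only, not code from this repo) would be:

```python
# Illustrative only: valid row indices for a tensor with 595 rows are 0..594,
# so indexing row 595 raises exactly this kind of message.
import torch

X_num = torch.randn(595, 9)  # 595 rows, 9 numeric features
idx = torch.tensor([595])    # an index that refers to a row that does not exist
X_num[idx]                   # IndexError: index 595 is out of bounds for dimension 0 with size 595
```

So my question is really where an array of size 595 (or an index 595) could come from, since none of my splits has that size.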

Yura52 commented 2 months ago

Hi,

Unfortunately, it is hard to name the exact reason. I have three quick suggestions:

Let me know if further help is needed.

GiovanniAVS commented 2 months ago

Thanks for the quick response. Removing the cache file solved the problem, but I stumbled upon another one haha. In each epoch, after a little while, the following warning appears: "anaconda3/envs/revisiting-models/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior." I think the model is probably predicting only one class, so the metrics cannot be computed for the classes that have no predictions, but I don't understand why. Do you perhaps know if this has to do with the type of data?

Below I'm sending a small sample of the data (screenshot of the data types attached).

Yura52 commented 2 months ago

On our benchmark, the same message appears at the beginning of training on some datasets. As training continues, the model becomes smarter and the message goes away.

I think this indeed happens because some of the classes are not predicted at all, or the model constantly predicts only one class. In fact, I can imagine that on some datasets rare classes may not be predicted at all even by the end of training. Even then, metrics such as accuracy can still be computed successfully. However, our code computes many metrics, and some of them can trigger the warning.

To sum up, the warning is not a problem as such, but you should analyse your model's predictions in more detail to make sure it produces reasonable predictions for your dataset.
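
If you want a quick check, something along these lines should be enough (a sketch with made-up arrays standing in for your labels and predictions, not code from this repo):

```python
# Sketch: check which classes the model actually predicts, and compute the
# metrics with an explicit zero_division policy so the warning disappears.
import numpy as np
from sklearn.metrics import classification_report, precision_score

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])  # made-up ground truth
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0])  # model collapses to class 0

print(np.unique(y_pred, return_counts=True))             # which classes are predicted, and how often
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 instead of a warning
print(classification_report(y_true, y_pred, zero_division=0))
```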