Data Shapes for Multi-class Classification

Ehssaann commented 1 year ago

Hello, I hope you are doing well.

I have been trying to do multi-class classification from scratch with your transformer code on my own data set, but I have not managed to get it working. I think the problem is with my data shapes. I have 1442 samples; each sample has 51 rows (time steps) and 9 columns (features). Each sample also has four labels (four classes). How should 'all_df', 'labels_df', and 'all_IDs' be shaped and organized? Right now they have the following shapes: all_df = (1442*51, 9), labels_df = (1442, 4), all_IDs = (1442,)

With these shapes, the code raises an index error when it splits the data, saying that 5071 is out of bounds for 1442. Also, I noticed that in line 66 of main.py:

labels = my_data.labels_df.values.flatten()

the labels_df is flattened, and I do not understand why. It changes the labels' index...

I also tried to solve this by removing the 'flatten()' call, but that raises another error when the validation loss is calculated before training starts: it says that the target for the loss should be a 1D tensor, not a multi-target tensor.

I would really appreciate your help with this. Regards

Ehssaann commented 1 year ago

I should clarify that my data set has four labels (four classes) in total, and each sample is assigned exactly one label. Each label is a one-hot encoded vector such as [1, 0, 0, 0] or [0, 0, 1, 0].

gzerveas commented 1 year ago

Hi, your labels_df should have a single column containing the class indices, not the one-hot representations. You can find this information in both dataset.py (e.g. look at collate_superv) and data.py.
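
For readers hitting the same problem, here is a minimal sketch of that conversion using the shapes from the question. The random arrays are placeholders for real data, and the column name "label" is made up for illustration. It also assumes, without confirmation from this thread, that all_df is indexed by sample ID (one ID repeated for every time step of that sample) and that all_IDs holds the unique IDs; check data.py for the layout the code actually expects.

```python
import numpy as np
import pandas as pd

n_samples, n_steps, n_features, n_classes = 1442, 51, 9, 4

# Placeholder data standing in for the real data set.
rng = np.random.default_rng(0)
features = rng.normal(size=(n_samples, n_steps, n_features))
class_idx = rng.integers(0, n_classes, size=n_samples)
one_hot = np.eye(n_classes, dtype=int)[class_idx]  # shape (1442, 4)

sample_ids = np.arange(n_samples)

# labels_df: one row per sample, a single column of class indices
# (argmax recovers the index of the 1 in each one-hot row).
labels_df = pd.DataFrame(
    one_hot.argmax(axis=1), index=sample_ids, columns=["label"]
)

# all_df: one row per time step.
# ASSUMPTION: the index repeats each sample ID once per time step,
# so all_df.loc[some_id] returns that sample's (51, 9) block.
all_df = pd.DataFrame(
    features.reshape(-1, n_features),
    index=np.repeat(sample_ids, n_steps),
)

# ASSUMPTION: all_IDs is the set of unique sample IDs.
all_IDs = all_df.index.unique()

# With a single label column, flatten() only turns shape (1442, 1)
# into a 1D array of 1442 class indices, one per sample.
labels = labels_df.values.flatten()

print(all_df.shape, labels_df.shape, len(all_IDs), labels.shape)
```

With a one-hot labels_df of shape (1442, 4), the same flatten() call would instead produce 1442*4 = 5768 values, which no longer line up one-to-one with the samples; that mismatch is consistent with the out-of-bounds index reported above.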

Ehssaann commented 1 year ago

Hi, Thanks for your help. It worked. I appreciate it.

gzerveas / mvts_transformer

Data Shapes for Multi-class Classification #53