Problem when shifting labels to zero in binary classification problems with negative labels

huckiyang / Voice2Series-Reprogramming

ICML 21 - Voice2Series: Adversarial Reprogramming Acoustic Models for Time Series Classification

Apache License 2.0

Issue #2

Open DominguesPH opened 1 year ago

DominguesPH commented 1 year ago

Hello! I would like to report a possible bug.

Code: v2s_main.py Lines: 43-45

For multi-class problems such as ECG5000, where the original labels are [1,2,3,4,5], the mod function applied in lines 43-45 maps the labels into a zero-based range correctly, so we obtain the values [0,1,2,3,4] as labels.

However, for binary classification with negative labels, such as ECG200, where the original labels are [-1,1], the mod function yields a remainder of 1 in both cases (in Python, -1 % 2 == 1 and 1 % 2 == 1), so the label vector becomes an array of ones.
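The behavior described above can be reproduced in isolation. This is a minimal sketch that assumes the remapping amounts to taking each label modulo the number of classes; the helper name mod_remap is hypothetical and is not taken from v2s_main.py.

```python
import numpy as np

def mod_remap(labels, num_classes):
    """Hypothetical helper: remap labels with a plain modulo (assumed form of the step)."""
    return np.asarray(labels) % num_classes

# Multi-class case: five distinct labels stay five distinct values in 0..4.
multi = mod_remap([1, 2, 3, 4, 5], 5)

# Binary case with a negative label: both labels give remainder 1.
binary = mod_remap([-1, 1], 2)

print(np.unique(multi))
print(np.unique(binary))
```

Note that under this assumption the label 5 wraps around to 0, so the multi-class mapping is a relabeling rather than a pure shift, while the binary case loses one class entirely.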

Do you agree?

huckiyang commented 1 year ago

Hello @DominguesPH, thank you for asking. I think this is handled at L30 of v2s_main.py: https://github.com/huckiyang/Voice2Series-Reprogramming/blob/main/v2s_main.py#L30

    y_train[y_train == -1] = 0
    y_test[y_test == -1] = 0

We remap the negative labels to 0 beforehand. If you want to use another dataset that contains negative labels, please add a similar value check. Hope this answers your question.

DominguesPH commented 1 year ago

Yes, huckiyang, I agree with you! Thank you for the answer! I only mention this because ECG200 is not the only dataset with negative labels (dataset 0, Ford-A, also has negative labels), and the code at L30-L32 is conditioned on dataset 2 (ECG200). I don't know whether the other datasets you used for testing also have such labels.
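A dataset-independent remapping is one way to avoid a per-dataset check. The following is only a sketch of that idea, not the repository's code: the function name to_zero_based is hypothetical, and it assumes the labels are available as one-dimensional arrays.

```python
import numpy as np

def to_zero_based(y_train, y_test):
    """Hypothetical helper: map arbitrary class labels to indices 0..K-1."""
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    # Sorted unique labels across both splits define the class order.
    classes = np.unique(np.concatenate((y_train, y_test)))
    # Every label is present in `classes`, so searchsorted returns its index.
    return np.searchsorted(classes, y_train), np.searchsorted(classes, y_test), classes

train_idx, test_idx, classes = to_zero_based([-1, 1, 1], [1, -1])
print(train_idx, test_idx, classes)
```

Because the mapping is derived from the labels themselves, it treats [-1,1] and [1,2,3,4,5] the same way and keeps the original label values in classes for later lookup.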

huckiyang commented 1 year ago

Hello @DominguesPH, yes, in our experimental setup we did some data cleanup. I just made some modifications to the public code for all setups: https://github.com/huckiyang/Voice2Series-Reprogramming/blob/main/v2s_main.py#L30

    y_train = [np.uint32(i) for i in y_train]
    y_test = [np.uint32(i) for i in y_test]

With this change, I got ~94% for 20 eps without tuning the dropout, by running:

    python v2s_main.py --dataset 0 --eps 20 --mod 2 --seg 18

Thank you again for this discussion.