ycq091044 / ManyDG

ICLR 2023 paper - ManyDG - Dataset processing and model code
https://openreview.net/forum?id=lcSfirnflpW

unbalanced labels in sleep task #5

Closed xuxiran closed 6 months ago

xuxiran commented 7 months ago

What a remarkable job! However, I have some questions regarding the sleep task. Perhaps due to my negligence, I have not found a method for balancing the category labels. I noticed that the five categories have the following distribution: [113369, 7953, 27593, 5537, 10126]. This imbalance causes the model to output all zeros when running the current code. What steps should I take to resolve this? Thank you very much!


I downloaded the dataset from https://www.physionet.org/content/sleep-edfx/1.0.0/ using `wget -r -N -c -np https://physionet.org/files/sleep-edfx/1.0.0/`.

I have attempted some commonly used methods for addressing imbalanced categories, such as adjusting the class weights of `CrossEntropyLoss` and resampling with a `WeightedRandomSampler` (with replacement). However, these methods do not seem to solve the problem: the model quickly converges to predicting a single class.
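For reference, a minimal sketch of the inverse-frequency weighting approach mentioned above, using the class counts from this issue (this is the standard "balanced" weighting heuristic, not code from the ManyDG repository):

```python
import torch
import torch.nn as nn

# per-class counts reported earlier in this issue
counts = torch.tensor([113369.0, 7953.0, 27593.0, 5537.0, 10126.0])

# inverse-frequency weights: rarer classes get larger weights;
# the count-weighted average of these weights is 1
weights = counts.sum() / (len(counts) * counts)

criterion = nn.CrossEntropyLoss(weight=weights)

# toy batch: 4 samples, 5 classes
logits = torch.randn(4, 5)
labels = torch.tensor([0, 2, 4, 1])
loss = criterion(logits, labels)
print(weights)
print(loss.item())
```

As noted in the thread, reweighting alone did not fix the collapse here, which hinted that the real bug was elsewhere.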

Maybe I made some mistakes?

ycq091044 commented 7 months ago

I used the Sleep Cassette Study portion, which has the following sample size. It works fine for me.

[screenshot: per-class sample sizes]
xuxiran commented 7 months ago

Thank you for your timely response!

I have checked the sample sizes and they are the same as you mentioned in your last comment. The counts I mentioned yesterday ([113369, 7953, 27593, 5537, 10126]) correspond to half of the training dataset (`train_X += X[:len(X)//2 + 1]`, `train_X_aux += X[-len(X)//2 - 1:]`).

Intuitively, without balancing the labels, and with loss1 being the cross-entropy loss, the model is likely to directly predict class 0 rather than learning the features properly, especially when class 0 accounts for nearly 70% of the data.
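The "nearly 70%" figure can be checked directly from the counts above:

```python
# class counts from earlier in this issue
counts = [113369, 7953, 27593, 5537, 10126]

# accuracy of a degenerate model that always predicts the majority class
majority_share = max(counts) / sum(counts)
print(f"{majority_share:.1%}")  # -> 68.9%
```

A model that collapses to class 0 already scores ~69% accuracy, which is why plain cross-entropy can get stuck there.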

Maybe I made some mistakes? Thank you very much again.

sonheesoo commented 7 months ago

Hello, owner. I was also inspired by this work. I wanted to reproduce your results, but ended up with the same results as the author of this issue. Is there a problem with the data? Otherwise, I ran the experiments exactly as provided. I'd like to know what the problem is. Thank you.

ycq091044 commented 6 months ago

Hello @xuxiran @sonheesoo, I am so sorry for the late reply. The result you are seeing is not caused by the imbalanced dataset. The issue is in my STFT code: while cleaning up the code before release, I somehow introduced an extra clipping wrap.

Please change the code in model.py #181 from `return torch.clip(torch.log(torch.clip(signal, min=1e-8)), min=0)` to `return torch.log(torch.clip(signal, min=1e-8))` and everything works out automatically.
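To see why the extra `min=0` clip is harmful, here is a plain-Python illustration of the two expressions (scalar stand-ins for the tensor ops; the magnitudes are illustrative):

```python
import math

def buggy_log_compress(x):
    # released code: clipping the log at 0 zeroes out every
    # spectrogram bin whose magnitude is below 1 (log < 0)
    return max(math.log(max(x, 1e-8)), 0.0)

def fixed_log_compress(x):
    # suggested fix: keep the full log-magnitude range, negatives included
    return math.log(max(x, 1e-8))

mags = [0.001, 0.5, 2.0, 10.0]
print([round(buggy_log_compress(m), 3) for m in mags])  # low-energy bins collapse to 0.0
print([round(fixed_log_compress(m), 3) for m in mags])
```

Since most STFT bins of a normalized EEG signal have magnitude below 1, the buggy version flattens most of the spectrogram to zero, leaving the model almost nothing to learn from.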

xuxiran commented 6 months ago

Thank you very much for the fix. Sorry that I have been busy with other work these days, so it took me a while to reply. You are right, the code works well now. I hope my reply helps other people. I will study this work further.

ycq091044 commented 6 months ago

Great! Let me close this issue then.