shivashankarrs / classimb_fairness


Parameters of the GeneralLDAMLoss #17

Closed · Buted closed this issue 2 years ago

Buted commented 2 years ago

Hello, I'm trying to use your method on our dataset, but I don't understand some parameters of GeneralLDAMLoss. Could you help me with a few questions?

1. In https://github.com/shivashankarrs/classimb_fairness/blob/bf11b73bd91c1497acec562c68a22ba4fcc539f1/src/losses.py#L90, what is the meaning of mp_num_list?
2. In https://github.com/shivashankarrs/classimb_fairness/blob/bf11b73bd91c1497acec562c68a22ba4fcc539f1/src/losses.py#L163, what is the meaning of targetgroup? I would also like to know whether or not the groups are orthogonal.
3. Why use NormedLinear instead of nn.Linear for LDAMLoss?
4. In https://github.com/shivashankarrs/classimb_fairness/blob/bf11b73bd91c1497acec562c68a22ba4fcc539f1/src/losses.py#L19, why concatenate Xp and 1-Xp?

Sorry for the many questions, and thanks for your work.

afshinrahimi commented 2 years ago

Hi,

q1: mp_num_list: similar to cls_num_list, which holds the number of instances in each class, clsp_num_list and mp_num_list hold the number of instances in each group and in each group×class combination, respectively. Each has its own coefficient, but through experiments we realised these additional terms did not improve the results.
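For concreteness, here is a minimal sketch of how such count lists could be built from integer class and group labels. The variable names (targets, groups) and the flattened layout of the group×class counts are illustrative assumptions, not necessarily the repository's exact construction:

```python
import numpy as np

# Illustrative labels: targets are class ids, groups are private-attribute ids.
targets = np.array([0, 0, 1, 1, 1, 2])
groups = np.array([0, 1, 0, 0, 1, 1])

n_classes = targets.max() + 1
n_groups = groups.max() + 1

# cls_num_list: number of instances per class.
cls_num_list = np.bincount(targets, minlength=n_classes).tolist()

# clsp_num_list: number of instances per group (private attribute).
clsp_num_list = np.bincount(groups, minlength=n_groups).tolist()

# mp_num_list: number of instances per group x class combination, flattened so
# that index = class_id * n_groups + group_id (an assumed layout).
mp_num_list = np.bincount(targets * n_groups + groups,
                          minlength=n_classes * n_groups).tolist()

print(cls_num_list)   # [2, 3, 1]
print(clsp_num_list)  # [3, 3]
print(mp_num_list)    # [1, 1, 2, 1, 0, 1]
```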

q2: we have several classes (targets) and several groups (private attributes); their combination yields |targets| × |groups| partitions of the training instances, and targetgroup is the index of the partition an instance belongs to, e.g. an instance belonging to class 0 and private attribute 0 has targetgroup 0, and so on.
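As a small illustration, one common way to compute such a partition index is class_id * n_groups + group_id; this exact formula is an assumption for the sketch, not necessarily the repository's convention:

```python
import torch

def targetgroup_index(target: torch.Tensor, group: torch.Tensor, n_groups: int) -> torch.Tensor:
    """Map (class, group) pairs to a single partition index.

    Assumed convention: index = class_id * n_groups + group_id, so
    class 0 / group 0 -> 0, class 0 / group 1 -> 1, class 1 / group 0 -> 2, ...
    """
    return target * n_groups + group

target = torch.tensor([0, 0, 1, 2])
group = torch.tensor([0, 1, 1, 0])
print(targetgroup_index(target, group, n_groups=2))  # tensor([0, 1, 3, 4])
```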

q3: why we need NormedLinear: refer to the LDAM paper and its code; we have just followed their work. According to the authors: "In order to tune the margin more easily, we effectively normalize the logits (the input to the loss function) by normalizing the last hidden activation to ℓ2 norm 1, and normalizing the weight vectors of the last fully-connected layer to ℓ2 norm 1, following the previous work [Wang et al., 2018a]. Empirically, the non-smoothness of hinge loss may pose difficulties for optimization. The smooth relaxation of the hinge loss is the following cross-entropy loss with enforced margins." We tested without that and the results were not that different; this might be specific to their dataset or have theoretical reasons.
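As a rough sketch of the idea, a NormedLinear layer in the spirit of the LDAM reference code L2-normalizes both the input features and the weight columns, so each logit is a cosine similarity (later scaled inside the loss). The initialization below is an illustrative assumption, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormedLinear(nn.Module):
    """Linear layer whose inputs and weight columns are L2-normalized,
    so each output logit is the cosine similarity between the feature
    vector and a class weight vector."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.xavier_uniform_(self.weight)  # illustrative init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize features along the feature dim and weights along the input dim.
        return F.normalize(x, dim=1) @ F.normalize(self.weight, dim=0)

layer = NormedLinear(16, 4)
logits = layer(torch.randn(8, 16))  # values lie in [-1, 1] before any scaling
```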

q4: Xp = torch.cat((Xp, 1-Xp), dim=1) This is copied from the LDAM paper code.

If you look at the application of fair_reg:

```python
if self.rho:
    for tval in range(self.m_list.shape[0]):
        reg = fair_reg(F.softmax(x[target == tval], dim=1)[:, tval], group[target == tval])
        regloss += reg
```

So we loop over the values of tval, select the instances with that target (x is the output of the model), and pass their group labels to fair_reg. Here Xp is the group of those instances, which is either 0 or 1 because we only had two groups (this code won't work for more than two groups). So when Xp is 1, 1-Xp will be 0, and in this way we're creating a one-hot encoding of the group.
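As a small illustration of that one-hot construction (using made-up tensors, not the repository's variables):

```python
import torch

# Binary group indicator for four instances (shape: [N, 1]).
Xp = torch.tensor([[1.], [0.], [1.], [0.]])

# Concatenating Xp with 1-Xp yields a one-hot encoding of the two groups.
onehot = torch.cat((Xp, 1 - Xp), dim=1)
print(onehot)
# tensor([[1., 0.],
#         [0., 1.],
#         [1., 0.],
#         [0., 1.]])
```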

Buted commented 2 years ago

Thanks for your answer, it helps me a lot. One more question: does it matter if the groups are orthogonal?

afshinrahimi commented 2 years ago

In our experiments all the groups were non-overlapping; we did not experiment with overlapping/orthogonal cases and left that for future work. It would be interesting to see the interaction of class imbalance and overlapping groups, e.g. a general-purpose multi-class, multi-orthogonal-group setting. To the best of my knowledge this would be a novel exploration of the topic in NLP.

Buted commented 2 years ago

Thanks for your answers and the great work. They help me a lot :)