kanekomasahiro / context-debias

MIT License

The paper vs the code #2

Closed kainoj closed 3 years ago

kainoj commented 3 years ago

Hello, after a deep dive into the code and the paper, I believe I found some discrepancies. Would you mind addressing the points below, please?

1. Attributes

In the paper, V_a is defined as a set of attributes, e.g. V_a = {he, she, man, woman}. In the loss in Eq. (1), all attribute words are iterated over, and for each attribute one obtains its static embedding.

In the code, however, attribute_vector_example(), which I believe implements Eq. (2), averages all male-related attributes and all female-related attributes separately:

https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L404-L405

As a result, we obtain only two embeddings attributes_hiddens[attribute{0, 1}] of shape [1, 7, 768] (in the case of DistilBERT), whereas Eqs. (1) and (2) suggest we should have as many embeddings as there are sentences containing each attribute from V_a.
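To make the difference concrete, here is a minimal sketch of the two readings (the names, shapes, and helper functions are my own assumptions, not the repository's code):

```python
import torch

# Assumed inputs: each element of male_hiddens / female_hiddens is a tensor of
# shape [num_layers, hidden_size] (e.g. [7, 768] for DistilBERT: the embedding
# layer plus 6 transformer layers), taken at the position of an attribute word
# in one sentence.

def per_occurrence_embeddings(male_hiddens, female_hiddens):
    # Reading of Eqs. (1) and (2): keep one static embedding per attribute
    # occurrence, i.e. as many embeddings as there are attribute sentences.
    return list(male_hiddens), list(female_hiddens)

def averaged_embeddings(male_hiddens, female_hiddens):
    # Reading of the code: collapse each group into a single averaged tensor,
    # yielding only two embeddings of shape [1, num_layers, hidden_size].
    male = torch.stack(male_hiddens).mean(dim=0, keepdim=True)
    female = torch.stack(female_hiddens).mean(dim=0, keepdim=True)
    return male, female
```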

Is there any other explanation for that other than "faster computation"?

2. Source of the static embeddings of the attributes

The i in L_i or v_i in Equations (1) and (2) stands for the first, the last, or all layers, as explained in the last paragraph of Section 3.

Digging into the code for Eq. (2), attribute_vector_example(), we find that it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.
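For reference, here is a minimal sketch of how all hidden layers can be requested at once from a Hugging Face model; get_hiddens_of_model() may be implemented differently in detail, but the behaviour described above is the same:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("he is a doctor", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of 7 tensors for DistilBERT (embedding output + 6 transformer layers),
# each of shape [batch_size, seq_len, 768]; every layer is available here,
# regardless of which layer the debiasing mode will eventually use.
all_hiddens = outputs.hidden_states
```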


I would really appreciate your time. Sincerely, P.J

kanekomasahiro commented 3 years ago

Hi, thank you for your questions.

> Is there any other explanation for that other than "faster computation"?

Yes, you are right, it was done for faster computation.

> Digging into the code for Eq. (2), attribute_vector_example(), we find that it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.

The output of attribute_vector_example(), attributes_hiddens, is used later in forward(), which then filters attributes_hiddens according to the debiasing mode, as shown in this line: https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L497
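Conceptually, the filtering amounts to something like the following (a simplified sketch with hypothetical names, not a verbatim copy of that line):

```python
def select_layers(attributes_hiddens, debias_layer):
    # attributes_hiddens: tensor of shape [1, num_layers, hidden_size],
    # holding the static attribute embeddings for every layer.
    if debias_layer == "first":
        return attributes_hiddens[:, :1, :]   # keep only the first layer
    elif debias_layer == "last":
        return attributes_hiddens[:, -1:, :]  # keep only the last layer
    else:                                     # "all": keep every layer
        return attributes_hiddens
```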

kainoj commented 3 years ago

Hi, thanks for the answer, that explains a lot!


Also, I noticed that the data loaders are capped at the minimum of the sizes of the masculine and feminine attribute subsets: https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L193

https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L199-L200

Later, the training loop iterates over data_distribution and re-creates the dataloaders and distributions at the last iteration. I guess that this, combined with the dataloader shuffling, is a hacky way to subsample the attributes at every epoch. Again, does this choice have any motivation other than faster computation (e.g. attribute balancing)?
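For illustration, the effect I am describing would be something like this (a minimal sketch with made-up names, not the actual training loop):

```python
import random

def balanced_epoch(male_examples, female_examples):
    # Cap both attribute subsets to the size of the smaller one and reshuffle
    # before each epoch, so a different balanced subsample is used every time.
    n = min(len(male_examples), len(female_examples))
    random.shuffle(male_examples)
    random.shuffle(female_examples)
    return male_examples[:n], female_examples[:n]
```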

kanekomasahiro commented 3 years ago

Hi. Yes, it is there to subsample, so that the attributes are balanced.