kanekomasahiro / context-debias

MIT License

The paper vs the code #2

Closed kainoj closed 3 years ago

kainoj commented 3 years ago

Hello, after a deep dive into the code and the paper, I believe I found some discrepancies. Would you mind addressing the points below, please?

1. Attributes

In the paper, V_a is defined as a set of attributes, e.g. V_a = {he, she, man, woman}. In the loss in Eq. (1), all attribute words are iterated over, and for each attribute one obtains its static embedding.

In the code, however, attribute_vector_example(), which I believe implements Eq. (2), averages all male-related attributes and all female-related attributes separately:

https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L404-L405

As a result, we obtain only two embeddings attributes_hiddens[attribute{0, 1}] of shape [1, 7, 768] (in the case of DistilBERT), whereas Eqs. (1) and (2) suggest we should have as many embeddings as there are sentences containing each attribute from V_a.
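To make the difference concrete, here is a minimal sketch of the two readings (the names, shapes, and helper functions are my own assumptions, not the repository's code):

```python
import torch

# Assumed inputs: each element of male_hiddens / female_hiddens is a tensor of
# shape [num_layers, hidden_size] (e.g. [7, 768] for DistilBERT: the embedding
# layer plus 6 transformer layers), taken at the position of an attribute word
# in one sentence.

def per_occurrence_embeddings(male_hiddens, female_hiddens):
    # Reading of Eqs. (1) and (2): keep one static embedding per attribute
    # occurrence, i.e. as many embeddings as there are attribute sentences.
    return list(male_hiddens), list(female_hiddens)

def averaged_embeddings(male_hiddens, female_hiddens):
    # Reading of the code: collapse each group into a single averaged tensor,
    # yielding only two embeddings of shape [1, num_layers, hidden_size].
    male = torch.stack(male_hiddens).mean(dim=0, keepdim=True)
    female = torch.stack(female_hiddens).mean(dim=0, keepdim=True)
    return male, female
```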

Is there any other explanation for that other than "faster computation"?

2. Source of the static embeddings of the attributes

The i in L_i or v_i in Equations (1) and (2) stands for the first, the last, or all layers, as explained in the last paragraph of Section 3.

Digging into the code for Eq. (2), attribute_vector_example(), we find that it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.
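For reference, here is a minimal sketch of how all hidden layers can be requested at once from a Hugging Face model; get_hiddens_of_model() may be implemented differently in detail, but the behaviour described above is the same:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("he is a doctor", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of 7 tensors for DistilBERT (embedding output + 6 transformer layers),
# each of shape [batch_size, seq_len, 768]; every layer is available here,
# regardless of which layer the debiasing mode will eventually use.
all_hiddens = outputs.hidden_states
```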


I would really appreciate your time. Sincerely, P.J

kanekomasahiro commented 3 years ago

Hi, thank you for your questions.

> Is there any other explanation for that other than "faster computation"?

Yes, you are right, it was done for faster computation.

> Digging into the code for Eq. (2), attribute_vector_example(), we find that it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.

The output of attribute_vector_example(), attributes_hiddens, is used later in forward(), which then filters attributes_hiddens according to the debiasing mode, as shown in this line: https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L497
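Conceptually, the filtering amounts to something like the following (a simplified sketch with hypothetical names, not a verbatim copy of that line):

```python
def select_layers(attributes_hiddens, debias_layer):
    # attributes_hiddens: tensor of shape [1, num_layers, hidden_size],
    # holding the static attribute embeddings for every layer.
    if debias_layer == "first":
        return attributes_hiddens[:, :1, :]   # keep only the first layer
    elif debias_layer == "last":
        return attributes_hiddens[:, -1:, :]  # keep only the last layer
    else:                                     # "all": keep every layer
        return attributes_hiddens
```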

kainoj commented 3 years ago

Hi, thanks for the answer, that explains a lot!


Also, I noticed that the data loaders are capped at the minimum of the sizes of the masculine and feminine attribute subsets: https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L193

https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L199-L200

Later, the training loop iterates over data_distribution and re-creates the dataloaders and distributions at the last iteration. I guess that this, combined with the dataloader shuffling, is a hacky way to subsample the attributes at every epoch. Again, does this choice have any motivation other than faster computation (e.g. attribute balancing)?
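For illustration, the effect I am describing would be something like this (a minimal sketch with made-up names, not the actual training loop):

```python
import random

def balanced_epoch(male_examples, female_examples):
    # Cap both attribute subsets to the size of the smaller one and reshuffle
    # before each epoch, so a different balanced subsample is used every time.
    n = min(len(male_examples), len(female_examples))
    random.shuffle(male_examples)
    random.shuffle(female_examples)
    return male_examples[:n], female_examples[:n]
```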

kanekomasahiro commented 3 years ago

Hi. Yes, it is there to subsample, so that the attributes are balanced.