Hi, thank you for your questions.
Is there any other explanation for that other than "faster computation"?
Yes, you are right, it was done for faster computation.
Digging into the code for Eq. (2), attribute_vector_example(), we find it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.
The output of attribute_vector_example(), attributes_hiddens, is further used in forward(), which then filters attributes_hiddens according to the debiasing mode, as shown in this line:
https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L497
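For illustration, here is a minimal sketch of that flow, with my own simplified function names (get_all_hidden_layers and select_layers stand in for the repo's get_hiddens_of_model() and the filtering inside forward(); this is not the repository's exact code): all hidden layers are collected first, and only afterwards are the layers matching the debiasing mode kept.

```python
# Sketch only: simplified names, not the repo's actual implementation.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

def get_all_hidden_layers(sentences):
    """Return hidden states of every layer, stacked as (num_layers, batch, seq_len, dim)."""
    inputs = tokenizer(sentences, return_tensors="pt", padding=True)
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states = embedding layer + every transformer layer (7 tensors for DistilBERT)
    return torch.stack(outputs.hidden_states)

def select_layers(all_hiddens, debias_layer="all"):
    """Keep only the layers required by the debiasing mode ('first', 'last', or 'all')."""
    if debias_layer == "first":
        return all_hiddens[:1]
    if debias_layer == "last":
        return all_hiddens[-1:]
    return all_hiddens  # 'all': every layer is used
```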
Hi, thanks for the answer, that explains a lot!
Also, I noticed that the data loaders are capped at the minimum of the sizes of the masculine and feminine attribute subsets: https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L193
https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L199-L200
Later, the training loop iterates over data_distribution and re-creates the dataloaders and distributions at the last iteration. I guess that this, combined with the dataloader shuffling, is a hacky way to subsample the attributes at every epoch.
Again, does this choice have any motivation other than faster computation (e.g. attribute balancing)?
Hi. Yes, it is to subsample, so that the attributes are balanced.
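Roughly, the idea is something like the following simplified sketch (make_balanced_epoch is a hypothetical helper for illustration, not the repository's code): both attribute subsets are capped at the size of the smaller one, and reshuffling before each epoch changes which sentences survive the cap.

```python
# Sketch only: illustrates min-capping plus per-epoch reshuffling, not the repo's code.
import random

def make_balanced_epoch(male_sentences, female_sentences, seed=None):
    """Shuffle each subset and truncate both to the size of the smaller subset."""
    rng = random.Random(seed)
    cap = min(len(male_sentences), len(female_sentences))
    return rng.sample(male_sentences, cap), rng.sample(female_sentences, cap)

# Re-creating the dataloaders each epoch re-draws the subsample:
for epoch in range(3):
    male_epoch, female_epoch = make_balanced_epoch(
        ["he is ..."] * 10, ["she is ..."] * 7, seed=epoch
    )
    # Both lists now have length 7; which male sentences survive differs per epoch.
    print(epoch, len(male_epoch), len(female_epoch))
```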
Hello, after a deep dive into the code and into the paper, I believe I found some discrepancies. Would you mind addressing the points below, please?
1. Attributes
In the paper, V_a is defined as a set of attributes, e.g. V_a = {he, she, man, woman}. In the loss in Eq. (1), all attribute words are iterated over, and for each attribute one obtains its static embedding.
In the code, however, attribute_vector_example(), which I believe implements Eq. (2), averages all the male-related and all the female-related attributes:
https://github.com/kanekomasahiro/context-debias/blob/2161b38efcb03dd894690f54805686ed8b75939a/src/run_debias_mlm.py#L404-L405
As a result, we obtain only two embeddings, attributes_hiddens[attribute{0, 1}], of shape [1, 7, 768] (in the case of DistilBERT), whereas Eqs. (1) and (2) suggest we should have as many embeddings as there are sentences containing each attribute from V_a.
Is there any other explanation for that other than "faster computation"?
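To make the discrepancy concrete, here is a toy illustration of how I read the two variants (made-up random tensors, hypothetical variable names, not the repository's code): on the paper's reading, each attribute word keeps its own static embedding (averaged over the sentences containing it), whereas the code collapses each attribute group into a single averaged embedding.

```python
# Toy illustration with random tensors; shapes are only indicative (7 = DistilBERT's
# embedding layer + 6 transformer layers, 768 = hidden size).
import torch

num_layers, dim = 7, 768

# Paper, as I read Eqs. (1)-(2): one static embedding per attribute word in V_a.
per_attribute = {
    "he":    torch.randn(num_layers, dim),
    "she":   torch.randn(num_layers, dim),
    "man":   torch.randn(num_layers, dim),
    "woman": torch.randn(num_layers, dim),
}

# Code, as I read attribute_vector_example(): the male- and female-related attributes
# are each averaged into a single per-layer vector, leaving only two embeddings.
male_avg = torch.stack([per_attribute["he"], per_attribute["man"]]).mean(0, keepdim=True)
female_avg = torch.stack([per_attribute["she"], per_attribute["woman"]]).mean(0, keepdim=True)
print(male_avg.shape, female_avg.shape)  # torch.Size([1, 7, 768]) each
```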
2. Source of the static embeddings of the attributes
The i in L_i or v_i in equations (1) and (2) stands for the first, last, or all layers, as explained in the last paragraph of Section 3.
Digging into the code for Eq. (2), attribute_vector_example(), we find it calls get_hiddens_of_model(), which returns all hidden layers. All these layers are processed further: the static embeddings are always extracted for all layers, no matter what the debiasing mode is.
I would really appreciate your time. Sincerely, P.J