vid-koci / bert-commonsense

Code for papers "A Surprisingly Robust Trick for Winograd Schema Challenge" and "WikiCREM: A Large Unsupervised Corpus for Coreference Resolution"

Possible bug #3

Closed jc-ryan closed 3 years ago

jc-ryan commented 3 years ago

Hello, there seems to be a bug in your training procedure:

```python
loss_1 = model.forward(input_ids_1, token_type_ids=segment_ids_1, attention_mask=input_mask_1, masked_lm_labels=label_ids_1)
loss_2 = model.forward(input_ids_2, token_type_ids=segment_ids_2, attention_mask=input_mask_2, masked_lm_labels=label_ids_2)

loss = loss_1 + args.alpha_param * torch.max(torch.zeros(loss_1.size(), device=device),
                                             torch.ones(loss_1.size(), device=device) * args.beta_param + loss_1 - loss_2.mean())
```
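
If I am reading the snippet correctly, it corresponds to the margin-style objective

```
loss = L_1 + alpha * max(0, beta + L_1 - L_2)
```

where `L_1` and `L_2` are the masked-LM losses of candidate_1 and candidate_2, so the second term only vanishes when the loss of candidate_1 beats that of candidate_2 by at least a margin of `beta`.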

The above code always treats candidate_1 as the correct one, while in your training data either candidate_1 or candidate_2 can be the answer. I noticed that you processed the test data so that candidate_1 is always the answer, but not the training data (and doing the same for training would introduce bias anyway). So I wonder whether this is actually a bug or I missed something?

Thanks, looking forward to your reply.

vid-koci commented 3 years ago

Hello, thank you for your interest. The training data, too, is processed so that the first candidate is always the correct one. Note that the model is agnostic to whether a candidate is the first or the second one: it simply assigns the current candidate a score without access to the other candidate (or to any knowledge of whether it is dealing with the first or the second candidate). The order in which the candidates are processed therefore makes no difference.
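
To make this concrete, here is a minimal sketch of the scoring idea (not the exact code from this repository, which builds on the older `pytorch_pretrained_bert` interface; the sketch uses the Hugging Face `transformers` API instead): each candidate is substituted into the sentence on its own, its masked-LM loss is computed independently, and the candidate with the lower loss is preferred.

```python
# Minimal sketch: score each candidate independently by its masked-LM loss.
# The model never sees the other candidate, so candidate order is irrelevant.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_loss(sentence_with_blank, candidate):
    # Replace the blank with one [MASK] per candidate token and compute the
    # masked-LM loss of predicting the candidate tokens at those positions.
    cand_ids = tokenizer.encode(candidate, add_special_tokens=False)
    masked = sentence_with_blank.replace("_", " ".join(["[MASK]"] * len(cand_ids)))
    enc = tokenizer(masked, return_tensors="pt")
    labels = torch.full_like(enc["input_ids"], -100)  # -100 = ignored by the loss
    mask_positions = (enc["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
    labels[mask_positions] = torch.tensor(cand_ids)
    with torch.no_grad():
        out = model(**enc, labels=labels)
    return out.loss.item()

sentence = "The trophy didn't fit into the suitcase because _ was too big."
scores = {c: candidate_loss(sentence, c) for c in ["the trophy", "the suitcase"]}
prediction = min(scores, key=scores.get)  # lower loss = more plausible candidate
```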

Hope that answers your question!

BrandonBennett99 commented 3 years ago

I and my collaborator Suk Joon Hong did a lot of work with the code and didn't find any bugs. Some parts are a bit hard to understand, which is fine for a research prototype, but it could be a bit more modular and would be nice to have a more straightforward set of interface functions. Suk and I actually each wrote extra functions to go on top of it. Possibly in the future we could collaborate and produce an enhanced version. At the moment Suk and I are very busy with other things but we will be working more on Winograd later in the year. Best wishes, Brandon