tagoyal / factuality-datasets


why not encode arc relationships in the model? #6

Open Sere-Fu opened 2 years ago

Sere-Fu commented 2 years ago

Well, in the previous dae-factuality repo, you encode an arc in the form tok.encode("a rel b") (though I don't think encoding the relationship in its plain-text form is a good idea). In this factuality-datasets repo, I notice that both in the code and in the paper you just encode the head and child tokens and send the (head, child) pair into the classifier, ignoring the specific relationship. I guess this is not the ideal solution, but somehow it performs better? Correct me if I am wrong. BTW, really good work.
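To make the contrast concrete, here is roughly what I mean; this is just a sketch using a HuggingFace ELECTRA tokenizer, not the actual code from either repo, and the example arc is made up:

```python
from transformers import ElectraTokenizer

tok = ElectraTokenizer.from_pretrained("google/electra-base-discriminator")
head, rel, child = "dog", "nsubj", "barked"  # illustrative dependency arc

# dae-factuality style: the relation label is spelled out as plain text in the input.
ids_with_rel = tok.encode(f"{head} {rel} {child}")

# factuality-datasets style (as I read it): only head and child are encoded,
# and the relation label never reaches the classifier.
ids_without_rel = tok.encode(f"{head} {child}")

print(ids_with_rel, ids_without_rel)
```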

tagoyal commented 2 years ago

Hi, glad you enjoyed our work! :)

In our experiments with the CNN/DM and XSum datasets, we observed that explicitly including the arc relation did not improve the performance of the model. My guess is that these models can implicitly determine the arc relation from the (head, child) representation and do not need it to be explicitly encoded. Therefore, for simplicity, we removed the arc representation from the model architecture.
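Concretely, the arc classifier only sees the contextual head and child token vectors, along the lines of this simplified sketch (illustrative class and parameter names, not the exact code from the repo):

```python
import torch
import torch.nn as nn

class ArcClassifier(nn.Module):
    """Scores a dependency arc from the (head, child) token vectors only."""
    def __init__(self, hidden_size: int, num_labels: int = 2):
        super().__init__()
        self.scorer = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, head_vec: torch.Tensor, child_vec: torch.Tensor) -> torch.Tensor:
        # No relation embedding: the arc is represented purely by its head and child.
        return self.scorer(torch.cat([head_vec, child_vec], dim=-1))
```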

Sere-Fu commented 2 years ago

In terms of pure logic, without encoding the specific relation, dependency arc entailment is ill-defined: (head, rel_a, child) might be entailed while (head, rel_b, child) is not. For example, given the source sentence "John saw Mary", the arc (saw, nsubj, John) holds, but (saw, dobj, John) does not.

I have tried using a separate relationship encoder: tokens_em = electra.forward(<head, child>); rel_em = rel_embedding.forward(rel_id), where rel_id is a one-hot encoding of the possible relationships.
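In code, what I tried looks roughly like this (a sketch only: the relation vocabulary size, embedding dimension, and names are placeholders, and the head/child vectors are assumed to come from ELECTRA):

```python
import torch
import torch.nn as nn

NUM_RELS = 45   # placeholder size of the dependency-relation vocabulary
REL_DIM = 64    # placeholder relation-embedding dimension

class ArcClassifierWithRel(nn.Module):
    """Variant that concatenates a learned relation embedding to the (head, child) vectors."""
    def __init__(self, hidden_size: int, num_labels: int = 2):
        super().__init__()
        self.rel_embedding = nn.Embedding(NUM_RELS, REL_DIM)
        self.scorer = nn.Linear(2 * hidden_size + REL_DIM, num_labels)

    def forward(self, head_vec, child_vec, rel_id):
        # rel_id is an integer label index (equivalent to the one-hot I mentioned above).
        rel_em = self.rel_embedding(rel_id)
        tokens_em = torch.cat([head_vec, child_vec], dim=-1)  # ELECTRA head/child vectors
        return self.scorer(torch.cat([tokens_em, rel_em], dim=-1))
```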

Not too much difference in terms of accuracy. Not surprised, though. My opinion is that the model (or "nearly" any NLP model) still relies on surface information; models do not operate by "understanding" any fancy high-level (document-level) semantics. That's why encoding the relation or not makes no difference (even though, logically, it should). In terms of this issue: we barely know how to walk, so talking about kinematic theory is like a joke.

BTW, the test set is small and biased, so it obviously cannot reflect the model's performance fully and objectively. Fun fact: simply by adding a weighted loss, on XSUM-human (2000 training examples, 500 test), I saw a gain from 78.7 balanced accuracy to 81.0.
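By "weighted loss" I just mean passing class weights to the cross-entropy over the entailed / non-entailed labels, something like the following (the weight values here are illustrative, not the exact ones I used):

```python
import torch
import torch.nn as nn

# Illustrative class weights for an imbalanced entailed / non-entailed label split.
class_weights = torch.tensor([0.7, 1.3])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2)           # batch of arc-level predictions
labels = torch.randint(0, 2, (8,))   # gold entailment labels
loss = criterion(logits, labels)
```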