tonytan48 / KD-DocRE

Implementation of Document-level Relation Extraction with Knowledge Distillation and Adaptive Focal Loss

Why not compare with SSAN-KD-Rb-l models that use knowledge distillation strategy? #2

Closed: WatsonWangZh closed this issue 2 years ago.

tonytan48 commented 2 years ago

Hi Watson, thank you for your question. When I tried to reproduce SSAN-KD, I found that the threshold class is hard to tune. This is mainly because SSAN uses a single global threshold, which requires a lot of tuning work. In addition, the SSAN-Adapt part was not released, so we did not run SSAN-NA and SSAN-KD ourselves.
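To make the threshold point concrete, here is a minimal sketch contrasting the two decision rules, assuming a multi-label setting where each entity pair receives one score per relation. All function names and the toy data are hypothetical and are not taken from the SSAN or KD-DocRE code. The "global" rule picks one probability cutoff by searching candidates on a dev set; the "adaptive" rule assumes an ATLOP-style learned threshold class at index 0, where a relation is predicted only if its logit exceeds that class's logit, so no cutoff has to be searched.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Labels = List[List[int]]


def micro_f1(preds: Labels, golds: Labels) -> float:
    """Micro F1 over all (entity pair, relation) decisions."""
    tp = sum(p & g for p_row, g_row in zip(preds, golds) for p, g in zip(p_row, g_row))
    n_pred = sum(sum(row) for row in preds)
    n_gold = sum(sum(row) for row in golds)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def predict_global(probs: Matrix, threshold: float) -> Labels:
    """Global rule: one scalar cutoff shared by every pair and relation."""
    return [[int(p > threshold) for p in row] for row in probs]


def tune_global_threshold(
    dev_probs: Matrix, dev_golds: Labels, candidates: Sequence[float]
) -> Tuple[float, float]:
    """Grid-search the cutoff that maximizes dev F1 (the extra tuning step)."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        score = micro_f1(predict_global(dev_probs, t), dev_golds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def predict_adaptive(logits: Matrix) -> Labels:
    """Adaptive rule: column 0 is the threshold class; relations start at column 1."""
    return [[int(x > row[0]) for x in row[1:]] for row in logits]


# Toy dev data: two entity pairs, three relation types (hypothetical values).
dev_probs = [[0.9, 0.2, 0.6], [0.4, 0.7, 0.1]]
dev_golds = [[1, 0, 1], [0, 1, 0]]
# Same pairs scored by a model with a threshold class prepended at index 0.
logits = [[0.0, 2.0, -1.0, 0.5], [0.3, -0.2, 1.1, 0.1]]

best_t, best_f1 = tune_global_threshold(dev_probs, dev_golds, [0.1, 0.3, 0.5, 0.8])
print(best_t, best_f1)
print(micro_f1(predict_adaptive(logits), dev_golds))
```

On this toy data the global search prints `0.5 1.0` only after trying every candidate, while the adaptive rule reaches the same predictions with no search. With real models the best global cutoff can shift between runs, which is one plausible reason a reproduction would need repeated tuning.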