iliaschalkidis / lmtc-eurlex57k

Large-Scale Multi-Label Text Classification on EU Legislation
Apache License 2.0

About the labels and performance #1

Closed cbbjames closed 4 years ago

cbbjames commented 5 years ago

Hi,

Thanks for the nice dataset and paper. Since the work emphasizes the large number of labels, do you by chance have a histogram of label frequencies and per-label performance figures I could take a look at?

Thanks,

iliaschalkidis commented 4 years ago

Thanks @cbbjames. You can find a histogram in the article: https://www.aclweb.org/anthology/W19-2209/. Computing performance per label would be overkill, since we are dealing with thousands of labels and it is impractical to track, compare, and comment on scores for every one of them. That said, you can do it yourself by amending the code (lmtc.py/calculate_perfomance) to report per-label precision, recall, and F1-score.
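
For anyone who wants to try this, here is a minimal sketch of what per-label scoring could look like. It is not the repository's code: the function name `per_label_scores`, the label names, and the toy data are all hypothetical. It assumes the gold labels and the thresholded predictions are available as binary indicator rows, one column per label.

```python
from typing import Dict, Sequence


def per_label_scores(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    label_names: Sequence[str],
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, F1 and support separately for each label.

    Hypothetical helper: y_true and y_pred are binary indicator rows
    (1 = label assigned to the document, 0 = not assigned).
    """
    scores = {}
    for j, name in enumerate(label_names):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

        # Report 0.0 instead of dividing by zero for labels that were
        # never predicted or never appear in the gold annotations.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)

        scores[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of gold occurrences of the label
        }
    return scores


# Toy data: 4 documents, 3 made-up labels.
labels = ["label_a", "label_b", "label_c"]
y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 0]]

scores = per_label_scores(y_true, y_pred, labels)

# Print rarest labels first, since those are usually the interesting ones.
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["support"]):
    print(f"{name}: P={s['precision']:.2f} R={s['recall']:.2f} "
          f"F1={s['f1']:.2f} support={s['support']}")
```

If scikit-learn is available, `sklearn.metrics.precision_recall_fscore_support` with `average=None` returns the same four per-label arrays. With thousands of labels, sorting or bucketing the results by support tends to be more readable than inspecting every label individually.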