ltgoslo / definition_modeling

Interpretable Word Sense Representations via Definition Generation
GNU General Public License v3.0

Reproduce Code #2

MengXu0826 closed this issue 11 months ago

MengXu0826 commented 11 months ago

Hi there,

Following your Usage instructions, I generated predicted.npz and predicted.tsv.gz, using WordNet as the dataset. But I don't know what to do next. Could you provide the complete experimental procedure?

I tried to run cluster_definitions.py and sense_label.py, but both scripts need a complete.tsv.gz file containing usages and cluster ids. Could you please tell me how to get this file?

Thanks!

akutuzov commented 11 months ago

Hi! What exactly is your aim? To produce cluster labels / sense definitions, you first of all need usage clusters. WordNet does not contain any usage clusters. If you simply want to generate definitions for usages in WordNet, the generate_t5.py script does that.

MengXu0826 commented 11 months ago

Cluster labels are exactly what I need. So do I need annotated usage clusters, like those in DWUG, in order to continue experiments on other datasets?

akutuzov commented 11 months ago

If you need cluster labels, you need the clusters themselves first, yes (however you produce them). As an example, you can just assign all your usages one and the same cluster label and then run sense_labels.py on this data. It will generate one label. In our experiments with DWUGs, the clusters already existed as part of these DWUGs.
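
A minimal sketch of the "one and the same cluster label" workaround, for illustration only. The thread does not specify the layout of complete.tsv.gz beyond "usages and cluster ids", so the column names `usage` and `cluster`, the helper name `write_single_cluster_tsv`, and the example sentences are all assumptions; check what the script actually expects before using this.

```python
import csv
import gzip
import os
import tempfile


def write_single_cluster_tsv(usages, out_path, cluster_id=0):
    """Write a gzipped TSV in which every usage gets the same cluster id.

    The header names ("usage", "cluster") are hypothetical: adjust them
    to whatever columns the labelling script really reads.
    """
    with gzip.open(out_path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["usage", "cluster"])  # assumed column names
        for usage in usages:
            writer.writerow([usage, cluster_id])
    return len(usages)


if __name__ == "__main__":
    # Made-up usages of one target word, written to a temporary directory.
    example_usages = [
        "She sat on the bank of the river.",
        "They walked along the bank at sunset.",
    ]
    out_file = os.path.join(tempfile.mkdtemp(), "complete.tsv.gz")
    count = write_single_cluster_tsv(example_usages, out_file)
    print(f"Wrote {count} usages to {out_file}")
```

With a single cluster id for all rows, the labelling step would be expected to produce one label, as described above. Real clusters (from DWUGs or any clustering method of your choice) would replace the constant `cluster_id` with one id per usage.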

MengXu0826 commented 11 months ago

Thanks!!!