ltgoslo / norbench

Natural language understanding benchmarks for Norwegian
MIT License

Norwegian Wordnet Bokmål. Statistics #2

Open sigdelina opened 2 years ago

sigdelina commented 2 years ago

This issue tracks statistics for the Norwegian Wordnet (Bokmål) from the National Library of Norway.

At the initial stage, statistics were computed over the official dataset from the National Library as a whole, showing the distribution of the number of examples, POS tags, and senses per lemma.
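A minimal sketch of how such overall counts could be gathered is shown here. The record layout `(lemma, pos, sense_id, sentence)` and the sample rows are assumptions made for illustration; the actual dataset format is not described in this issue.

```python
from collections import Counter, defaultdict

# Hypothetical records: (lemma, pos, sense_id, example_sentence).
# The sentences are placeholders, not taken from the real dataset.
records = [
    ("bank", "NOUN", "bank_1", "example sentence a"),
    ("bank", "NOUN", "bank_2", "example sentence b"),
    ("løpe", "VERB", "løpe_1", "example sentence c"),
]

def overall_statistics(records):
    """Return examples per lemma, POS tag counts, and senses per lemma."""
    examples_per_lemma = Counter()
    pos_counts = Counter()
    senses = defaultdict(set)
    for lemma, pos, sense_id, _sentence in records:
        examples_per_lemma[lemma] += 1   # one example per record
        pos_counts[pos] += 1             # POS tag counted once per example
        senses[lemma].add(sense_id)      # distinct senses seen for the lemma
    senses_per_lemma = {lemma: len(ids) for lemma, ids in senses.items()}
    return examples_per_lemma, pos_counts, senses_per_lemma

examples_per_lemma, pos_counts, senses_per_lemma = overall_statistics(records)
```

Counting POS tags per example rather than per lemma is one possible choice; counting each lemma once would give a different distribution.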

At this stage, more detailed statistics are being computed, covering the following points:

  1. the distribution of unique sentences per lemma: the selection focuses on lemmas that have 5 or more sentences in the dataset.

  2. a proposed division of words into categories depending on their number of possible senses.
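The two points above could be sketched roughly as follows. The input format, the function names, and in particular the bucket boundaries in `sense_category` are hypothetical; only the threshold of 5 sentences comes from this issue.

```python
from collections import defaultdict

MIN_SENTENCES = 5  # threshold stated in the issue

def unique_sentences_per_lemma(pairs):
    """Count distinct sentences per lemma from (lemma, sentence) pairs."""
    sentences = defaultdict(set)
    for lemma, sentence in pairs:
        sentences[lemma].add(sentence.strip())
    return {lemma: len(s) for lemma, s in sentences.items()}

def lemmas_with_enough_sentences(counts, minimum=MIN_SENTENCES):
    """Keep lemmas that have at least `minimum` unique sentences."""
    return sorted(lemma for lemma, n in counts.items() if n >= minimum)

def sense_category(n_senses):
    """Assign a lemma to a bucket; the boundaries are an assumption."""
    if n_senses <= 1:
        return "1 sense"
    if n_senses <= 3:
        return "2-3 senses"
    return "4+ senses"

# Placeholder data: "hus" has 5 unique sentences plus one duplicate,
# "bil" has only 2.
pairs = [("hus", f"sentence {i}") for i in range(5)]
pairs += [("hus", "sentence 0"), ("bil", "sentence 0"), ("bil", "sentence 1")]

counts = unique_sentences_per_lemma(pairs)
selected = lemmas_with_enough_sentences(counts)
```

Duplicates are dropped by storing sentences in a set, so repeated examples do not help a lemma reach the threshold.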

sigdelina commented 2 years ago

In progress

sigdelina commented 2 years ago

Statistics

The statistics are provided at the current link.