kuhumcst / DanNet

The Danish WordNet as an RDF graph.
https://wordnet.dk
MIT License

Attach Supersenses to Synsets #138

Closed simongray closed 4 months ago

simongray commented 4 months ago

Supersenses, as seen in the English WordNet, have already been mapped 1:1 to DanNet's ontological types derived from the EuroWordNet ontology.

I have an Excel file supplied by Bolette to use for populating DanNet with Supersenses based on this mapping.

Supersenses

Princeton documentation: https://wordnet.princeton.edu/documentation/lexnames5wn

From email correspondence:

Bolette: Supersenses were popular during a certain period of WSD research because they made disambiguation more manageable in NLP. They are sometimes seen as an extension of NER. One could also use an ontology like the EuroWordNet Ontology, but for some reason supersenses became more widely used for WSD purposes in a series of papers. I have not seen a lot of work on supersenses in later years, though.

(...)

We refer among others to these two papers:

Massimiliano Ciaramita and Yasemin Altun. 2006. Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In Proceedings of EMNLP, pages 594–602, Sydney, Australia, July.

Massimiliano Ciaramita and Mark Johnson. 2003. Supersense tagging of unknown nouns in WordNet. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 168–175. Association for Computational Linguistics.

We worked with them in this paper: https://aclanthology.org/2016.gwc-1.30.pdf

Another email (usage of Supersenses):

And a link to the corpus, including the Danish part: https://www.clarin.si/repository/xmlui/handle/11356/1842

That is the part we would initially like to link to supersenses.

simongray commented 4 months ago

The Supersenses mapping is actually 1-to-many: one ontological type can map to several Supersenses. Fortunately, the alternatives all seem to be separated by part of speech.

The query will have to take this into account.
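A minimal sketch of what that selection could look like, written in Python rather than as the actual DanNet query. The `MAPPING` entries and the function name `supersense_for` are hypothetical and only illustrate the shape of the data; they are not taken from the spreadsheet. The sketch assumes the part of speech can be read off the Supersense prefix (`noun.`, `verb.`, and so on).

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: ontological type -> candidate Supersenses.
# These pairings are illustrative only, not rows from the real mapping.
MAPPING: Dict[str, List[str]] = {
    "Animal": ["noun.animal"],
    "Dynamic": ["noun.event", "verb.change"],
}


def supersense_for(ontotype: str, pos: str) -> Optional[str]:
    """Pick the one candidate whose prefix matches the synset's part of speech."""
    candidates = [
        s for s in MAPPING.get(ontotype, [])
        if s.split(".", 1)[0] == pos
    ]
    # Return a value only when the part of speech makes the choice unique.
    if len(candidates) == 1:
        return candidates[0]
    return None  # unknown ontotype, no match, or still ambiguous


print(supersense_for("Dynamic", "verb"))
```

Returning `None` instead of guessing keeps unresolved synsets visible, so they can be reviewed by hand.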

simongray commented 4 months ago

Apparently, the only problematic rows are these

Plant+Object+Comestible     136 noun.food; noun.plant
Plant+Object+Part+Comestible    324 noun.food; noun.plant

so it may just come down to deciding whether edible plants should count as food or as plants.
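A rough sketch of how such rows could be detected and resolved, assuming the third column is a `;`-separated list of Supersenses and the middle column is a count. The helper names and the `prefer` parameter are hypothetical, and the preference used in the example call is only a placeholder, not a decision made in this issue.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int, str]

# The two problematic rows quoted above.
ROWS: List[Row] = [
    ("Plant+Object+Comestible", 136, "noun.food; noun.plant"),
    ("Plant+Object+Part+Comestible", 324, "noun.food; noun.plant"),
]


def split_supersenses(cell: str) -> List[str]:
    """Split a 'a; b' cell into a clean list of Supersense names."""
    return [s.strip() for s in cell.split(";") if s.strip()]


def is_ambiguous(cell: str) -> bool:
    """True if two or more candidates share the same part of speech."""
    counts = Counter(s.split(".", 1)[0] for s in split_supersenses(cell))
    return any(n > 1 for n in counts.values())


def resolve(cell: str, prefer: str) -> List[str]:
    """Keep `prefer` and drop other candidates with the same part of speech."""
    candidates = split_supersenses(cell)
    if prefer not in candidates:
        raise ValueError(f"{prefer!r} is not a candidate in {cell!r}")
    pos = prefer.split(".", 1)[0]
    return [s for s in candidates if s == prefer or s.split(".", 1)[0] != pos]


for ontotype, _count, cell in ROWS:
    if is_ambiguous(cell):
        # Placeholder preference; swap in whichever Supersense is chosen.
        print(ontotype, resolve(cell, prefer="noun.plant"))
```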

simongray commented 4 months ago

Currently blocked by row 137:

noun.food   804 noun.substance

The first column should be an ontotype, but it has been replaced with a Supersense, making the ~800 synsets impossible to classify until the original authors of this mapping (e.g. Bolette) chime in.
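A check like the following could flag this kind of row before the import runs. It is only a sketch: it assumes Supersense names always follow the `pos.name` pattern used in the Princeton lexnames list, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

Row = Tuple[str, int, str]

# Assumed shape of a Supersense name, e.g. "noun.food" or "verb.change".
SUPERSENSE = re.compile(r"^(noun|verb|adj|adv)\.[A-Za-z]+$")


def looks_like_supersense(value: str) -> bool:
    """True if the value has the 'pos.name' shape of a Supersense."""
    return bool(SUPERSENSE.match(value.strip()))


def suspicious_rows(rows: List[Row]) -> List[Row]:
    """Rows whose first column holds a Supersense instead of an ontotype."""
    return [row for row in rows if looks_like_supersense(row[0])]


rows: List[Row] = [
    ("Plant+Object+Comestible", 136, "noun.food; noun.plant"),
    ("noun.food", 804, "noun.substance"),  # the blocking row quoted above
]

for row in suspicious_rows(rows):
    print("first column is not an ontotype:", row)
```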

simongray commented 4 months ago

I went with Natural+Substance as the ontotype for that row after conferring with Sussi.