omwn / omw-data

This packages up data for the Open Multilingual Wordnet

Create a new release with some improvements (1.5) #31

Open fcbond opened 1 year ago

fcbond commented 1 year ago

We could go two ways with synsets like moke ("British informal for donkey"):

  1. link it with ir_synonym and make sure both sides have the same translations
  2. merge, and mark the senses with the dialect and register tags
    • so moke is in donkey but marked with Domain-Region united_kingdom and exemplifies informal
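
To make the two options concrete, here is a minimal Python sketch. The `Sense` and `Synset` classes, the synset ids, and the way relations and tags are stored are all hypothetical illustrations, not the OMW data format or any library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    lemma: str
    # Hypothetical sense-level tags, e.g. {"domain_region": "united_kingdom"}
    tags: dict = field(default_factory=dict)


@dataclass
class Synset:
    id: str
    senses: list = field(default_factory=list)
    # Hypothetical synset-level relations: relation name -> list of target ids
    relations: dict = field(default_factory=dict)


def link_as_synonym(a, b):
    """Option 1: keep both synsets and connect them with ir_synonym."""
    a.relations.setdefault("ir_synonym", []).append(b.id)
    b.relations.setdefault("ir_synonym", []).append(a.id)


def merge_with_tags(target, variant, tags):
    """Option 2: copy the variant's senses into target, tagging each one."""
    for sense in variant.senses:
        target.senses.append(Sense(sense.lemma, {**sense.tags, **tags}))
    return target


# Made-up ids for illustration only
donkey = Synset("ex-donkey-n", [Sense("donkey")])
moke = Synset("ex-moke-n", [Sense("moke")])

# Option 1: two synsets, linked in both directions
link_as_synonym(donkey, moke)

# Option 2: one synset, with the variant sense carrying dialect and register tags
merged = merge_with_tags(
    Synset("ex-donkey-n", [Sense("donkey")]),
    moke,
    {"domain_region": "united_kingdom", "exemplifies": "informal"},
)
```

The sketch does not cover the "same translations on both sides" requirement of option 1, which would need an extra consistency check across every language that uses either synset.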

ekaf commented 1 year ago

take from merges in oewn

@fcbond, this sounds ambiguous and may not be optimal: merges are relative to a target English Wordnet version, so would you, for example, pick either OEWN 2021 or 2022 and then deal with different merges in later OEWN versions? It might be better not to handle the merges in OMW-data: NLTK now handles OMW merges seamlessly with any OEWN version, and @goodmami might eventually consider a similar approach in Wn for solving the related issue https://github.com/goodmami/wn/issues/179
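
A rough sketch of why the result depends on the target version, and of resolving merges at load time rather than baking them into the data. The merge tables, ids, and lemmas below are made up for illustration; this is not how NLTK or Wn actually implement it.

```python
# Hypothetical per-version merge tables: old synset id -> surviving synset id.
# The contents are invented and do not describe any real OEWN release.
MERGES = {
    "oewn-2021": {"ex-moke-n": "ex-donkey-n"},
    "oewn-2022": {},  # pretend this version keeps the two synsets apart
}


def resolve(synset_id, version):
    """Follow merge links for one target version until a surviving id is reached."""
    table = MERGES[version]
    seen = set()  # guards against accidental cycles in a merge table
    while synset_id in table and synset_id not in seen:
        seen.add(synset_id)
        synset_id = table[synset_id]
    return synset_id


def remap_lemmas(lemmas_by_synset, version):
    """Regroup translated lemmas under the synsets that exist in `version`."""
    out = {}
    for synset_id, lemmas in lemmas_by_synset.items():
        target = out.setdefault(resolve(synset_id, version), [])
        for lemma in lemmas:
            if lemma not in target:  # avoid duplicates after a merge
                target.append(lemma)
    return out


# Placeholder translations keyed by (made-up) synset id
data = {"ex-donkey-n": ["lemma_a"], "ex-moke-n": ["lemma_b"]}

for_2021 = remap_lemmas(data, "oewn-2021")
for_2022 = remap_lemmas(data, "oewn-2022")
```

With the same input data, `for_2021` groups both lemmas under one synset while `for_2022` keeps them apart, so merging inside OMW-data would tie the release to one target version.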

arademaker commented 1 year ago

merge, and mark the senses with the dialect and register tags so moke is in donkey but marked with Domain-Region united_kingdom and exemplifies informal

I prefer this option

goodmami commented 1 year ago

Also consider fixing #32 for this release.

@goodmami might eventually consider a similar approach in Wn for solving the related issue https://github.com/goodmami/wn/issues/179

The issue is no longer fresh in my mind, but I don't think I was planning on making any significant changes to Wn. More likely I would suggest some documentation about how to deal with such merges, such as using the code snippet I wrote in that issue. But I should first check how it was handled in NLTK.

goodmami commented 1 month ago

If a 1.5 version is still on the agenda, let's consider adding pre-3.0 versions of the Princeton WordNet data (see https://github.com/goodmami/wn/issues/199).

fcbond commented 2 weeks ago

I am thinking I will probably not try to do too much here: identifying variants should really be done in the language project (so in OEWN for English).

These are the minimum I would like to see for this:

Most of these are close to done; I need to push them out for review, ...