omwn / omw-data

This packages up data for the Open Multilingual Wordnet

Release in WN-LMF 1.1 #3

Closed goodmami closed 2 years ago

goodmami commented 3 years ago

When the WN-LMF 1.1 format is merged (see globalwordnet/schemas#38), we should release the OMW data in that format. The main change is adding <Requires id="pwn" version="3.0" /> elements to the "expand" wordnets. The <SyntacticBehaviour> elements in the PWN files can also be relocated. Anything else? Are we replacing <Sense> ids with the value of their dc:identifier attribute?
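
As a rough illustration of the <Requires> change, here is a minimal Python sketch that adds the element to each <Lexicon> in a file. The sample lexicon, its ids, and the `add_requires` helper are hypothetical, and placing <Requires> as the first child of <Lexicon> is an assumption about the 1.1 schema that should be checked against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed lexicon; real OMW files are much larger.
SAMPLE = """<LexicalResource>
  <Lexicon id="example-xx" label="Example" language="xx" version="1.0">
    <LexicalEntry id="example-xx-w1"/>
  </Lexicon>
</LexicalResource>"""


def add_requires(root, dep_id, dep_version):
    """Add <Requires id=... version=...> to every <Lexicon> that lacks it."""
    for lexicon in root.findall("Lexicon"):
        already_present = any(
            req.get("id") == dep_id and req.get("version") == dep_version
            for req in lexicon.findall("Requires")
        )
        if not already_present:
            req = ET.Element("Requires", {"id": dep_id, "version": dep_version})
            # Assumption: <Requires> precedes the lexical entries.
            lexicon.insert(0, req)
    return root


root = add_requires(ET.fromstring(SAMPLE), "pwn", "3.0")
print(ET.tostring(root, encoding="unicode"))
```

Calling `add_requires` a second time with the same arguments leaves the tree unchanged, so the sketch can be rerun over files that were already converted.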

goodmami commented 3 years ago

With synsets linked by ILIs instead of PWN synset IDs, we don't really need to depend on PWN 3.0. For the next release, should they depend on PWN 3.1, EWN 2020, or EWN 2021 (coming in April?) instead?

goodmami commented 3 years ago

Somewhat related to this: why don't we split the examples from the definitions in the PWN's WN-LMF files? Was there a reliable automatic method that the EWN used, or was manual effort required?

Related:

Also, here's the code used by gwn-scala-api to extract examples from the WNDB gloss:

https://github.com/jmccrae/gwn-scala-api/blob/742cd7cddee021b27c0e5681a4ea20b92d887e4c/src/main/scala/org/globalwordnet/wnapi/wndb.scala#L208-L222
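
To make the question concrete, here is a naive Python sketch of one possible automatic split. It is not a port of the gwn-scala-api code linked above; it simply assumes that examples are the double-quoted spans introduced by a semicolon, and the `split_gloss` name and the sample gloss are only for illustration.

```python
import re

# Assumption: an example is a double-quoted span that follows a semicolon.
_EXAMPLE = re.compile(r';\s*"([^"]*)"')


def split_gloss(gloss):
    """Return (definition, examples) from a WNDB-style gloss string."""
    examples = [text.strip() for text in _EXAMPLE.findall(gloss)]
    # Drop the matched example spans; whatever remains is the definition.
    definition = _EXAMPLE.sub("", gloss).strip().rstrip(";").strip()
    return definition, examples


gloss = (
    "a motor vehicle with four wheels; "
    "usually propelled by an internal combustion engine; "
    '"he needs a car to get to work"'
)
definition, examples = split_gloss(gloss)
print(definition)
print(examples)
```

A heuristic like this would likely miss glosses with unbalanced quotes, quoted terms inside the definition itself, or attributions trailing an example, which is where manual review might still be needed.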

goodmami commented 2 years ago

Here is a checklist of things that we should do. @fcbond would you prefer that I create individual local issues for those that don't already have one?

fcbond commented 2 years ago

I think we have done all of these, although the sense key is not used as the sense ID; it is stored in dc:identifier instead.
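
For anyone who needs the sense keys, a minimal Python sketch of reading them back from dc:identifier might look like the following. The sample entry, its ids, and the `sense_keys` helper are hypothetical, and the namespace URI bound to the dc prefix is an assumption that should be checked against the xmlns:dc declaration in the actual files.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dc prefix; verify against the real files.
DC = "https://globalwordnet.github.io/schemas/dc/"

# Hypothetical, heavily trimmed sample with made-up ids.
SAMPLE = f"""<LexicalResource xmlns:dc="{DC}">
  <Lexicon id="example-en" label="Example" language="en" version="1.0">
    <LexicalEntry id="example-en-car-n">
      <Lemma writtenForm="car" partOfSpeech="n"/>
      <Sense id="example-en-car-n-01" synset="example-en-02958343-n"
             dc:identifier="car%1:06:00::"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>"""


def sense_keys(root):
    """Map each sense ID to the sense key stored in its dc:identifier."""
    keys = {}
    for sense in root.iter("Sense"):
        # ElementTree exposes namespaced attributes as "{uri}localname".
        key = sense.get(f"{{{DC}}}identifier")
        if key is not None:
            keys[sense.get("id")] = key
    return keys


keys = sense_keys(ET.fromstring(SAMPLE))
print(keys)
```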