goodmami closed 2 years ago
With synsets linked by ILIs instead of PWN synset IDs, we don't really need to depend on PWN 3.0. For the next release, should the OMW wordnets depend on PWN 3.1, EWN 2020, or EWN 2021 (coming in April?) instead?
Somewhat related to this: why don't we split the examples from the definitions in the PWN's WN-LMF files? Was there a reliable automatic method that the EWN used, or was manual effort required?
Related:
Also here's the code used by gwn-scala-api for extracting examples from the WNDB gloss:
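For illustration only: the following is not the gwn-scala-api code, just a minimal Python sketch of the general idea. It assumes the common WNDB gloss layout in which the definition comes first and each example is a double-quoted string after a semicolon. The function name `split_gloss` and the regular expressions are my own assumptions.

```python
import re
from typing import List, Tuple


def split_gloss(gloss: str) -> Tuple[str, List[str]]:
    """Naively split a WNDB gloss into (definition, examples).

    Assumption: examples begin at the first '; "' sequence and each
    example is enclosed in straight double quotes.
    """
    gloss = gloss.strip()
    # Find the first semicolon that is followed by an opening quote.
    match = re.search(r';\s*"', gloss)
    if match is None:
        # No quoted example found: treat the whole gloss as the definition.
        return gloss, []
    definition = gloss[:match.start()].strip()
    # Collect every double-quoted string in the remainder of the gloss.
    examples = re.findall(r'"([^"]*)"', gloss[match.start():])
    return definition, [e.strip() for e in examples]


if __name__ == "__main__":
    print(split_gloss('a motor vehicle with four wheels; "he needs a car to get to work"'))
```

A heuristic like this will miss glosses with unbalanced quotes or with text trailing an example (such as an attribution), so some manual checking would still be needed.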
Here is a checklist of things that we should do. @fcbond would you prefer that I create individual local issues for those that don't already have one?
- s
- on Lemmas
- `members` attribute on `<Synset>` (c.f. globalwordnet/english-wordnet#660)
- `<Sense>` ID (c.f. globalwordnet/english-wordnet#662)
- `<SyntacticBehaviour>` elements from lexical entries to the lexicon level (c.f. globalwordnet/english-wordnet#661)
- `dc:subject` to `lexfile` attribute
- `<Requires>`
I think we have done all of these, although the sense key is not used as the sense ID but is instead stored as `dc:identifier`.
When the WN-LMF 1.1 format is merged (see globalwordnet/schemas#38) we should release the OMW data in that format. The main thing is adding
<Requires id="pwn" version="3.0" />
elements on the "expand" wordnets. Also, the `<SyntacticBehaviour>` elements in the PWN files can be relocated. Anything else? Are we replacing `<Sense>` ids with the value of their `dc:identifier` attribute?
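As a rough sketch of the `<Requires>` step, something like the following could add the element to each lexicon of an "expand" wordnet. It uses only Python's standard `xml.etree.ElementTree`. The sample document is abbreviated and not a valid WN-LMF file, the helper name `add_requires` is hypothetical, and I am assuming that `<Requires>` elements come first inside `<Lexicon>` in WN-LMF 1.1. Note also that ElementTree does not preserve the DOCTYPE declaration, so a real conversion would need to write it back out.

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up sample; real WN-LMF entries contain more elements.
SAMPLE = """<LexicalResource>
  <Lexicon id="example-xx" label="Example" language="xx"
           email="a@example.com" license="https://example.com/license"
           version="1.0">
    <LexicalEntry id="example-xx-w1"/>
  </Lexicon>
</LexicalResource>"""


def add_requires(lexicon: ET.Element, dep_id: str, dep_version: str) -> ET.Element:
    """Add <Requires id=... version=...> to a lexicon unless already present."""
    existing = lexicon.findall("Requires")
    for req in existing:
        if req.get("id") == dep_id and req.get("version") == dep_version:
            return req  # already declared, nothing to do
    req = ET.Element("Requires", {"id": dep_id, "version": dep_version})
    # Assumption: any existing <Requires> elements are the first children,
    # so the new one goes directly after them.
    lexicon.insert(len(existing), req)
    return req


root = ET.fromstring(SAMPLE)
for lexicon in root.iter("Lexicon"):
    add_requires(lexicon, "pwn", "3.0")

print(ET.tostring(root, encoding="unicode"))
```

Whether the dependency should stay at `pwn` 3.0 or move to another resource is exactly the open question in this thread; the values passed to `add_requires` are only placeholders for that decision.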