own-pt / mill

crunch textual wordnet data
Apache License 2.0

support multiple languages #32

Closed odanoburu closed 4 years ago

odanoburu commented 5 years ago

~branch multilang has a simple implementation where we don't actually change anything about the representation of the synsets or their relations; we simply allow lexicographer files in different directories (one for each language). this approach is nice because it doesn't create any special cases for English (or any other language), but it raises the following questions:~

- ~figure out export commands:~
  - [ ] ~for WNDB, should we strip out interlingual relations? (yes, but how to do it cleanly?)~
  - [ ] ~for JSON, should we lump everything in one output, or should we generate one file per language?~

branch multilingual now implements an approach where each WN has a name and this name is part of {synset,wordsense} identifiers; when exporting we may choose to restrict output to one language or to output everything.
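
to make the idea concrete, here is a rough sketch in Python (not mill's actual code). all names in it (`SynsetId`, `Relation`, `Synset`, `select_for_export`, and the field names) are hypothetical and only illustrate the shape of the approach: the wordnet name is a component of the identifier, and export either keeps everything or restricts output to one wordnet. dropping relations that point into another wordnet when restricting is an assumption on my part, not something the branch is confirmed to do.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SynsetId:
    """Hypothetical identifier: the wordnet name is part of the key."""
    wordnet: str   # e.g. "en" or "pt" (assumed naming scheme)
    lex_file: str  # lexicographer file the synset lives in
    head: str      # head word used to name the synset


@dataclass(frozen=True)
class Relation:
    name: str
    target: SynsetId


@dataclass(frozen=True)
class Synset:
    ident: SynsetId
    relations: Tuple[Relation, ...] = ()


def select_for_export(synsets: Iterable[Synset],
                      wordnet: Optional[str] = None) -> List[Synset]:
    """Return the synsets to export.

    With wordnet=None everything is exported unchanged. Otherwise only
    synsets of that wordnet are kept, and (as an assumption) relations
    pointing into a different wordnet are dropped.
    """
    if wordnet is None:
        return list(synsets)
    selected = []
    for synset in synsets:
        if synset.ident.wordnet != wordnet:
            continue
        kept = tuple(r for r in synset.relations
                     if r.target.wordnet == wordnet)
        selected.append(Synset(synset.ident, kept))
    return selected


# Two tiny wordnets linked by one interlingual relation (made-up data).
dog_id = SynsetId("en", "noun.animal", "dog")
cachorro_id = SynsetId("pt", "noun.animal", "cachorro")
dog = Synset(dog_id, (Relation("interlingual", cachorro_id),))
cachorro = Synset(cachorro_id)

everything = select_for_export([dog, cachorro])
english_only = select_for_export([dog, cachorro], wordnet="en")
```

because the wordnet name lives inside the identifier, no language needs special treatment: restricting the export is just a filter on that one field.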