Closed fredsonaguiar closed 3 years ago
We can proliferate the number of types in the vocabulary OR we can just add an extra property in Word indicating its part of speech... In the DTD, they have a single tag for a lexical form
https://github.com/globalwordnet/schemas/blob/master/WN-LMF-1.1.dtd#L57
Lemon also uses only a single type
https://globalwordnet.github.io/schemas/#rdf
The original model we adopted also used only a single type for words with an extra specialization only to collocations:
I prefer only a single type `Word` for now.
I agree with a single type for now, but we still need to expand some of the current Words into one Word per POS; otherwise, we wouldn't be able to handle POS-specific information.
The problem is to remap predications. Words are objects of `wn30:word`, `nomlex:noun` and `nomlex:verb`. We need to pay attention to that.
Perfect, split a Word like `dog` into two: the verb and the noun. I agree. But both, for now, will be of type `Word`.
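A minimal sketch of the remapping concern raised above, not the project's actual code. It assumes triples are plain string tuples, that POS-specific words are named by appending `-n` or `-v` to the original word URI, and that a hypothetical `sense_pos` lookup gives the POS of each sense. All identifiers in the sample data are made up for illustration.

```python
# Triples are plain (subject, predicate, object) string tuples; a real
# implementation would work on an RDF graph instead.

# Predicates whose object is a Word and that already imply a part of speech.
POS_BY_PREDICATE = {"nomlex:noun": "n", "nomlex:verb": "v"}


def remap_word_objects(triples, sense_pos):
    """Point each Word-valued object at its POS-specific word.

    sense_pos is a hypothetical lookup from a sense identifier to its POS.
    It is needed for wn30:word, whose predicate alone does not say which
    POS is meant.
    """
    remapped = []
    for s, p, o in triples:
        if p in POS_BY_PREDICATE:
            o = f"{o}-{POS_BY_PREDICATE[p]}"
        elif p == "wn30:word" and s in sense_pos:
            o = f"{o}-{sense_pos[s]}"
        # Any other triple is kept unchanged.
        remapped.append((s, p, o))
    return remapped


# Illustrative data only; these identifiers are not real URIs.
triples = [
    ("nomlex-x", "nomlex:noun", "word-dog"),
    ("sense-y", "wn30:word", "word-dog"),
    ("word-dog", "wn30:lexicalForm", "dog"),
]
remapped = remap_word_objects(triples, {"sense-y": "v"})
```

The point of the sketch is that `nomlex:noun` and `nomlex:verb` carry the POS in the predicate itself, while `wn30:word` needs extra information from the sense.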
For now, we describe the new property `wn30:pos` in 7fb7f56dd4869f0297af08c35d248669b49d2458, and expand `Word`s, as discussed, considering related `Sense`s and `Nomlex`es, in 156d2e10e178c62b4f1cc2573e3bcbf23bdb561f. We do so through this script.
Commands and output:
python3 pyownpt/cli/words_unique_pos.py own-files/own-en-words.ttl -l en -o own-en-words.ttl -v
INFO:root:loading data from file 'own-files/own-en-words.ttl'
INFO:ownpt:start formatting Words to unique POS
INFO:ownpt:action applied to 148730 words
total: 676730 triples added
total: 504438 triples removed
INFO:root:serializing output to 'own-en-words.ttl'
python3 pyownpt/cli/words_unique_pos.py own-files/own-pt-words.ttl own-files/own-pt-morphosemantic-links.ttl -l pt -o own-pt-words-morpho.nt -v
INFO:root:loading data from file 'own-files/own-pt-words.ttl'
INFO:root:loading data from file 'own-files/own-pt-morphosemantic-links.ttl'
INFO:ownpt:start formatting Words to unique POS
INFO:ownpt:action applied to 57972 words
total: 270132 triples added
total: 208188 triples removed
INFO:root:serializing output to 'own-pt-words-morpho.nt'
We got the new definitions considering `wn30:pos`. The parts of speech to expand were extracted from the senses related to each word. For instance, for the word dog:
-<https://w3id.org/own-pt/wn30-en/instances/word-dog> a wn30:Word ;
- wn30:lexicalForm "dog"@en .
[...]
+<https://w3id.org/own-pt/wn30-en/instances/word-dog-v> a wn30:Word ;
+ wn30:lexicalForm "dog"@en ;
+ wn30:pos "v" .
[...]
+<https://w3id.org/own-pt/wn30-en/instances/word-dog-n> a wn30:Word ;
+ wn30:lexicalForm "dog"@en ;
+ wn30:pos "n" .
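The expansion shown in the diff above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `words_unique_pos.py` script itself: triples are plain string tuples, the function name `expand_word` is hypothetical, and the set of POS tags is taken as given rather than derived from senses.

```python
def expand_word(word_uri, lexical_form, pos_tags):
    """Return (added, removed) triples for one word split by POS.

    The original POS-less word is removed and one new word per POS is
    added, with the POS appended to the URI and stored in wn30:pos.
    """
    removed = [
        (word_uri, "rdf:type", "wn30:Word"),
        (word_uri, "wn30:lexicalForm", lexical_form),
    ]
    added = []
    for pos in sorted(pos_tags):  # sorted only to make the output stable
        new_uri = f"{word_uri}-{pos}"
        added += [
            (new_uri, "rdf:type", "wn30:Word"),
            (new_uri, "wn30:lexicalForm", lexical_form),
            (new_uri, "wn30:pos", pos),
        ]
    return added, removed


added, removed = expand_word(
    "https://w3id.org/own-pt/wn30-en/instances/word-dog", "dog", {"v", "n"}
)
```

Each POS contributes three added triples while the original word contributes two removed ones, which is consistent with the logs reporting more triples added than removed.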
From https://wordnet.princeton.edu/documentation/wndb5wn, together with #175, we learned that the same `Lemma` might have different behaviors and associated information depending on its `POS`. For instance, the exceptional forms for the lemma "taxi" differ between files: `noun.exc : taxies taxi` and `verb.exc : taxying taxi`. The solution in this case should be to define words with different types, such as `VerbWord`, `NounWord`, `AdverbWord` and `AdjectiveWord`, or at least define unique URIs for words with the same Lemma but different POS.
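To illustrate why the POS has to be part of a word's identity, here is a small sketch that keys exception entries by both POS and inflected form. It assumes the exception list layout of an inflected form followed by one or more base forms per line; the function names and the in-memory lines are hypothetical, and only the two "taxi" entries quoted above are used as data.

```python
def parse_exc(pos, lines):
    """Parse exception-list lines into {(pos, inflected): [base forms]}."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        inflected, bases = parts[0], parts[1:]
        table[(pos, inflected)] = bases
    return table


# Entries quoted in this issue, one list per POS-specific file.
exceptions = {}
exceptions.update(parse_exc("n", ["taxies taxi"]))    # from noun.exc
exceptions.update(parse_exc("v", ["taxying taxi"]))   # from verb.exc


def base_forms(inflected, pos):
    """Look up base forms; the answer depends on the POS, not just the form."""
    return exceptions.get((pos, inflected), [])
```

With this keying, `base_forms("taxies", "n")` finds "taxi" while `base_forms("taxies", "v")` finds nothing, which a single POS-less word for "taxi" could not express.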