own-pt / openWordnet-PT

OpenWordnet-PT: an open access wordnet for Portuguese
http://openwordnet-pt.org

Words should have more specific Types #177

Closed fredsonaguiar closed 3 years ago

fredsonaguiar commented 3 years ago

From https://wordnet.princeton.edu/documentation/wndb5wn, together with #175, we learned that the same lemma can behave differently and carry different associated information depending on its POS. For instance, the exceptional forms for the lemma "taxi" are listed in different files: noun.exc has "taxies taxi" and verb.exc has "taxying taxi".
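To illustrate why the POS matters here, below is a minimal sketch of an exception lookup keyed by (POS, inflected form). It assumes the usual layout of the WordNet .exc files, where each line holds an inflected form followed by one or more base forms separated by spaces. The function name parse_exceptions and the inline sample data are hypothetical and are not part of pyownpt.

```python
from collections import defaultdict


def parse_exceptions(pos, text):
    """Parse the contents of one .exc file into {(pos, inflected): [base forms]}.

    `pos` is the part-of-speech tag of the file being read ("n" for noun.exc,
    "v" for verb.exc); the tag values are an assumption for this sketch.
    """
    table = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        inflected, *bases = fields
        table[(pos, inflected)].extend(bases)
    return table


# Sample lines taken from the issue text, not read from the real files.
exceptions = {}
exceptions.update(parse_exceptions("n", "taxies taxi\n"))
exceptions.update(parse_exceptions("v", "taxying taxi\n"))
```

With a single Word node for "taxi" there is no place to attach "taxies" to the noun only and "taxying" to the verb only, which is the motivation for distinguishing Words by POS.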

The solution in this case would be to define words with different types, such as VerbWord, NounWord, AdverbWord and AdjectiveWord, or at least to define unique URIs for words with the same lemma but different POS.

arademaker commented 3 years ago

We can proliferate the number of types in the vocabulary OR we can just add an extra property to Word indicating its part of speech... In the DTD, they have a single tag for a lexical form:

https://github.com/globalwordnet/schemas/blob/master/WN-LMF-1.1.dtd#L57

Lemon also uses only a single type:

https://globalwordnet.github.io/schemas/#rdf

The original model we adopted also used only a single type for words, with an extra specialization only for collocations:

https://www.w3.org/TR/wordnet-rdf/

arademaker commented 3 years ago

I prefer to keep only a single type, Word, for now.

fredsonaguiar commented 3 years ago

I agree with a single type for now, but we still need to expand each of the current Words into one Word per POS it has; otherwise, we won't be able to deal with the POS-specific information.

The difficulty is remapping the triples that point to them. Words are objects of wn30:word, nomlex:noun and nomlex:verb. We need to pay attention to that.
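A rough sketch of that remapping, using plain (subject, predicate, object) tuples instead of a real RDF library. It assumes that the POS of each sense is already known (the sense_pos mapping), that nomlex:noun always points to the noun Word and nomlex:verb to the verb Word, and that the new URIs are formed by appending "-n" or "-v". All identifiers in the sample data are hypothetical, and this is not the actual script.

```python
WN30_WORD = "wn30:word"
NOMLEX_NOUN = "nomlex:noun"
NOMLEX_VERB = "nomlex:verb"


def remap_word_objects(triples, sense_pos):
    """Rewrite the Word objects of triples so they point to POS-specific Words.

    `sense_pos` maps a sense identifier to its POS tag (assumed to be known).
    Triples with other predicates are returned unchanged.
    """
    remapped = []
    for subj, pred, obj in triples:
        if pred == WN30_WORD:
            obj = f"{obj}-{sense_pos[subj]}"  # POS comes from the sense
        elif pred == NOMLEX_NOUN:
            obj = f"{obj}-n"  # assumption: always the noun Word
        elif pred == NOMLEX_VERB:
            obj = f"{obj}-v"  # assumption: always the verb Word
        remapped.append((subj, pred, obj))
    return remapped


# Hypothetical sample data for illustration only.
sample = [
    ("sense-dog-1", WN30_WORD, "word-dog"),
    ("sense-dog-2", WN30_WORD, "word-dog"),
    ("nomlex-1", NOMLEX_NOUN, "word-destruction"),
    ("nomlex-1", NOMLEX_VERB, "word-destroy"),
]
result = remap_word_objects(sample, {"sense-dog-1": "n", "sense-dog-2": "v"})
```

After this step, no triple refers to the old POS-less Word URI, so the original Word node can be removed safely.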

arademaker commented 3 years ago

Perfect, split a Word like dog into two: the verb and the noun. I agree. But both, for now, will be of type Word.

fredsonaguiar commented 3 years ago

For now, we described the new property wn30:pos in 7fb7f56dd4869f0297af08c35d248669b49d2458 and expanded the Words, as discussed, taking the related Senses and Nomlexes into account, in 156d2e10e178c62b4f1cc2573e3bcbf23bdb561f. We do so through this script.

Commands and output:

python3 pyownpt/cli/words_unique_pos.py own-files/own-en-words.ttl -l en -o own-en-words.ttl -v
INFO:root:loading data from file 'own-files/own-en-words.ttl'
INFO:ownpt:start formatting Words to unique POS
INFO:ownpt:action applied to 148730 words
    total: 676730 triples added
    total: 504438 triples removed
INFO:root:serializing output to 'own-en-words.ttl'
python3 pyownpt/cli/words_unique_pos.py own-files/own-pt-words.ttl own-files/own-pt-morphosemantic-links.ttl -l pt -o own-pt-words-morpho.nt -v
INFO:root:loading data from file 'own-files/own-pt-words.ttl'
INFO:root:loading data from file 'own-files/own-pt-morphosemantic-links.ttl'
INFO:ownpt:start formatting Words to unique POS
INFO:ownpt:action applied to 57972 words
    total: 270132 triples added
    total: 208188 triples removed
INFO:root:serializing output to 'own-pt-words-morpho.nt'

fredsonaguiar commented 3 years ago

We now have the new definitions using wn30:pos. The parts of speech for each expanded word were extracted from the senses related to that word. For instance, for the word dog:

-<https://w3id.org/own-pt/wn30-en/instances/word-dog> a wn30:Word ;
-    wn30:lexicalForm "dog"@en .
[...]
+<https://w3id.org/own-pt/wn30-en/instances/word-dog-v> a wn30:Word ;
+    wn30:lexicalForm "dog"@en ;
+    wn30:pos "v" .
[...]
+<https://w3id.org/own-pt/wn30-en/instances/word-dog-n> a wn30:Word ;
+    wn30:lexicalForm "dog"@en ;
+    wn30:pos "n" .
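The expansion shown in the diff above can be sketched as follows, again with plain tuples rather than a real RDF graph. The function name expand_word is hypothetical, and the set of POS tags is assumed to have been collected beforehand from the senses related to the word. The URI suffix convention ("-n", "-v") follows the diff.

```python
def expand_word(word_uri, lexical_form, lang, pos_tags):
    """Return the triples for one POS-specific Word per tag in `pos_tags`.

    Each new Word keeps the type wn30:Word and the original lexical form,
    and gains a wn30:pos literal; its URI is the old URI plus "-<pos>".
    """
    triples = []
    for pos in sorted(pos_tags):  # sorted for a deterministic output order
        uri = f"{word_uri}-{pos}"
        triples.append((uri, "rdf:type", "wn30:Word"))
        triples.append((uri, "wn30:lexicalForm", f'"{lexical_form}"@{lang}'))
        triples.append((uri, "wn30:pos", f'"{pos}"'))
    return triples


dog = expand_word(
    "https://w3id.org/own-pt/wn30-en/instances/word-dog",
    "dog",
    "en",
    {"v", "n"},  # assumed to come from the senses related to "dog"
)
```

Each expanded Word yields three triples here, so a word with two parts of speech replaces its two original triples with six.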