fredsonaguiar closed this issue 3 years ago.
OK. The adjective markers are documented in https://wordnet.princeton.edu/documentation/wndb5wn:
In data.adj, a word is followed by a syntactic marker if one was specified in the lexicographer file. A syntactic marker is appended, in parentheses, onto word without any intervening spaces. See wninput(5WN) for a list of the syntactic markers for adjectives.
Judging from https://github.com/own-pt/wordnet2rdf/blob/master/wordnet-db-parser.lisp#L22, I assume we ignored that information when generating the OWN-EN RDF from PWN 3.0. I believe this was an error, and we need to fix it.
If I understand correctly, this information should be attached to the sense, not to the word. For example, searching for salient in data.adj shows that only some of its occurrences carry a marker:
3265:00580805 00 s 05 outstanding 0 prominent 0 salient 0 spectacular 0 striking 0 005 & 00579084 a 0000 + 14434022 n 0503 + 06889138 n 0401 + 14434022 n 0302 + 14434022 n 0301 | having a quality that thrusts itself into attention; "an outstanding fact of our time is that nations poisoned by anti semitism proved less fortunate in regard to their own freedom"; "a new theory is the most prominent feature of the book"; "salient traits"; "a spectacular rise in prices"; "a striking thing about Picadilly Circus is the statue of Eros in the center"; "a striking resemblance between parent and child"
6788:01235439 00 s 01 salient(ip) 0 002 & 01234167 a 0000 ;c 05801594 n 0000 | represented as leaping (rampant but leaning forward)
14417:02591896 00 a 01 salient 0 001 ! 02592015 a 0101 | (of angles) pointing outward at an angle of less than 180 degrees
14696:02631238 01 a 03 anuran 0 batrachian 0 salientian 0 007 ;c 06083243 n 0000 + 01639369 n 0301 \ 01639369 n 0301 + 01639765 n 0205 \ 01639369 n 0205 + 01639765 n 0104 \ 01639369 n 0103 | relating to frogs and toads
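To make the "per sense, not per word" point concrete, here is a minimal sketch (not the code used in OWN-EN) that reads one data.adj synset line and returns each word with its marker, if any. It follows the line layout from wndb(5WN) (offset, lex_filenum, ss_type, a hexadecimal w_cnt, then word/lex_id pairs) and assumes the line has no grep -n prefix such as "6788:". The function name parse_adj_words is hypothetical.

```python
import re

# Syntactic markers that may follow an adjective in data.adj, per wninput(5WN):
#   a  = prenominal (attributive) position
#   p  = predicate position
#   ip = immediately postnominal position
MARKER_RE = re.compile(r"^(?P<lemma>.+?)\((?P<marker>a|p|ip)\)$")


def parse_adj_words(line):
    """Return (synset_offset, [(lemma, lex_id, marker_or_None), ...]).

    Hypothetical helper. Assumes `line` is a synset line (not one of the
    license lines starting with two spaces) without a grep -n prefix.
    """
    fields = line.split()
    offset = fields[0]
    w_cnt = int(fields[3], 16)  # w_cnt is a two-digit hexadecimal number
    words = []
    for i in range(w_cnt):
        word = fields[4 + 2 * i]
        lex_id = int(fields[5 + 2 * i], 16)  # lex_id is a one-digit hex number
        m = MARKER_RE.match(word)
        if m:
            words.append((m.group("lemma"), lex_id, m.group("marker")))
        else:
            words.append((word, lex_id, None))
    return offset, words
```

Applied to the 01235439 line above, this yields ("01235439", [("salient", 0, "ip")]), while the 02591896 line yields salient with no marker, so the marker belongs to the (synset, word) pair, that is, to the sense.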
The marker corresponds to the adjposition attribute of a Sense in https://github.com/globalwordnet/schemas/blob/master/WN-LMF-1.1.dtd#L94. We can use the same name in our RDF model.
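As an illustration of how the marker would surface in LMF, the sketch below builds a Sense element with Python's xml.etree.ElementTree. It assumes, from our reading of the DTD, that adjposition accepts exactly a, ip and p, so the data.adj marker can be copied unchanged. The function name lmf_sense and any IDs passed to it are hypothetical, not the identifiers any real exporter uses.

```python
import xml.etree.ElementTree as ET

# Values we read the WN-LMF 1.1 DTD as allowing for Sense/@adjposition.
ADJPOSITIONS = {"a", "ip", "p"}


def lmf_sense(sense_id, synset_id, marker=None):
    """Build a WN-LMF <Sense> element, adding adjposition when a marker is given.

    Hypothetical helper; real sense/synset IDs depend on the exporter.
    """
    sense = ET.Element("Sense", id=sense_id, synset=synset_id)
    if marker is not None:
        if marker not in ADJPOSITIONS:
            raise ValueError(f"unexpected adjective marker: {marker!r}")
        # The data.adj marker text is used directly as the attribute value.
        sense.set("adjposition", marker)
    return sense
```

Senses without a marker simply omit the attribute, which matches it being optional in the DTD.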
Commit 9f704eb added the adjPosition property to the RDF schema.
In commit eee482f4d5311642b73d288be9bd873dddcd9c9b, we added this information by running this script. It finds the marked adjective words and their corresponding senses, and adds the property wn30:adjPosition to those senses. Note that it uses only the own-en-wordsenses.ttl and own-en-synsets.ttl files from OWN-EN. It can be run as:
python3 adjective_markers.py own-files/own-en-wordsenses.ttl own-files/own-en-synsets.ttl WordNet-3.0/dict/data.adj -o own-en-wordsenses.ttl -v
Here, data.adj is the adjective database file documented in https://wordnet.princeton.edu/documentation/wndb5wn.
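The last step of that process, emitting the new statements, might look roughly like the sketch below. It is not taken from adjective_markers.py: it assumes the markers have already been parsed into a dict keyed by (synset offset, lemma), that the matching sense IRIs were looked up beforehand in own-en-wordsenses.ttl and own-en-synsets.ttl, and that wn30:adjPosition takes a plain string literal. The function name adj_position_triples is hypothetical, and the wn30: prefix must be declared in the output file.

```python
def adj_position_triples(markers, sense_iris):
    """Turn parsed markers into Turtle statements for wn30:adjPosition.

    Hypothetical helper.
    markers:    {(synset_offset, lemma): marker}, e.g. built from data.adj
    sense_iris: {(synset_offset, lemma): sense IRI}, assumed to be resolved
                beforehand from the OWN-EN word sense and synset files
    Returns (triples, unmatched): Turtle lines, plus keys with no known sense.
    """
    triples, unmatched = [], []
    for key in sorted(markers):
        iri = sense_iris.get(key)
        if iri is None:
            # No sense found for this (synset, word) pair; report it rather than guess.
            unmatched.append(key)
            continue
        triples.append(f'<{iri}> wn30:adjPosition "{markers[key]}" .')
    return triples, unmatched
```

Returning the unmatched keys separately makes it easy to check that every marked word in data.adj was attached to some sense.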
While checking the LMF for own-en generated by https://github.com/own-pt/py-ownpt/tree/fa84fe0eb2c31a9c5aafb1772278ffab7d6c0f6d, we found 260 missing or differing LexicalEntry elements compared with those in https://github.com/bond-lab/omw-data/blob/9f2df85bbbab39370e265a2e2d90d95b6d015f04/wns/pwn30/wn30.xml.xz. The differences occur for some LexicalEntry elements whose writtenForm ends with (a), (p) or (ip), such as "owing(p)", "complete(a)" and "gardant(ip)". These forms are not found verbatim in https://github.com/own-pt/openWordnet-PT/blob/df754c2e4ee72127553147f16d0d2fedd6b0a9fb/wordnet-en.nt.gz; instead, they appear without the parenthesized marker, e.g. "ablaze(p)" is found as "ablaze".