own-pt / openWordnet-PT

OpenWordnet-PT: an open access wordnet for Portuguese
http://openwordnet-pt.org

topology of PWN #142

Open vcvpaiva opened 6 years ago

vcvpaiva commented 6 years ago

Could anyone tell me how many of the 117659 synsets have glosses? Not all do.

Can we add the corpus of glosses somewhere in the repo, in an inspectable form? https://wordnet.princeton.edu/glosstag.shtml

fcbr commented 6 years ago

@vcvpaiva AFAIK all PWN synsets have glosses. For example, if we take the Prolog output of PWN and remove the duplicate IDs from the generated Prolog, we have 117659 entries.

$ cd prolog
$ awk -F, '{print $1}' wn_g.pl | sort -u | wc -l
117659
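
The count above shows how many distinct IDs appear in wn_g.pl, but not whether every synset is covered. A minimal sketch of a more direct check follows, assuming the Prolog files use lines of the form `s(100001740,1,'entity',n,1,11).` in wn_s.pl and `g(100001740,'...').` in wn_g.pl. The function name `synsets_without_gloss` is hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: print synset IDs that occur in the s/6 file
# (first argument) but have no g/2 entry in the gloss file (second argument).
# Assumes one Prolog fact per line, starting with "s(" or "g(".
synsets_without_gloss() {
  s_file=$1
  g_file=$2
  ids_s=$(mktemp)
  ids_g=$(mktemp)
  # Extract the leading numeric synset ID from each fact and deduplicate.
  sed -n 's/^s(\([0-9]*\),.*/\1/p' "$s_file" | sort -u > "$ids_s"
  sed -n 's/^g(\([0-9]*\),.*/\1/p' "$g_file" | sort -u > "$ids_g"
  # comm -23 keeps lines unique to the first sorted file.
  comm -23 "$ids_s" "$ids_g"
  rm -f "$ids_s" "$ids_g"
}
```

Run from the prolog directory, `synsets_without_gloss wn_s.pl wn_g.pl | wc -l` would then give the number of synsets with no gloss entry at all. It would not catch glosses that are present but empty.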

Also, if we look at the tagged glosses, it seems that all of them have tagged glosses too:

$ cd glosstag/standoff
$ awk -F'\t' '{print $1}' index.byid.tab | sort -u | wc -l
117659

vcvpaiva commented 6 years ago

Thanks @fcbr! This is odd, as I am sure that many times I have had the impression of a synset not having a gloss. Maybe it's when the gloss is a single word, like

05893261-n sine_qua_non, essential_condition | sine qua non (a prerequisite)

What is a tagged gloss, please?

And some questions on the topology of PWN:

  1. How many synsets go directly to Entity? Do all synsets go to Entity?

  2. How many have two hops?

  3. How many have a long hierarchy like kitty < domestic_cat < cat < feline < carnivore < placental_mammal < mammal < vertebrate < chordate < animal < organism < living_thing < entity?

  4. I seem to remember that you were calculating isolated nodes vs hierarchies? Where is that data now?
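
The first three questions come down to a histogram of hop counts to the root. A rough sketch of how that could be computed from the Prolog distribution follows, assuming wn_hyp.pl holds one `hyp(child,parent).` fact per line and that Entity has ID 100001740 there. The function name `hypernym_depth_histogram` is hypothetical. It measures the shortest path when a synset has several hypernyms, and it ignores instance hypernyms, which are assumed to live in a separate file.

```shell
# Hypothetical helper: print "hops count" lines giving, for each hop count,
# how many synsets reach the root synset by a shortest hypernym path of
# that length. Arguments: hypernym file, root synset ID.
hypernym_depth_histogram() {
  hyp_file=$1
  root=$2
  awk -v root="$root" '
    # Parse facts of the form hyp(child,parent). into a parent -> children map.
    /^hyp\(/ {
      line = $0
      sub(/^hyp\(/, "", line)
      sub(/\)\..*$/, "", line)
      split(line, ids, ",")
      children[ids[2]] = children[ids[2]] " " ids[1]
    }
    END {
      # Breadth-first search downward from the root, so the first visit
      # to a synset is along a shortest path.
      depth[root] = 0
      queue[1] = root; head = 1; tail = 1
      while (head <= tail) {
        node = queue[head++]
        n = split(children[node], kids, " ")
        for (i = 1; i <= n; i++) {
          k = kids[i]
          if (!(k in depth)) {
            depth[k] = depth[node] + 1
            queue[++tail] = k
            hist[depth[k]]++
            if (depth[k] > max) max = depth[k]
          }
        }
      }
      for (d = 1; d <= max; d++) print d, hist[d] + 0
    }
  ' "$hyp_file"
}
```

With these assumptions, `hypernym_depth_histogram wn_hyp.pl 100001740` would print one line per hop count: the first line covers synsets directly under Entity, the second those two hops away, and the last lines the longest chains. Synsets that never appear in the output total are the ones that do not reach Entity through `hyp` links.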

arademaker commented 6 years ago

Yes, we do have glosses with only 1-2 words. The tagged gloss corpus is not complete: not all glosses are entirely tagged. I talked with Christiane Fellbaum about it. Actually, this is excellent work still waiting to be done.

corpus of glosses = tagged corpus