own-pt / glosstag

Semantically Tagged PWN glosses

some glosses are repeated #33

Open arademaker opened 1 year ago

arademaker commented 1 year ago

We have glosses repeated in PWN 3.0 and PWN 3.1.

One example:

  1. http://wn.mybluemix.net/synset?id=01156302-a (mellow)
  2. http://wn.mybluemix.net/synset?id=01492061-a (mellowed • mellow)

For PWN 3.0:

ar@tenis glosstag-kg % cat ../WordNet-3.0/dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | head
  23  a variety of aster
  18  a branch of the Tai languages
  13  one species
  13  one of the British colonies that formed the United States
  11  a genus of bacteria
   9  an artificial language
   9  a radioactive transuranic element
   9  a genus of Mustelidae
   9  a Chadic language spoken south of Lake Chad
   8  a genus of Psittacidae

% cat ../WordNet-3.0/dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | awk '$1 > 1 {print $1}' | sort | uniq -c
   1 11
   2 13
   1 18
 289 2
   1 23
  40 3
  18 4
   9 5
   5 6
   5 7
   1 8
   4 9

For PWN 3.1:

% cat ../WordNet-3.1-dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | head
  23  a variety of aster
  18  a branch of the Tai languages
  13  one species
  13  one of the British colonies that formed the United States
  11  a genus of bacteria
   9  an artificial language
   9  a radioactive transuranic element
   9  a genus of Mustelidae
   9  a Chadic language spoken south of Lake Chad
   8  a genus of Psittacidae

% cat ../WordNet-3.1-dict/data.* | awk -F "|" '$0 ~ /^[0-9]/ {print $2}' | sort | uniq -c | sort -nr | awk '$1 > 1 {print $1}' | sort | uniq -c
   1 11
   2 13
   1 18
 275 2
   1 23
  40 3
  18 4
   9 5
   6 6
   5 7
   1 8
   4 9
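
For readers who prefer not to decode the awk pipelines, here is a minimal Python sketch of the same counting. It assumes the standard `data.*` layout in which synset lines start with a digit and the gloss follows the first `|`. The function names `gloss_counts` and `repetition_histogram` are hypothetical and not part of glosstag. In the histograms above, the right column is the number of times a gloss is repeated and the left column is how many distinct glosses are repeated that often, so `289 2` means 289 glosses occur exactly twice.

```python
from collections import Counter
from pathlib import Path


def gloss_counts(dict_dir):
    """Count how often each gloss string occurs across the data.* files."""
    counts = Counter()
    for path in sorted(Path(dict_dir).glob("data.*")):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Synset lines start with a digit; header lines start with spaces.
                if not line[:1].isdigit():
                    continue
                # Same field as awk -F "|" '{print $2}' (leading space kept).
                fields = line.rstrip("\n").split("|")
                if len(fields) > 1:
                    counts[fields[1]] += 1
    return counts


def repetition_histogram(counts):
    """Map a repetition count to the number of distinct glosses repeated that often."""
    return Counter(n for n in counts.values() if n > 1)
```

Calling `repetition_histogram(gloss_counts("../WordNet-3.0/dict"))` is intended to correspond to the second pipeline, except that the result is a mapping rather than lexicographically sorted text.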
arademaker commented 1 year ago

We have two problems with the repetitions:

  1. extra effort on annotation
  2. possible inconsistencies in the analyses of the same sentence
CL-USER> (main-1)
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . VBN) (senses attain%2:38:01::) (tag . man))
 ((pos . VBN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . NN) (senses gentleness%1:07:00::) (tag . man))
 ((pos . NN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((pos . NN) (senses age%1:28:01::) (tag . man))
 ((pos . NN) (tag . un))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses age%1:28:00::) (sep . ) (tag . man))
 ((sep . ) (tag . un))
D shaped like a sausage
 ((pos . VBN) (senses shape%2:36:00::) (tag . man))
 ((pos . VBN) (senses shape%2:30:00::) (tag . man))
D greatly desired
 ((pos . JJ) (senses desire%2:37:00:: desired%5:00:00:wanted:00) (sep . )
  (tag . man))
 ((pos . JJ) (senses desired%5:00:00:wanted:00) (sep . ) (tag . man))
D copperheads
 ((pos . NN) (senses copperhead%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses copperhead%1:05:02::) (sep . ) (tag . man))
D pearl oysters
 ((senses pearl_oyster%1:05:00::) (tag . man))
 ((senses pearl_oyster%1:05:00::) (tag . auto))
D fur seals
 ((senses fur_seal%1:05:02::) (tag . man))
 ((senses fur_seal%1:05:01::) (tag . man))
D fungus gnats
 ((senses fungus_gnat%1:05:02::) (tag . man))
 ((senses fungus_gnat%1:05:01::) (tag . man))
D moths whose larvae are armyworms
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01:: armyworm%1:05:03::)
  (sep . ) (tag . man))
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01::) (sep . )
  (tag . man))
D mole rats
 ((senses mole_rat%1:05:01::) (tag . man))
 ((senses mole_rat%1:05:03::) (tag . man))
D ribbonfishes
 ((pos . NN) (senses ribbonfish%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses ribbonfish%1:05:02::) (sep . ) (tag . man))
D snappers
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses snapper%1:05:01::) (sep . ) (tag . man))
D a white crystalline compound used as an analgesic and also as an antipyretic
 ((pos . JJ) (senses analgesic%1:06:00::) (tag . man))
 ((pos . JJ) (senses analgesic%5:00:00:moderating:00) (tag . man))
D the quality of being inaccurate and having errors
 ((pos . NNS) (senses error%1:07:00::) (sep . ) (tag . man))
 ((pos . NNS) (senses error%1:04:02:: error%1:10:00::) (sep . ) (tag . man))
D a dark purplish-red color
 ((pos . JJ) (tag . un))
 ((pos . JJ) (senses dark%3:00:02::) (tag . man))
D (physics) one of the six flavors of quark
 ((senses physics%1:09:00::) (sep . ) (tag . man))
 ((sep . ) (tag . un))
D a conventional expression of greeting or farewell
 ((pos . NN) (senses expression%1:10:00:: expression%1:10:04::) (tag . man))
 ((pos . NN) (senses expression%1:10:04:: expression%1:10:00::) (tag . man))
D a Chadic language spoken south of Lake Chad
 ((pos . VBN) (senses spoken%3:00:00::) (tag . man))
 ((pos . VBN) (senses speak%2:32:02::) (tag . man))
D milk from which some of the cream has been removed
 ((pos . NN) (senses milk%1:13:01::) (tag . man))
 ((pos . NN) (tag . un))
D milk from which some of the cream has been removed
 ((pos . NN) (senses cream%1:13:00::) (tag . man))
 ((pos . NN) (tag . un))
D a state in northern Mexico; mostly high plateau
 ((pos . NN) (tag . un))
 ((pos . NN) (senses state%1:15:01::) (tag . man))
D a Mid-Atlantic state; one of the original 13 colonies
 ((pos . NN) (senses state%1:15:01::) (sep . ) (tag . man))
 ((pos . NN) (sep . ) (tag . un))
D a Mid-Atlantic state; one of the original 13 colonies
 ((pos . NN) (senses state%1:15:01::) (sep . ) (tag . man))
 ((pos . NN) (sep . ) (tag . un))
D an African river; flows into the Indian Ocean
 ((pos . VBZ) (senses flow%2:38:00::) (tag . man))
 ((pos . VBZ) (tag . un))
D a member of a group of Siouan people who constituted a division of the Teton Sioux
 ((pos . VB) (senses constitute%2:42:00:: constitute%2:42:03::) (tag . man))
 ((pos . VB) (senses constitute%2:42:03:: constitute%2:42:00::) (tag . man))
D a member of a group of Siouan people who constituted a division of the Teton Sioux
 ((pos . VB) (senses constitute%2:42:00:: constitute%2:42:03::) (tag . man))
 ((pos . VB) (senses constitute%2:42:03:: constitute%2:42:00::) (tag . man))
D oxeye
 ((pos . NN) (senses oxeye%1:20:02::) (sep . ) (tag . man))
 ((pos . NN) (senses oxeye%1:20:01::) (sep . ) (tag . man))
D a grain of barley
 ((pos . NN) (senses grain%1:20:00::) (tag . man))
 ((pos . NN) (senses grain%1:13:00::) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D one species
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses species%1:14:00::) (sep . ) (tag . man))
D having more than one husband at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D having more than one husband at a time
 ((pos . NN) (senses time%1:28:05:: time%1:11:00::) (sep . ) (tag . man))
 ((pos . NN) (senses time%1:11:00:: time%1:28:05::) (sep . ) (tag . man))
D having more than one wife at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D hardened clay
 ((pos . NN) (senses clay%1:27:00::) (sep . ) (tag . man))
 ((pos . NN) (senses clay%1:27:02::) (sep . ) (tag . man))
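
The listing above pairs up tokens whose annotations differ between two copies of the same gloss. The source of `main-1` is not shown in this thread, so the following is only a sketch of one way such a comparison could be done, not the actual implementation. It assumes a hypothetical in-memory form in which each gloss is a `(text, tokens)` pair and each token is a dict with keys such as `pos`, `senses`, and `tag`. It also assumes that identical strings are tokenized identically.

```python
from collections import defaultdict
from itertools import combinations


def inconsistent_tokens(glosses):
    """Yield (text, position, annotation_a, annotation_b) for each disagreement.

    `glosses` is a hypothetical list of (text, tokens) pairs, where tokens is
    a list of dicts such as {"pos": "NN", "senses": [...], "tag": "man"}.
    """
    # Group the token lists of all glosses that share the same string.
    by_text = defaultdict(list)
    for text, tokens in glosses:
        by_text[text].append(tokens)

    for text, copies in by_text.items():
        # Compare every pair of copies of the same string.
        for first, second in combinations(copies, 2):
            # Assumption: both copies have the same tokenization.
            for position, (a, b) in enumerate(zip(first, second)):
                if a != b:
                    yield text, position, a, b
```

With more than two copies of a string, comparing every pair reports the same disagreement several times, which may explain repeated entries such as "one species" in a listing of this kind.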
arademaker commented 1 year ago

We are considering possible approaches to remove duplicated sentences. Before merging equal strings, the question is whether there is any situation where the same string (definition or example) could be sense-tagged differently depending on the synset that uses it, that is, on the context of its usage.

For example, both of these synsets have the gloss "a grain of barley":

  1. http://wn.mybluemix.net/synset?id=12123648-n (noun plant)
  2. http://wn.mybluemix.net/synset?id=07803093-n (noun food)

How should "barley" be annotated in this gloss? The candidates:

  1. http://wn.mybluemix.net/synset?id=12123244-n (noun plant)
  2. http://wn.mybluemix.net/synset?id=07803093-n (noun food, same as above)
arademaker commented 1 year ago

To answer the question above, I manually removed all spurious inconsistencies in the sense tagging of the duplicated sentences. The list is now reduced to:

  1. 01156302-a mellow | doce (having attained to kindliness or gentleness through age and experience; "mellow wisdom"; "the peace of mellow age")
  2. 01492061-a mellowed, mellow (having attained to kindliness or gentleness through age and experience; "mellow wisdom"; "the peace of mellow age")

The two senses of mellow have the same definition and examples, with each example annotated with the respective sense:

D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))
D having attained to kindliness or gentleness through age and experience; “mellow wisdom”; “the peace of mellow age”
 ((senses mellow%5:00:00:mature:02) (tag . auto))
 ((senses mellow%5:00:00:soft:02) (tag . auto))

Some more complex cases, which may point to problematic relations or duplicated synsets:

D copperheads
 ((pos . NN) (senses copperhead%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses copperhead%1:05:02::) (sep . ) (tag . man))
D pearl oysters
 ((senses pearl_oyster%1:05:00::) (tag . man))
 ((senses pearl_oyster%1:05:00::) (tag . auto))
D fur seals
 ((senses fur_seal%1:05:02::) (tag . man))
 ((senses fur_seal%1:05:01::) (tag . man))
D fungus gnats
 ((senses fungus_gnat%1:05:02::) (tag . man))
 ((senses fungus_gnat%1:05:01::) (tag . man))
D moths whose larvae are armyworms
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01:: armyworm%1:05:03::)
  (sep . ) (tag . man))
 ((pos . NN) (senses armyworm%1:05:02:: armyworm%1:05:01::) (sep . )
  (tag . man))
D mole rats
 ((senses mole_rat%1:05:01::) (tag . man))
 ((senses mole_rat%1:05:03::) (tag . man))
D ribbonfishes
 ((pos . NN) (senses ribbonfish%1:05:01::) (sep . ) (tag . man))
 ((pos . NN) (senses ribbonfish%1:05:02::) (sep . ) (tag . man))
D snappers
 ((pos . NN) (sep . ) (tag . un))
 ((pos . NN) (senses snapper%1:05:01::) (sep . ) (tag . man))
D oxeye
 ((pos . NN) (senses oxeye%1:20:02::) (sep . ) (tag . man))
 ((pos . NN) (senses oxeye%1:20:01::) (sep . ) (tag . man))
D a grain of barley
 ((pos . NN) (senses grain%1:20:00::) (tag . man))
 ((pos . NN) (senses grain%1:13:00::) (tag . man))
D hardened clay
 ((pos . NN) (senses clay%1:27:00::) (sep . ) (tag . man))
 ((pos . NN) (senses clay%1:27:02::) (sep . ) (tag . man))

Cases that differ only in the POS tag:

D having more than one husband at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
D having more than one wife at a time
 ((pos . RBR) (tag . ignore))
 ((pos . JJR) (tag . ignore))
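
The differences listed in this thread fall into a few recurring shapes: sense lists that differ only in order, a tagged token paired with an untagged one, different senses, and differing POS tags. A rough sketch of a classifier for triaging them follows. The category labels are illustrative and not the author's. The token dicts use the same hypothetical `pos`/`senses`/`tag` keys as the annotations shown above.

```python
def classify_difference(a, b):
    """Return a coarse label for how two annotations of the same token differ."""
    if a == b:
        return "identical"
    senses_a = a.get("senses", [])
    senses_b = b.get("senses", [])
    if set(senses_a) != set(senses_b):
        # One side carries no sense at all, e.g. a token left untagged.
        if not senses_a or not senses_b:
            return "tagged-vs-untagged"
        return "different-senses"
    if senses_a != senses_b:
        # Same senses listed in a different order.
        return "order-only"
    if a.get("pos") != b.get("pos"):
        return "pos-only"
    # Same senses and POS; something else differs, such as the tag.
    return "other"
```

Under these labels, the reversed `expression` sense lists from the earlier listing would be "order-only", while the `copperhead` pair would be "different-senses" and would still need a human decision.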
fcbond commented 1 year ago

I am not sure I quite understand all of them; let's look tomorrow afternoon.