own-pt / rte-sick

RTE Experiment

Analysis of the restricted corpus #57

Open vcvpaiva opened 7 years ago

vcvpaiva commented 7 years ago

According to the stats (https://github.com/own-pt/rte-sick/blob/master/expanded/conllu/len.3-5.conllu.stats.xml), the restricted corpus has 390 sentences (385 nsubj + 5 nsubjpass), only 1 conjunction, 4 "dep" dependencies, 13 copulas, 140 direct objects, 26 expletives, 19 negations and 20 noun-noun (nn) modifiers.

1 prep and 9 particles.

The rest is: "acomp" = 2, "advmod" = 34, "amod" = 20, "partmod" = 10, "pobj" = 1.

Also 30 ADJ, 45 ADV, and "quantmod" = 3.
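
Counts like these can be recomputed directly from the CoNLL-U file. A minimal sketch, assuming the file is tab-separated with the usual 10 columns and the dependency relation in column 8; `deprel_counts` and `SAMPLE` are illustrative names, and the sample is the "A toddler is standing up" sentence quoted further down in this thread:

```python
from collections import Counter


def deprel_counts(conllu_text):
    """Count dependency relations (column 8) over all token lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip anything that is not a plain token row
        counts[cols[7]] += 1
    return counts


# Sample sentence copied from this thread (tabs assumed as separators).
SAMPLE = "\n".join([
    "# text = A toddler is standing up",
    "1\tA\ta\tDET\tDT\t_\t2\tdet\t_\tDT|?|?",
    "2\ttoddler\ttoddler\tNOUN\tNN\t_\t4\tnsubj\t_\tNN|10714465-n|NonFullyFormed+",
    "3\tis\tbe\tVERB\tVBZ\t_\t4\taux\t_\tVBZ|02604760-v|Entity+",
    "4\tstanding\tstand\tVERB\tVBG\t_\t0\tROOT\t_\tVBG|01546111-v|PhysicalAttribute+",
    "5\tup\tup\tPRT\tRP\t_\t4\tprt\t_\tRP|00097011-r|Increasing+",
])

for relation, count in sorted(deprel_counts(SAMPLE).items()):
    print(relation, count)
```

Because the count reads column 8 rather than column 5, the `nn` relation is kept apart from the `NN` part-of-speech tag.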

vcvpaiva commented 7 years ago

The only conjunction is:

Paper and scissors both cut

"scissors" has a lemmatization problem, so no concept is found. "paper" gets the wrong concept (newspaper). "cut" gets the wrong concept (process only).

Do we want to distribute over the conjunction? Enhanced CoreNLP does.
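
To make "distribute" concrete: a rough sketch of copying the first conjunct's incoming relation to every other conjunct, so that "scissors" also becomes a subject of "cut". This is a simplification for illustration, not CoreNLP's actual enhanced-dependencies algorithm. The parse below is hand-written for this example (it is not copied from the corpus file), and `distribute_conj` is a hypothetical helper name:

```python
def distribute_conj(rows):
    """Return extra (head_id, deprel, dependent_id) edges for conjuncts.

    Each conjunct inherits the incoming relation of the first conjunct.
    """
    by_id = {row[0]: row for row in rows}
    extra = []
    for row in rows:
        if row[7] == "conj" and row[6] in by_id:
            first = by_id[row[6]]      # the first conjunct
            if first[6] != "0":        # nothing to copy if it is the root
                extra.append((first[6], first[7], row[0]))
    return extra


# Hand-written illustrative parse (assumption, not taken from the corpus).
SAMPLE = [
    "1\tPaper\tpaper\tNOUN\tNN\t_\t5\tnsubj\t_\t_",
    "2\tand\tand\tCONJ\tCC\t_\t1\tcc\t_\t_",
    "3\tscissors\tscissors\tNOUN\tNNS\t_\t1\tconj\t_\t_",
    "4\tboth\tboth\tDET\tDT\t_\t5\tdep\t_\t_",
    "5\tcut\tcut\tVERB\tVBP\t_\t0\tROOT\t_\t_",
]
ROWS = [line.split("\t") for line in SAMPLE]

print(distribute_conj(ROWS))  # [('5', 'nsubj', '3')]
```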

vcvpaiva commented 7 years ago

13 copulas: (one can check https://github.com/own-pt/rte-sick/blob/master/expanded/conllu/len.3-5.conllu)

  1. Some people are silent
  2. The child is silent
  3. A classroom is empty
  4. A man is motionless
  5. A man is scared
  6. The fish are immobile
  7. A man is silent
  8. A biker is naked
  9. The man is training ****
  10. Some kittens are hungry
  11. The man is rock climbing ****
  12. A woman is grating carrots ****
  13. The woman is dicing garlic ****

The 4 copulas marked **** are wrong as far as dependencies are concerned. Others are wrong as far as the mapping is concerned; more on that later.

vcvpaiva commented 7 years ago

19 negations marked:

  1. A fish is not swimming
  2. Two men are not fighting
  3. The men are not dancing
  4. A person is not frying
  5. People are not playing cricket
  6. The men are not talking
  7. The band isn't singing (noticed n't working fine!)
  8. The person is not drawing
  9. A man is not dancing
  10. The man is not dancing
  11. The man is not exercising
  12. The woman is not waterskiing
  13. The woman is not dancing
  14. Someone is not playing piano
  15. Some women are not talking
  16. A horse is not racing
  17. Two women are not dancing
  18. A jet is not flying
  19. The man is not drawing

All dependencies seem fine.

However, there are several semantic negations that are not marked:

  1. No person is hiking
  2. There are no men fighting
  3. There is no man speaking
  4. There is no man drawing
  5. There is no dog barking
  6. There is no person writing
  7. There is no man spitting
  8. There is no band playing
  9. There is no parrot speaking
  10. There is no man exercising
  11. There is no baby talking
  12. There is no man mixing
  13. There is no man dancing
  14. There is no panda climbing
  15. There is no lion walking (amb?)
  16. There is no hamster singing
  17. There is no lemur eating (amb)
  18. There is no woman exercising
  19. There is no clown singing
  20. There is no one typing
  21. There is no woman dancing
  22. There is no puppy rolling
  23. There is no man praying
  24. There is no man screaming
  25. There are no mimes performing
  26. There are no men sawing
  27. There is no man thinking

and also:

  1. Nobody is riding a bike
  2. Nobody is playing ping pong
  3. Nobody is holding a hedgehog
  4. Nobody is feeding an animal
  5. Nobody is slicing a tomato
  6. Nobody is brushing a cat
  7. Nobody is beating an egg
  8. Nobody is playing the guitar
  9. Nobody is riding a bike
  10. Nobody is playing ping pong
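
Sentences like the ones above could be listed automatically by looking for a negative word in a sentence that has no `neg` relation. A minimal sketch, assuming tab-separated 10-column CoNLL-U; the word list is an assumption drawn only from the examples in this comment, the two sample parses are hand-written for illustration, and the function names are hypothetical:

```python
NEGATIVE_WORDS = {"no", "nobody"}  # assumed list, based on the examples above


def read_sentences(conllu_text):
    """Split CoNLL-U text into sentences, each a list of column lists."""
    sentences, rows = [], []
    for line in conllu_text.splitlines() + [""]:
        if line.startswith("#"):
            continue  # comment line
        if line.strip():
            rows.append(line.split("\t"))
        elif rows:
            sentences.append(rows)  # a blank line closes the sentence
            rows = []
    return sentences


def unmarked_negations(conllu_text):
    """Return sentences that contain a negative word but no 'neg' relation."""
    found = []
    for rows in read_sentences(conllu_text):
        has_neg_relation = any(row[7] == "neg" for row in rows)
        has_negative_word = any(row[1].lower() in NEGATIVE_WORDS for row in rows)
        if has_negative_word and not has_neg_relation:
            found.append(" ".join(row[1] for row in rows))
    return found


# Hand-written illustrative parses (assumption, not taken from the corpus).
SAMPLE = "\n".join([
    "1\tA\ta\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\tfish\tfish\tNOUN\tNN\t_\t5\tnsubj\t_\t_",
    "3\tis\tbe\tVERB\tVBZ\t_\t5\taux\t_\t_",
    "4\tnot\tnot\tADV\tRB\t_\t5\tneg\t_\t_",
    "5\tswimming\tswim\tVERB\tVBG\t_\t0\tROOT\t_\t_",
    "",
    "1\tNobody\tnobody\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "2\tis\tbe\tVERB\tVBZ\t_\t3\taux\t_\t_",
    "3\triding\tride\tVERB\tVBG\t_\t0\tROOT\t_\t_",
    "4\ta\ta\tDET\tDT\t_\t5\tdet\t_\t_",
    "5\tbike\tbike\tNOUN\tNN\t_\t3\tdobj\t_\t_",
])

print(unmarked_negations(SAMPLE))  # ['Nobody is riding a bike']
```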

vcvpaiva commented 7 years ago

Looking for the 20 nn dependencies is difficult, as the representation also uses NN as the tag for nouns. However, the first 7 noun-nouns below seem correct, and to give them concepts they need to be considered compounds.

  1. ping pong
  2. golden retriever
  3. sumo wrestlers
  4. tiger cub
  5. sumo ringers
  6. baby pandas 2x
  7. cartoon airplane

Wrong ones:

  1. men sawing (nn)
  2. man screaming (dep)
  3. puppy rolling
  4. woman dancing
  5. clown singing
  6. hamster singing
  7. lion walking
  8. panda climbing
  9. man dancing
  10. rock climbing
  11. baby talking
  12. band playing
  13. dog barking
  14. blonds girl (typo)

vcvpaiva commented 7 years ago

There are 4 "dep" dependencies:

  1. There is no man speaking (speaking)
  2. Paper and scissors both cut (both)
  3. A kitten is getting bored (getting)
  4. There is no man screaming (screaming)

vcvpaiva commented 7 years ago

"acomp" = 2 ===> falling asleep.

"quantmod" = 3 ===> "a few"

"amod" = 20 ===> correct ones: old woman, young female, young kids, cold cyclist, happy baby, young girl, white horse, sad man, hungry woman, small animal

wrong ones: a few kittens, few men, several children, lemur eating, one typing, grating carrots, dicing garlic, golden retriever, animated airplane, man thinking

vcvpaiva commented 7 years ago

particle verbs:

  1. take off 3
  2. laying down
  3. sitting down
  4. waking up 2
  5. standing up 2

Note that these particles do mess up the concept assignment, e.g.

# text = A toddler is standing up
1   A   a   DET DT  _   2   det _   DT|?|?
2   toddler toddler NOUN    NN  _   4   nsubj   _   NN|10714465-n|NonFullyFormed+
3   is  be  VERB    VBZ _   4   aux _   VBZ|02604760-v|Entity+
4   standing    stand   VERB    VBG _   0   ROOT    _   VBG|01546111-v|PhysicalAttribute+
5   up  up  PRT RP  _   4   prt _   RP|00097011-r|Increasing+
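
One possible way around this is to join the verb with its `prt` dependent and look up the phrasal verb as a single lemma (WordNet lists many phrasal verbs as multiword lemmas). A minimal sketch over the sentence above; `phrasal_verb_lemmas` is a hypothetical helper, and whether the concept-assignment step accepts a joined lemma such as `stand_up` is an assumption:

```python
def phrasal_verb_lemmas(rows):
    """Map a verb's token id to 'lemma_particle' when it has a prt dependent."""
    lemma_by_id = {row[0]: row[2] for row in rows}
    joined = {}
    for row in rows:
        if row[7] == "prt" and row[6] in lemma_by_id:
            # row[6] is the head id (the verb), row[2] the particle's lemma
            joined[row[6]] = lemma_by_id[row[6]] + "_" + row[2]
    return joined


# The sentence quoted above (tabs assumed as separators).
SAMPLE = [
    "1\tA\ta\tDET\tDT\t_\t2\tdet\t_\tDT|?|?",
    "2\ttoddler\ttoddler\tNOUN\tNN\t_\t4\tnsubj\t_\tNN|10714465-n|NonFullyFormed+",
    "3\tis\tbe\tVERB\tVBZ\t_\t4\taux\t_\tVBZ|02604760-v|Entity+",
    "4\tstanding\tstand\tVERB\tVBG\t_\t0\tROOT\t_\tVBG|01546111-v|PhysicalAttribute+",
    "5\tup\tup\tPRT\tRP\t_\t4\tprt\t_\tRP|00097011-r|Increasing+",
]
ROWS = [line.split("\t") for line in SAMPLE]

print(phrasal_verb_lemmas(ROWS))  # {'4': 'stand_up'}
```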

only one preposition: Someone is on a horse

vcvpaiva commented 7 years ago


  1. There are no men fighting (wrong root)
  2. There is no man speaking
  3. There is no man drawing
  4. There is no dog barking
  5. There is no person writing
  6. There is no man spitting
  7. There is no band playing

vcvpaiva commented 7 years ago

5 passive voices only, all correct.

  1. A boat is anchored
  2. A pencil is being sharpened
  3. An onion is being chopped
  4. An onion is being sliced
  5. Some paper is being cut

vcvpaiva commented 7 years ago

advmodifiers: (34 in total)

  1. quickly 2
  2. happily 2
  3. frantically
  4. gracefully 2
  5. slowly
  6. wildly
  7. noisily
  8. passionately
  9. cheerfully
  10. fervently
  11. loudly
  12. fearlessly


  1. together
  2. still
  3. outside
  4. inside
  5. indoors
  6. around
  7. alone
  8. downhill
  9. uphill

vcvpaiva commented 7 years ago

Actually, in the whole corpus we have 112 adverbs ending in "ly". This is for future use in the nomlex for adjectives/adverbs. They're attached: sadverbs.txt

vcvpaiva commented 7 years ago

I do not understand why only the verb "sit" gives a different meaning to the auxiliary "to be". The sentences "A person is sitting", "A man is sitting", "A baby is sitting", "A tiger is sitting", "A toddler is sitting" and "The man is sitting indoors" all give "is" as PhysicalAttribute instead of Entity+. Yes, "sit" is a state, not an action, but so is "standing".
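
To check whether "sit" really is the only verb with this effect, one could tabulate the concept assigned to the auxiliary "be" per head verb. A minimal sketch, assuming the last column has the form `POS|synset|concept` as in the "A toddler is standing up" example earlier in this thread; `aux_be_concepts` is a hypothetical helper name:

```python
from collections import Counter, defaultdict


def aux_be_concepts(sentences):
    """For each head-verb lemma, count the concepts given to its auxiliary 'be'."""
    table = defaultdict(Counter)
    for rows in sentences:
        lemma_by_id = {row[0]: row[2] for row in rows}
        for row in rows:
            if row[7] == "aux" and row[2] == "be":
                concept = row[9].split("|")[-1]  # last field of POS|synset|concept
                head_lemma = lemma_by_id.get(row[6], "?")
                table[head_lemma][concept] += 1
    return table


# Sentence copied from this thread (tabs assumed as separators).
SAMPLE = [
    "1\tA\ta\tDET\tDT\t_\t2\tdet\t_\tDT|?|?",
    "2\ttoddler\ttoddler\tNOUN\tNN\t_\t4\tnsubj\t_\tNN|10714465-n|NonFullyFormed+",
    "3\tis\tbe\tVERB\tVBZ\t_\t4\taux\t_\tVBZ|02604760-v|Entity+",
    "4\tstanding\tstand\tVERB\tVBG\t_\t0\tROOT\t_\tVBG|01546111-v|PhysicalAttribute+",
    "5\tup\tup\tPRT\tRP\t_\t4\tprt\t_\tRP|00097011-r|Increasing+",
]
ROWS = [line.split("\t") for line in SAMPLE]

for lemma, concepts in aux_be_concepts([ROWS]).items():
    print(lemma, dict(concepts))  # stand {'Entity+': 1}
```

Run over the whole file, any head verb whose row is not dominated by Entity+ would stand out.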

arademaker commented 7 years ago

@vcvpaiva once more, to answer this puzzle we need an interactive implementation of UKB with traces! We would need to see how the algorithm evolves during the computation to choose a sense A instead of a sense B for a given word W.

vcvpaiva commented 7 years ago

1 occ of man mapping to Man, 111 occs mapping to Hominid?

140 occs of woman mapping to Woman

9 Guitars, 11 Canines

32 playing ==> DramaticActing, 32 dancing ==> Dancing+

24 Separating, from slice, dismantle and tear

vcvpaiva commented 7 years ago

17 correct adjectives: empty, hungry, immobile, naked, motionless, silent, scared, animated, bored, happy, sad, golden, white, young, old, small, asleep

Some of the adjectives are past participles (scared, animated, bored), which causes problems.

vcvpaiva commented 7 years ago

only 50 nsubjs, kind of:

  1. man
  2. male
  3. boy
  4. woman
  5. lady
  6. girl
  7. female
  8. child
  9. kid
  10. toddler
  11. baby

  12. clown
  13. cheerleader
  14. speaker
  15. biker
  16. ringer (sumo)
  17. wrestler
  18. mime
  19. cyclist
  20. band
  21. team
  22. someone
  23. nobody/ no one
  24. people
  25. person

  26. fish
  27. kitten
  28. cat
  29. dog
  30. golden retriever
  31. lemur
  32. cow
  33. panda
  34. tiger
  35. lion
  36. horse
  37. parrot
  38. cub
  39. hamster
  40. squirrel
  41. animal

  42. classroom (place, not the people)
  43. plane
  44. airplane
  45. jet
  46. boat
  47. pencil
  48. keyboard
  49. barbells
  50. paper
  51. onion

vcvpaiva commented 7 years ago


  1. play cricket/soccer/rugby
  2. receive volleyball
  3. play piano/keyboard/guitar/flute
  4. strum guitar
  5. cut wood
  6. cut shrimps/tomatoes/garlic/potatoes/butter
  7. slice garlic/butter/potatoes/herbs/tomato/bread/tofu
  8. chop onion/garlic
  9. dice onion
  10. mix eggs
  11. beat eggs
  12. dry noodles/eggs
  13. saw logs
  14. likes parrots
  15. cook prawns/eggs
  16. boil shrimps/eggs/noodles
  17. peel banana/food
  18. store broccoli
  19. break eggs/
  20. crack eggs
  21. tear sheets
  22. hold a hedgehog
  23. cleanse animal
  24. brush a cat
  25. dirty animal
  26. feed animal
  27. spread dough
  28. lump dough
  29. eat cereal/grass
  30. amalgamate eggs
  31. lift barbells
  32. mow grass
  33. read email