Open vcvpaiva opened 7 years ago
The only conjunction is:
Paper and scissors both cut
"scissors" has a lemmatization problem, so no concept is found. "paper" gets the wrong concept (newspaper). "cut" gets the wrong concept (process only).
Do we want to distribute the conjunction? Enhanced CoreNLP does.
13 copulas: (one can check https://github.com/own-pt/rte-sick/blob/master/expanded/conllu/len.3-5.conllu)
4 copulas have wrong dependencies. Others are wrong in the mapping; more on that later.
19 negations marked:
all dependencies seem fine.
However there are several semantic negations that are not marked.
and also:
Looking for the 20 nns is difficult, as the representations use NN as the tag for nouns. However, the first 7 noun-nouns below seem correct, and to give them concepts they need to be considered compounds.
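One way around the NN/nn clash is to match on the dependency column only, rather than searching the text. A minimal sketch, assuming whitespace-separated CoNLL-U rows with DEPREL in the 8th column (as in the snippets in this issue); `nn_pairs` is a hypothetical helper and the example sentence is made up, not taken from the corpus:

```python
def nn_pairs(sentence_lines):
    """Return (modifier, head) form pairs for rows whose DEPREL is exactly 'nn'."""
    rows = [ln.split() for ln in sentence_lines if ln and not ln.startswith("#")]
    forms = {r[0]: r[1] for r in rows}  # token id -> surface form
    # r[7] is DEPREL and r[6] is HEAD; the XPOS tag 'NN' sits in r[4] and is ignored.
    return [(r[1], forms.get(r[6], "ROOT")) for r in rows if r[7] == "nn"]


# Hypothetical sentence for illustration only.
example = [
    "# text = A soccer player is running",
    "1 A a DET DT _ 3 det _ _",
    "2 soccer soccer NOUN NN _ 3 nn _ _",
    "3 player player NOUN NN _ 5 nsubj _ _",
    "4 is be VERB VBZ _ 5 aux _ _",
    "5 running run VERB VBG _ 0 ROOT _ _",
]

print(nn_pairs(example))
```

Both "soccer" and "player" carry the NN tag, but only the row with deprel `nn` is returned.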
Wrong ones:
There are 4 "dep" dependencies.
"acomp" = 2 ===> falling asleep.
"quantmod" = 3 ===> "a few"
"amod" = 20 ===>
correct ones: old woman, young female, young kids, cold cyclist, happy baby, young girl, white horse, sad man, hungry woman, small animal
wrong ones: a few kittens, few men, several children, lemur eating, one typing, grating carrots, dicing garlic, golden retriever, animated airplane, man thinking
particle verbs:
# text = A toddler is standing up
1 A a DET DT _ 2 det _ DT|?|?
2 toddler toddler NOUN NN _ 4 nsubj _ NN|10714465-n|NonFullyFormed+
3 is be VERB VBZ _ 4 aux _ VBZ|02604760-v|Entity+
4 standing stand VERB VBG _ 0 ROOT _ VBG|01546111-v|PhysicalAttribute+
5 up up PRT RP _ 4 prt _ RP|00097011-r|Increasing+
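The particle attachments can be pulled out of rows like these mechanically. A minimal sketch, assuming whitespace-separated columns and reading the last column as `tag|synset|SUMO` (my reading of the rows above, not a documented format); `parse` and `particle_verbs` are hypothetical helpers, and joining verb and particle with an underscore is an assumption about how one might look up the multiword lemma:

```python
SENTENCE = """\
# text = A toddler is standing up
1 A a DET DT _ 2 det _ DT|?|?
2 toddler toddler NOUN NN _ 4 nsubj _ NN|10714465-n|NonFullyFormed+
3 is be VERB VBZ _ 4 aux _ VBZ|02604760-v|Entity+
4 standing stand VERB VBG _ 0 ROOT _ VBG|01546111-v|PhysicalAttribute+
5 up up PRT RP _ 4 prt _ RP|00097011-r|Increasing+
"""


def parse(block):
    """Map token id -> dict of the fields used below."""
    tokens = {}
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        _tag, synset, sumo = cols[9].split("|")  # assumed layout of the last column
        tokens[int(cols[0])] = {
            "lemma": cols[2],
            "head": int(cols[6]),
            "deprel": cols[7],
            "synset": synset,
            "sumo": sumo,
        }
    return tokens


def particle_verbs(tokens):
    """Join each 'prt' dependent with the lemma of its head verb."""
    return [
        tokens[t["head"]]["lemma"] + "_" + t["lemma"]
        for t in tokens.values()
        if t["deprel"] == "prt"
    ]


tokens = parse(SENTENCE)
print(particle_verbs(tokens))
```

Here the verb and the particle are each mapped to a separate concept, so a joined lemma such as the one printed above is what a compound lookup would need.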
only one preposition: Someone is on a horse
expletives
5 passive voices only, all correct.
advmodifiers: (34 in total)
Others:
Actually, in the whole corpus we have 112 adverbs ending in "ly"; this is for future use in the nomlex for adjectives/adverbs. They are attached: sadverbs.txt
I do not understand why only the verb "sit" gives a different meaning to the auxiliary "to be". Sentences: A person is sitting, A man is sitting, A baby is sitting, A tiger is sitting, A toddler is sitting, the man is sitting indoors. All of them give "is" as PhysicalAttribute instead of Entity+. Yes, "sit" is a state, not an action, but so is "stand".
@vcvpaiva once more, to answer this puzzle we need an interactive implementation of UKB with traces! We would need to see how the algorithm evolves during the computation to choose a sense A instead of a sense B for a given word W.
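Short of a traced UKB, a quick tabulation can at least show for which head verbs the label of the auxiliary changes. A rough sketch, assuming the same column layout as the toddler example and that the label is the last `|`-separated field of the final column; `aux_labels_by_head` is a hypothetical helper, and the "sitting" rows are reconstructed from the description above with unknown fields left as `?`:

```python
from collections import Counter, defaultdict


def aux_labels_by_head(sentences):
    """Count the label given to auxiliary 'be', grouped by the lemma of its head verb."""
    table = defaultdict(Counter)
    for sent in sentences:
        rows = {r[0]: r for r in (line.split() for line in sent)}
        for r in rows.values():
            if r[2] == "be" and r[7] == "aux":
                head_lemma = rows[r[6]][2]
                label = r[9].split("|")[-1]
                table[head_lemma][label] += 1
    return table


standing = [
    "1 A a DET DT _ 2 det _ DT|?|?",
    "2 toddler toddler NOUN NN _ 4 nsubj _ NN|10714465-n|NonFullyFormed+",
    "3 is be VERB VBZ _ 4 aux _ VBZ|02604760-v|Entity+",
    "4 standing stand VERB VBG _ 0 ROOT _ VBG|01546111-v|PhysicalAttribute+",
    "5 up up PRT RP _ 4 prt _ RP|00097011-r|Increasing+",
]
# Reconstructed from the description, not copied from the corpus; '?' marks unknown fields.
sitting = [
    "1 A a DET DT _ 2 det _ DT|?|?",
    "2 man man NOUN NN _ 4 nsubj _ NN|?|?",
    "3 is be VERB VBZ _ 4 aux _ VBZ|?|PhysicalAttribute",
    "4 sitting sit VERB VBG _ 0 ROOT _ VBG|?|?",
]

table = aux_labels_by_head([standing, sitting])
for lemma, counts in table.items():
    print(lemma, dict(counts))
```

This only shows where the labels differ. It says nothing about why UKB prefers one sense, which is what the traces would be for.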
1 occ of man mapping to Man, 111 occs mapping to Hominid?
140 occs of woman mapping to Woman
9 Guitars, 11 Canines
32 playing ==> DramaticActing, 32 dancing ==> Dancing+
24 Separating, from slice, dismantle and tear
17 correct adjectives: empty, hungry, immobile, naked, motionless, silent, scared, animated, bored, happy, sad, golden, white, young, old, small, asleep
Some of the adjectives are past participles (scared, animated, bored), which causes problems.
only 50 nsubjs, kind of:
dobjs:
According to the stats (https://github.com/own-pt/rte-sick/blob/master/expanded/conllu/len.3-5.conllu.stats.xml): 390 sentences (385 nsubj + 5 nsubjpass), 1 conjunction only, 4 "dep" dependencies, 13 copulas, 140 direct objects, 26 expletives, 19 negations, 20 noun-nouns, 1 prep and 9 particles.
The rest is: "acomp" = 2, "advmod" = 34, "amod" = 20, "partmod" = 10, "pobj" = 1, "quantmod" = 3.
Also 30 ADJ and 45 ADV.
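Counts like these can be cross-checked directly from the CoNLL-U rows. A minimal sketch, assuming whitespace-separated columns with DEPREL in the 8th position; `deprel_counts` is a hypothetical helper, shown here on the single toddler sentence rather than the whole file:

```python
from collections import Counter


def deprel_counts(lines):
    """Count dependency labels over CoNLL-U token rows, skipping comments and blanks."""
    return Counter(
        line.split()[7]
        for line in lines
        if line.strip() and not line.startswith("#")
    )


rows = [
    "# text = A toddler is standing up",
    "1 A a DET DT _ 2 det _ DT|?|?",
    "2 toddler toddler NOUN NN _ 4 nsubj _ NN|10714465-n|NonFullyFormed+",
    "3 is be VERB VBZ _ 4 aux _ VBZ|02604760-v|Entity+",
    "4 standing stand VERB VBG _ 0 ROOT _ VBG|01546111-v|PhysicalAttribute+",
    "5 up up PRT RP _ 4 prt _ RP|00097011-r|Increasing+",
]

counts = deprel_counts(rows)
print(counts["prt"], counts["nsubj"], sum(counts.values()))
```

Running the same function over the lines of len.3-5.conllu should give numbers comparable to the stats file, if the file follows this layout.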