kkalouli / SICK-processing


CoreNLP processing: gerunds as nouns #1

Open vcvpaiva opened 7 years ago

vcvpaiva commented 7 years ago

The example I noticed is:

A(n) animal is grazing in a field .

1 A a DT Definite=Ind 2 det 2:det U
2 animal animal NN Number=Sing 4 nsubj 4:nsubj n00015388:0.5|Animal=
3 is be VBZ Person=3|Tense=Pres 4 cop 4:cop v02604760:0.3275199223196807,v02616386:0.11751591391828978,v02655135:0.04841329219949049,v02603699:0.025380662220664863,v02749904:0.02003037355938439,v02664769:0.004541381187274196,v02620587:0.010635331496913446,v02445925:0.008466210102411607,v02697725:0.005759388968895657,v02268246:0.0029239638802821665,v02614181:0.004271396544107938,v02744820:0.002491655852962043,v02702508:0.0010027953363023685|Entity+
4 grazing grazing NN Number=Sing 0 root v01576165:0.3007793771671978,v01608508:0.11208625908186534,v01576478:0.044906918994578514,v01240514:0.02264950451063458,v01174742:0.022252968591395122
5 in in IN 7 case 7:case U
6 a a DT Definite=Ind 7 det 7:det U
7 field field NN Number=Sing 4 nmod 4:nmod:in n08569998:0.3927532492895872,n08506641:0.06094683513178776,n08569777:0.02049358106510134,n05996646:0.009457634581763852,n11456760:0.0051914468854773535,n01097119:0.0031801618667544732,n14514039:0.002101339649380581,n08570758:0.0014676209468262136,n09393605:0.001069340268047567,n08005260:8.056006105536499E-4,n08551628:6.235284140766389E-4,n07999584:4.934925461115005E-4,n07999471:3.979605804971717E-4,n08659446:3.260826012218618E-4,n08005580:2.7088595138331475E-4,n05932891:2.277431238162055E-4,n02687992:1.9349648761329358E-4|Field=
8 . Definite=Ind . _ 4 punct 7:det U

The root, "grazing", is misclassified as a noun (NN).

But the disambiguation decided it is a verb, so there is no PWN synset associated with it.

Can we count the number of these cases? I.e., the number of cases where we have no PWN synset associated with a verb, noun, adjective, or adverb?
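A rough way to get such a count could look like the sketch below. It is only an illustration under several assumptions that are not confirmed by the repository: one token per line, whitespace-separated fields, the Penn Treebank tag in the fourth field, and the sense annotation (or "U") in the last field. The names classify_token and count_cases are hypothetical. Besides tokens with no synset at all, it also counts tokens whose tag disagrees with the part of speech of the assigned synsets, which is what happens with "grazing" above.

```python
import re
from collections import Counter

# Assumed mapping from Penn Treebank tag prefixes to PWN part-of-speech letters.
OPEN_CLASS = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}

# Synset IDs in the output look like "v01576165:0.30..." (POS letter + 8 digits).
SYNSET_ID = re.compile(r"(?<![A-Za-z0-9])([nvar])\d{8}:")


def classify_token(line):
    """Return 'missing', 'mismatch' or 'ok'; None for closed-class tokens.

    Assumes the tag is the 4th whitespace-separated field and the sense
    annotation is the last field (hypothetical layout, see lead-in).
    """
    fields = line.split()
    if len(fields) < 5:
        return None
    expected = OPEN_CLASS.get(fields[3][:2])
    if expected is None:
        return None  # determiner, preposition, punctuation, ...
    found = set(SYNSET_ID.findall(fields[-1]))
    if not found:
        return "missing"  # open-class token without any PWN synset
    return "ok" if expected in found else "mismatch"


def count_cases(lines):
    """Tally the classification over an iterable of token lines."""
    labels = (classify_token(line) for line in lines)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    sample = [
        "1 A a DT Definite=Ind 2 det 2:det U",
        "2 animal animal NN Number=Sing 4 nsubj 4:nsubj n00015388:0.5|Animal=",
        "4 grazing grazing NN Number=Sing 0 root v01576165:0.30,v01608508:0.11",
    ]
    print(count_cases(sample))
```

Note that proper nouns (NNP) would be counted as "missing" by this sketch whenever they have no synset, so they may need to be filtered out first.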


vcvpaiva commented 7 years ago

Another example of a misclassified verb.

"Drunk" should be tagged not as an adjective, but as a verb in the passive voice.

water from the faucet is being drunk by a yellow dog .

1 water water NN Number=Sing 7 nsubj 7:nsubj n14845743:0.3991208968614243,n09225146:0.06193495672579126,n14847357:0.020825840319336614,n04562658:0.00961096974573467,n14855724:0.005275615009393388,n07935504:0.0032317213383196773|Water=
2 from from IN 4 case 4:case U
3 the the DT Definite=Def 4 det 4:det U
4 faucet faucet NN Number=Sing 1 nmod 1:nmod:from n03325088:0.5|Device+
5 is be VBZ Person=3|Tense=Pres 7 aux 7:aux v02604760:0.3114391050220755,v02616386:0.09828217313562741,v02655135:0.04656660638143316,v02603699:0.02212327003448585,v02749904:0.01690163925955698,v02664769:0.004367864877075426,v02620587:0.009091657786015173,v02445925:0.007383011444773139,v02697725:0.004938390052273016,v02268246:0.0028365678581454452,v02614181:0.0036329649846489286,v02744820:0.0021939740637752047,v02702508:9.925551304024792E-4|Entity+
6 being be VBG 7 cop 7:cop v02604760:0.318084152910589,v02616386:0.10082733325354694,v02655135:0.04666948528844935,v02603699:0.022634124585239404,v02749904:0.01736241746174977,v02664769:0.004385479341065866,v02620587:0.00934997984448312,v02445925:0.007566644195040132,v02697725:0.005077354332381279,v02268246:0.002870758825870974,v02614181:0.003740211626838878,v02744820:0.0022390091754813744,v02702508:9.98797239421912E-4|Entity+
7 drunk drunk JJ 0 root a00797299:0.770862751413455,a00920260:0.229137248586545|Drunk+
8 by by IN 11 case 11:case U
9 a a DT Definite=Ind 11 det 11:det U
10 yellow yellow JJ 11 amod 11:amod a00385756:0.5226056962797642,a00265314:0.05679225925022514,a01640729:0.15298145205627164,a02101942:0.07541718530810326,a01228370:0.13748450239481252,a01177556:0.05471890471082336|Yellow=
11 dog dog NN Number=Sing 7 nmod 7:nmod:by n02084071:0.3974235736313493,n10114209:0.06167156876081363,n10023039:0.02073727521829126,n09886220:0.009570097613152126,n07676602:0.0052531796421179805,n03901548:0.0032179779444162413,n02710044:0.002126327189859401|DomesticDog=
12 . Definite=Ind . _ 7 punct 11:det U

vcvpaiva commented 6 years ago

@katerina says we have at least 120 cases of "fake copulas", reported in https://github.com/kkalouli/SICK-processing/blob/master/SICK_false_copula.txt (2% of the sentences).