own-pt / rte-sick

RTE Experiment

SyntaxNet produces roots for conjunctions? #1

Closed vcvpaiva closed 7 years ago

vcvpaiva commented 7 years ago

Do we know that "A group of kids is playing in a yard and an old man is standing in the background" is [C and D]

where C = "A group of kids is playing in a yard" and D = "an old man is standing in the background"?

Similarly, do we know that the nsubj of playing = a group of kids?

arademaker commented 7 years ago

I didn't understand your point; what is the issue? From the dependencies I believe we should indeed be able to derive, as I said in my email, atomic propositions such as:

(subj a_group playing)
(nmod of_kids a_group)
(nmod in_a_yard playing)
(and playing standing)
(subj an_old_man standing)
(nmod in_the_background standing)
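
(As a rough illustration only, such raw (relation dependent head) triples can be read off a CoNLL-U parse with a small gawk script; the single-sentence file name sentence.conllu and the relation list are my assumptions:)

$ gawk 'NF==10 {form[$1]=$2; head[$1]=$7; rel[$1]=$8}   # FORM, HEAD, DEPREL columns
       END {for (i in rel)
              if (rel[i]=="nsubj" || rel[i]=="conj" || rel[i] ~ /^nmod/)
                print "(" rel[i] " " form[i] " " form[head[i]] ")"}' sentence.conllu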

And @apease said the rules in SemRewrite.txt can be used to produce a complete logical form.

arademaker commented 7 years ago
  1. A:[A group of kids is playing in a yard] and B:[an old man is standing in the background] .
  2. A':[A group of boys in a yard is playing] and B':[a man is standing in the background].

Since all boys are kids, we have (implies A' A), but not all kids are boys, so (not (implies A A')). All old men are also men, so (implies B B'), but not all men are old men, so (not (implies B' B)).

So we have A' -> A and B -> B', and it looks like we can conclude neither A & B |= A' & B' nor A' & B' |= A & B.

apease commented 7 years ago

Hi Alexandre,

I mentioned converting dependency parses to logic earlier but didn't explain it well. What I meant was that from

A group of kids is playing in a yard

we get

det(group-2, A-1)
nsubj(playing-6, group-2)
case(kids-4, of-3)
nmod:of(group-2, kids-4)
aux(playing-6, is-5)
root(ROOT-0, playing-6)
case(yard-9, in-7)
det(yard-9, a-8)
nmod:in(playing-6, yard-9)

and semantic rewriting aims to turn this into

(exists (?P ?G ?Y ?K)
  (and
    (instance ?G GroupOfPeople)
    (instance ?Y CultivatedLandArea)
    (instance ?P RecreationOrExercise)
    (agent ?P ?G)
    (memberType ?G HumanYouth)
    (located ?P ?Y)))

all the best, Adam

arademaker commented 7 years ago

@apease I know that A. Pease and J. Li, “Controlled English to Logic Translation,” does have some preliminary explanation of the code that you have. You also said that

What is needed are interpretation templates or schema that combine an adjective or adverb with other elements of linguistic content to create some interpretation. I have a very basic start on this in some of the rules in the Semantic Rewriting content SemRewrite.txt which is based on Dick's approach. Much more is needed, but this is all to say that any concern about coverage should be a concern about what little I've done so far with Semantic Rewriting, rather than a critique of SUMO.

Can you provide more details about your code and planned improvements? Maybe we can start from your code and work on it. @vcvpaiva's idea can be a good opportunity for a first evaluation of the approach. I believe @gdemelo agrees with that too.

arademaker commented 7 years ago

BTW, @vcvpaiva note that the token standing is in a conj relation with the token playing. That is why I said above that one can derive (and playing standing) from the dependencies. But @apease suggested a better final representation. Basically, I was thinking in terms of triples (RDF) and Adam in terms of FOL.

apease commented 7 years ago

Sending a write-up of SemRewrite to you all via email, since it's unpublished; I should try to expand it and publish it somewhere...

vcvpaiva commented 7 years ago

@arademaker and @apease what I mean is that, from "A group of kids is playing in a yard", parsed as

det(group-2, A-1)
nsubj(playing-6, group-2)
case(kids-4, of-3)
nmod:of(group-2, kids-4)
aux(playing-6, is-5)
root(ROOT-0, playing-6)
case(yard-9, in-7)
det(yard-9, a-8)
nmod:in(playing-6, yard-9)

using rewrite rules like the ones in PropS (https://github.com/gabrielStanovsky/props), we should rewrite to remove the clauses corresponding to stopwords: det(group-2, A-1), aux(playing-6, is-5), det(yard-9, a-8),

and obtain a rewritten dependency graph (Sebastian Schuster calls these "enhanced dependencies"; see Sebastian Schuster and Christopher D. Manning. 2016. Enhanced English Universal Dependencies: An Improved Representation for Natural Language Understanding Tasks. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016))

of the form

nsubj(playing-6, group-of-kids2)
root(ROOT-0, playing-6)
nmod:in-location(playing-6, yard-9)

So indeed I am proposing something between triples and FOL: a KIML- or AKR-like representation.
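
(A minimal sketch of just this clause-removal step, assuming a single-sentence CoNLL-U file named sentence.conllu and my own guess at the stopword relation list; merging chains like nmod:of into nodes such as group-of-kids2 would need a further pass:)

$ gawk '$8 !~ /^(det|aux|case|punct)$/' sentence.conllu   # drop stopword relations, keep the rest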

vcvpaiva commented 7 years ago

@arademaker since you've changed the name of the issue: does SyntaxNet say that the conjunction of standing and playing is the ROOT? Can we count how many sentences do/don't have a verb as ROOT? Can we easily use Zeman's stats-counting program on the results we have?
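
(A minimal sketch of such a count over a CoNLL-U dump; the file name sentences.conllu and the lower-case root label are assumptions:)

$ gawk '$8 == "root" {print $4}' sentences.conllu | sort | uniq -c | sort -nr   # UPOS of each root, by frequency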

vcvpaiva commented 7 years ago

@fcbr can I ask why the process stopped at 180 pairs? Did your pipeline hit some limitation?

arademaker commented 7 years ago

@vcvpaiva we chose a sample in this first step only so as not to invest too much time before better understanding the goals.

vcvpaiva commented 7 years ago

@arademaker thanks for the explanation! Can you comment on whether the Zeman stats can be easily computed? I mostly want to know whether the majority of the sentences are a single PROP or whether conjunctions of PROPs are also frequent in the data.

fcbr commented 7 years ago

@vcvpaiva we'll need to regenerate the files in a different format, but this should be doable. I'll look into it.

fcbr commented 7 years ago

OK, added the CoNLL files plus the statistics (stats.xml and root-stats.txt).

fcbr commented 7 years ago

Also, we now have all the sentence pairs.

vcvpaiva commented 7 years ago

Super, thanks for both!

vcvpaiva commented 7 years ago

We now have a working pipeline that produces SUMO concepts for sentences parsed by SyntaxNet and FreeLing. All 9680 pairs of sentences are parsed, mapped, and stored in https://github.com/own-pt/rte-sick/tree/master/pairs, and the Turku interface now works for this corpus, so searching for issues is much easier.

@fcbr produced stats for the whole collection and for the roots. The stats for the roots show that the majority of the 19.6K roots are verbs:

18038 VERB

but 68 of these are mistakes; see #8. Also,

1082 NOUN
 492 ADJ

might have many mistakes. I wonder if we can check whether these 1574 roots are copula constructions; they have to be in order to be correct. And even if they are copulas, correctness is not guaranteed: an example is the sentence "A boy in a red shirt is in front of a long blue wall.", where "front" is the root when it should have been "wall".
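
(A sketch of how that check could go; the blank-line-separated sentences.conllu and the UD-style cop label are assumptions about the actual files:)

$ gawk 'NF==0 {if (naroot && cop[rootid]) n++;   # sentence boundary: tally and reset
              delete cop; naroot=0}
       $8=="root" && ($4=="NOUN" || $4=="ADJ") {naroot=1; rootid=$1}
       $8=="cop" {cop[$7]=1}                     # remember the heads of copula tokens
       END {if (naroot && cop[rootid]) n++;
            print n, "NOUN/ADJ roots with a copula dependent"}' sentences.conllu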

Clearly for the inference task we're more concerned about the pairs, but for the conceptual mapping the individual sentences are more important, irrespective of which other sentence they are paired with.

vcvpaiva commented 7 years ago

On the other hand, @arademaker has been complaining about the duplication of sentences. He points out that if our main goal is to investigate the accuracy and coverage of the mapping to SUMO, it might make more sense to collect the sentences and de-duplicate them.

Maybe we should either de-duplicate or choose a smaller subset of the corpus, as navigating to some of the sentences is hard: GitHub only lists up to a thousand files, and the sentence "A machine is sharpening a pencil" is, I believe, file 5.txt, which is not displayed.
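
(For the de-duplication itself, something as simple as the line below would do, over a hypothetical plain file sentences.txt with one sentence per line; it also gives the sentences ordered by frequency:)

$ sort sentences.txt | uniq -c | sort -nr > sentences-by-frequency.txt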

arademaker commented 7 years ago

Stats are all broken since we have many repetitions. See https://github.com/own-pt/rte-sick/blob/master/numbers.org

vcvpaiva commented 7 years ago

I don't understand what you mean by "stats are all broken"; they are not broken, they're just counting occurrences (tokens) instead of types.

vcvpaiva commented 7 years ago

But the numbers in https://github.com/own-pt/rte-sick/blob/master/numbers.org are pretty good, too!

Can you also produce the verbs with their frequencies? It looks like "play" occurs a lot. Is it mostly with musical instruments? There is one instance of playing soccer, and in the total corpus we also have kids playing in the garden...

arademaker commented 7 years ago

@vcvpaiva suggested PropS (http://u.cs.biu.ac.il/~stanovg/props.html) above, but it messed up on "A group of kids is playing in a yard and an old man is standing in the background.". Now we have to figure out whether the problem is in the preprocessing (the dependencies) or in the rules.

The enhanced dependencies may help if we have a parser that produces them. But I have a tendency to believe more in https://github.com/UniversalDependencies/docs/issues/344#issuecomment-266090018.

@vcvpaiva what is KIML?

The cases of play:

$ gawk '$4 ~ /VERB/ && $3 ~ /play/ {print $3,gensub(/.*\|/,"","g",$10)}' sentences.conllu | sort | uniq -c  | sort -nr
 763 play DramaticActing+
  78 play Game+
   6 play IntentionalProcess+
   2 play Contest+
   1 play Pretending+
   1 play MakingInstrumentalMusic+
   1 play InstrumentalMusic+

EDITED: for all verbs (according to SyntaxNet), see the comment below with the attached file.

vcvpaiva commented 7 years ago

@arademaker many thanks for

 763 play DramaticActing+
  78 play Game+
   6 play IntentionalProcess+
   2 play Contest+
   1 play Pretending+
   1 play MakingInstrumentalMusic+
   1 play InstrumentalMusic+

(awk is great!) Notice that most of these mappings are wrong!

Looking at the spreadsheet of concepts I sent you (built from the numbers you produced for the most frequent sentences), most of the meanings of "play" should be MakingMusic (by what I consider a mistake of Karen Nomorosa's, playing a musical instrument is mapped to MakingInstrumentalMusic+ instead of simply MakingMusic). More than 450 sentences among the first 30 most frequent have "playing guitar/flute/piano/drums", so presumably FreeLing is wrongly sending these instances of "play" to DramaticActing+, when they should go to MakingMusic+ (or, if Karen was right, MakingInstrumentalMusic+).

vcvpaiva commented 7 years ago

@arademaker I presume I cannot edit your previous comment to move the list of verbs into an external file? It will make the issue rather hard to read otherwise.

KIML is the language for meaning representations that I have been proposing for NL semantics for a while. You can read about it on slide 19 of http://vcvpaiva.github.io/includes/talks/stanford-leanlogic2015.pdf, but it's just another name for AKR, really.

arademaker commented 7 years ago

@vcvpaiva you can copy and paste?? Anyway, attached here is the file with all verbs (according to SyntaxNet), with the POS according to FreeLing, the synset (according to the FreeLing WSD), and the SUMO concepts via the mappings.

verbs-in-sick.txt

vcvpaiva commented 7 years ago

Well, you did what I wanted, i.e. put the list in an external file. And yes, I could copy and paste and even do some calculations; see #10.

apease commented 7 years ago

Hi Alexandre, Nice list! Maybe someone (Livy?) could review it for errors, and I'll fix any that are found in the SUMO-WordNet mappings?

all the best, Adam

vcvpaiva commented 7 years ago

@arademaker if we look at verbs with more than 15 occurrences, there are only about a hundred, a third of which I have already checked in my spreadsheet, sent via email and attached here.

There are small differences in the numbers between your first list and the new one, which has synsets. Why?

100-most-freq-SICK-sentences.xlsx

arademaker commented 7 years ago

@vcvpaiva

There are small differences in the numbers between your first list and the new one, which has synsets. Why?

Can you give me examples? I don't see a reason for the differences. I only changed the format, not the awk filters.

arademaker commented 7 years ago

@apease I will talk with @livyreal about it and make another issue for that. But I believe that most of the errors are in the WSD, not in the mapping. Anyway, we may have opportunities for improving the mappings, for example, to more specific concepts.

vcvpaiva commented 7 years ago

Can you give me examples? I don't see a reason for the differences. I only changed the format, not the awk filters.

Sure. The first list had at its top (all verbs, according to SyntaxNet):

1410 be Entity+
 763 play DramaticActing+
 381 stand PhysicalAttribute+
 313 sit SittingDown+
 310 jump Ambulating=
 305 run Attribute+
 292 walk Walking=
 200 wear CoveringFn=
 200 ride Carrying=
 184 look SubjectiveAssessmentAttribute+
 171 slice Separating+
 120 ride Transportation+
 120 cut Decreasing+
 107 red Red=
 101 put Putting=
  97 sing MakingVocalMusic=

the second one has:

1249 be VBZ 02604760-v Entity+
 704 play VBG 01719302-v DramaticActing+
 358 stand VBG 01546111-v PhysicalAttribute+
 304 run VBG 01525666-v Attribute+
 294 sit VBG 01543123-v SittingDown+
 288 jump VBG 01963942-v Ambulating=
 275 walk VBG 01904930-v Walking=
 190 wear VBG 00047745-v CoveringFn=
 176 ride VBG 01955984-v Carrying=
 154 look VBG 02133435-v SubjectiveAssessmentAttribute+
 134 slice VBG 01254477-v Separating+
 112 be VBP 02604760-v Entity+
 107 red JJ 00381097-a Red=
 101 cut VBG 00429060-v Decreasing+
  96 ride VBG 02102398-v Transportation+
  93 sing VBG 01731031-v MakingVocalMusic=

So, for instance, we had 763 play DramaticActing+ where we now have 704.

vcvpaiva commented 7 years ago

I believe that most of the errors are in the WSD, not in the mapping.

That's not what my sample of 30 verbs shows, I'm afraid, if one counts types and not tokens.

arademaker commented 7 years ago

@vcvpaiva OK, I found the reason. The problem is that I grouped the cases in a different way. In the first list, I used the code below; the counting was done over the unique combinations of lemma and SUMO concept only:

$ gawk '$4 ~ /VERB/ && $3 ~ /play/ {print $3,gensub(/.*\|/,"","g",$10)}' sentences.conllu | sort | uniq -c  | sort -nr
 763 play DramaticActing+
  78 play Game+
   6 play IntentionalProcess+
   2 play Contest+
   1 play Pretending+
   1 play MakingInstrumentalMusic+
   1 play InstrumentalMusic+

In the second list, I grouped by the combinations of lemma and the whole MISC field, i.e. the FreeLing POS (not only the POS tag but also its features), the synset, and the SUMO concept:

$ gawk '$4 ~ /VERB/ && $3 ~ /play/ {print $3,$10}' sentences.conllu | sort | uniq -c  | sort -nr
 704 play VBG|01719302-v|DramaticActing+
  55 play VBN|01719302-v|DramaticActing+
  33 play VBG|01079480-v|Game+
  21 play VBZ|01079480-v|Game+
  13 play VBG|01072949-v|Game+
   7 play VBP|01079480-v|Game+
   5 play VBG|01717169-v|IntentionalProcess+
   2 play VBZ|01719302-v|DramaticActing+
   2 play VBG|01155687-v|Contest+
   2 play VBD|01079480-v|Game+
   1 play VB|01719302-v|DramaticActing+
   1 play VBZ|01724459-v|InstrumentalMusic+
   1 play VBZ|01719921-v|Pretending+
   1 play VBP|01719302-v|DramaticActing+
   1 play VBN|01079480-v|Game+
   1 play VBN|01072949-v|Game+
   1 play VBG|02370650-v|IntentionalProcess+
   1 play VBG|01726172-v|MakingInstrumentalMusic+

arademaker commented 7 years ago

@vcvpaiva about the WSD vs mapping errors: sorry, I didn't understand your spreadsheet. Can you give me examples or help me to read your spreadsheet?

vcvpaiva commented 7 years ago

Can you give me examples or help me to read your spreadsheet?

Sure. Did you see the counting of "play"s in #10?

Repeating it here: we have 455 occurrences of playing an instrument in the 31 sentences I analyzed in the spreadsheet (counting the numbers from column 2, produced by you). But playing an instrument, which at the moment is mapped to MakingInstrumentalMusic in SUMO (a bad mapping, as it is too specific; the concept MakingMusic would be much better), has only one occurrence in the whole corpus. So we have both a WSD problem and a mapping problem for "play".

Now, the spreadsheet only covers examples, so it might be more informative to look at the top verbs in your list:

1249 be VBZ 02604760-v Entity+ (WRONG processing)
 704 play VBG 01719302-v DramaticActing+ (WSD + bad mapping)
 358 stand VBG 01546111-v PhysicalAttribute+ (bad mapping)
 304 run VBG 01525666-v Attribute+ (WSD)
 294 sit VBG 01543123-v SittingDown+ (OK)
 288 jump VBG 01963942-v Ambulating= (bad mapping)
 275 walk VBG 01904930-v Walking= (OK)
 190 wear VBG 00047745-v CoveringFn= (acceptable)
 176 ride VBG 01955984-v Carrying= (??)
 154 look VBG 02133435-v SubjectiveAssessmentAttribute+ (bad mapping)
 134 slice VBG 01254477-v Separating+ (acceptable)
 112 be VBP 02604760-v Entity+ (wrong)
 107 red JJ 00381097-a Red= (wrong)
 101 cut VBG 00429060-v Decreasing+ (WSD)
  96 ride VBG 02102398-v Transportation+ (bad mapping)
  93 sing VBG 01731031-v MakingVocalMusic= (OK)

vcvpaiva commented 7 years ago

@arademaker thanks for the explanation of the differences; they are not major. It's all approximate, and if we could get correct mappings for the first 150 verbs in either of your lists, we'd be covering all verbal lemmas with more than 10 occurrences.

One of the things that @livyreal had suggested was to work only on verbs, or only on nouns, perhaps to begin with.

Have you guys produced the text file with the collection of all de-duplicated sentences, ordered by frequency?

arademaker commented 7 years ago

Not ordered, but as I mentioned in the report:

https://github.com/own-pt/rte-sick/blob/master/numbers.sentences
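
(If each line of numbers.sentences starts with a count, which is my assumption about its format, sorting locally and taking the top is a one-liner:)

$ sort -nr numbers.sentences | head -30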

vcvpaiva commented 7 years ago

Thanks for https://github.com/own-pt/rte-sick/blob/master/numbers.sentences, but it's still too big and doesn't allow for taking the top, I'm afraid. So I'm going back to the idea of the pairs myself.

vcvpaiva commented 7 years ago

I am closing this issue, as the numbers are from the previous processing of original+normalized sentences. But I will open a new one with the "bad mappings" from SUMO, as I see them.