Beam search is broken - Githubissues

ramsey-coding commented 2 years ago

Hi there!

I came across this library recently and looks exciting. I tried out a couple of examples and it works great. There are quite couple of libraries on GNN. But it seems to a comprehensive one thats focuses on NLP.

I also look a deeper with the NMT example and tried out translate with different beam size. But looks like beam search is completely buggy.

Beam search works for the top-1. But looks like when top_k > 1 does not work with beam_width > 1. Lets say the top-1 inference is “ABC”. Then top-5 candidate becomes:

Beam_width = 5, top_k = 5
“ABC”
“ABC” “XYX”
“ABC” “XYX” “BCD”
“ABC” “XYX” “BCD” “SUE”
.. so on and so forth

Beam-seach decoding should not work like this. It should keeps track of k states rather than just one. It should sort all candidate sequences in ascending order by their score and select the first k as the most likely candidate sequences. Here looks like it is just expanding on one candidate.

AlanSwift commented 2 years ago

Thanks for your feedback. I think this is a bug and I will check it now.

ramsey-coding commented 2 years ago

@AlanSwift @hugochan @SaizhuoWang this is a serious limitation of the library. Would this be resolved for the library? Or is it a fundamental flaw that beam-search would never work for the library.

ramsey-coding commented 2 years ago

@AlanSwift @hugochan @teddylfwu @SaizhuoWang my feedback about the library:

I think the effort for the library is misdirected. For example, the library has severe bug like Beam search is broken. I think any Major bug should be prioritized as that completely limits the use of the library.
Even if I look into the current Roadmap, I see effort is seriously misguided. The only major feature I see is adding relational GNN support which is great. But what's the point of so many bookkeeping tasks like add a brand new configuration system based on OmegaConf. These are just busy work that does not do much for an average user of the library. I am pretty sure a lot of effort would then be directed just to modify/update existing examples and to use this new configuration system.

Unfortunately that does not do much for a typical user. No one really cares that you are using OmegaConf configuration library. What matters is adding actual useful feature into the library. And it is extremely important to ensure the features that are already into the library actually works and there is no bug. Lets say you added R-GCN and it is buggy. Then the priority should be to fix that bug instead of updating a configuration system.

If there is a major bug, there is no point of adding more and more features into the library. To summarize, for any software engineering tasks, there needs to be a balance between what is "actual feature" vs "doing chores" (like updating configuration system) which does not add much to the library.

AlanSwift commented 2 years ago

Sorry for the late reply. I will handle this issue in this week. @ramsey-coding

AlanSwift commented 2 years ago

I can't reproduce your bug. This is the beam search results for NMT when K=3, beam=5:

Case 1:
ground truth:  and of course , we all share the same adap@@ tive im@@ per@@ a@@ tives .
top: 1's beam search: and of course , we all share the same adap@@ tation .
top: 2's beam search: and of course we all share the same adap@@ tation .
top: 3's beam search: and of course , we all share the same adap@@ t@@ ability .

Case 2:
ground truth:  we &apos;re all born . we all bring our children into the world .
top: 1's beam search: we &apos;re all born . we bring children to the world .
top: 2's beam search: we &apos;re all born . we bring kids to the world .
top: 3's beam search: we &apos;re all born . we get children to the world .

Case 3:
ground truth:  we have to deal with the in@@ ex@@ or@@ able separ@@ ation of death , so it shouldn &apos;t surprise us that we all sing , we all dance , we all have art .
top: 1's beam search: we have to put ourselves with the un@@ stable separ@@ ation through death , and so we shouldn &apos;t surprise us that we all sing , dance and art .
top: 2's beam search: we have to put ourselves with the un@@ stable separ@@ ation by death , and so we shouldn &apos;t surprise us that we all sing , dance and art .
top: 3's beam search: we have to put ourselves with the un@@ stable separ@@ ation of death , and so we shouldn &apos;t surprise us that we all sing , dance and art .

Case 4:
ground truth:  all of these peop@@ les teach us that there are other ways of being , other ways of thinking , other ways of ori@@ enting yourself in the earth .
top: 1's beam search: all of these people teach us that there are still other existence , other ways , other ways of con@@ gress on earth .
top: 2's beam search: all of these people teach us that there are still other existence , other ways , other ways of re@@ duc@@ ing on earth .
top: 3's beam search: all of these people teach us that there &apos;s still other existence , other ways , other ways of re@@ duc@@ ing on earth .

Case 5:
ground truth:  to this day , they re@@ main rul@@ ed by a ritual pri@@ es@@ tho@@ od but the training for the pri@@ es@@ tho@@ od is rather extraordinary .
top: 1's beam search: until the next day , you &apos;ll be treated by a ritual p@@ rie@@ stry . the education of p@@ rie@@ sts are very extraordinary .
top: 2's beam search: until the next day , you &apos;ll be treated by a ritual p@@ rie@@ stry . the education of p@@ rie@@ sts is very extraordinary .
top: 3's beam search: until the next day , you &apos;re going to be treated by a ritual p@@ rie@@ stry , the education of p@@ rie@@ sts is very extraordinary .

Case 6
ground truth:  his un@@ cle fl@@ ed with his hol@@ iness in the di@@ as@@ por@@ a that took the people to ne@@ p@@ al .
top: 1's beam search: his un@@ cle fle@@ w with their hol@@ iness in the di@@ as@@ por@@ a , people went to ne@@ p@@ al .
top: 2's beam search: his un@@ cle fle@@ w with their hol@@ iness in the di@@ as@@ por@@ a , and people went to ne@@ p@@ al .
top: 3's beam search: his un@@ cle fle@@ w with their hol@@ iness in the di@@ as@@ por@@ a , people to go to ne@@ p@@ al .

Case7
ground truth:  the can@@ adi@@ an government has not always been kind to the in@@ u@@ it people , and during the 195@@ 0s , to establi@@ sh our so@@ ver@@ eig@@ n@@ ty , we forced them into sett@@ le@@ ments .
top: 1's beam search: the can@@ adi@@ an government is not always very good with the in@@ u@@ it , and during the 50@@ th of our wealth , they were forced to pull in sett@@ le@@ ments .
top: 2's beam search: the can@@ adi@@ an government is not always very good with the in@@ u@@ it , and during the 50@@ th years of our wealth , they were forced to pull in sett@@ le@@ ments .
top: 3's beam search: the can@@ adi@@ an government is not always very good with the in@@ u@@ it , and during the 50@@ th of our wealth , they were forced to pull up in sett@@ le@@ ments .

The beam search's results for topk>1 seem ok. And actually, the beam search can obtain about 1.0 gain for BLEU4 metric.

ramsey-coding commented 2 years ago

The prefix is always the same in all the cases. Why is that?

Let's take the ground truth: to this day , they re@@ main rul@@ ed by a ritual pri@@ es@@ tho@@ od but the training for the pri@@ es@@ tho@@ od is rather extraordinary .

The predictions all starting with until the next day. This output suggest during prediction it is not keeping track of k states and rather than just one is used. It should not just expand on one candidate and should sort all candidate sequences in ascending order by their score and select the first k as the most likely candidate sequences.

top: 1's beam search: until the next day , you &apos;ll be treated by a ritual p@@ rie@@ stry . the education of p@@ rie@@ sts are very extraordinary .
top: 2's beam search: until the next day , you &apos;ll be treated by a ritual p@@ rie@@ stry . the education of p@@ rie@@ sts is very extraordinary .
top: 3's beam search: until the next day , you &apos;re going to be treated by a ritual p@@ rie@@ stry , the education of p@@ rie@@ sts is very extraordinary .

The predictions look suspicious, isn't it?

AlanSwift commented 2 years ago

There is no guarantee about the results' prefix. This is an example: 1: A B C D 2: A B E F 3: C D E F If the final score 1 > 2 >3，the 3 is dropped and 1 and 2 are reserved with the same prefix A B. The prefix is not guaranteed to be the same, This is just a natural result. I suggest you dig into the beam search algorithm.

graph4ai / graph4nlp

Beam search is broken #505