tensorflow / models

Models and examples built with TensorFlow

textsum decode giving summaries of other articles #464

Closed tumusudheer closed 8 years ago

tumusudheer commented 8 years ago

This issue is regarding the textsum model.

I trained the model using the given toy dataset. To see results quickly, I tried testing with just one article from the toy dataset. When I tested with that one article, the decoder produced the summary of a different article (one of the other articles I used in training), which is completely irrelevant to the article I tested with.

Eg: My test articles is the sri lankan government on wednesday announced the closure of government schools with immediate effect as a military campaign against tamil separatists escalated in the north of the country . </s> <s> the cabinet wednesday decided to advance the december holidays by one month because of a threat from the liberation tigers of tamil eelam -lrb- ltte -rrb- against school children , a government official said . </s> <s>``there are intelligence reports that the tigers may try to kill a lot of children to provoke a backlash against tamils in colombo . </s> <s>``if that happens , troops will have to be withdrawn from the north to maintain law and order here , '' a police official said . </s> <s> he said education minister richard pathirana visited several government schools wednesday before the closure decision was taken . </s> <s> the government will make alternate arrangements to hold end of term examinations , officials said . </s> <s> earlier wednesday , president chandrika kumaratunga said the ltte may step up their attacks in the capital to seek revenge for the ongoing military offensive which she described as the biggest ever drive to take the tiger town of jaffna

and the reference summary is: sri lanka closes schools as war escalates

But the decoder instead gave this: output=outdrink germans in [UNK] stakes .

When I tried testing with multiple articles, the decoder again gave summaries of other articles as the result. Has anyone had a similar issue? Is there anything I'm missing or doing wrong?

xtr33me commented 8 years ago

Hey tumusudheer! Hope all is well with ya. Last night before bed I checked what you mentioned in another post about the vocab file not containing the words used in the data file, and sure enough I was able to verify it too. So I wrote a little script that simply went through the decoded toy data file and generated counts for each of the words. I then appended these to the existing vocab file they provided, minus the duplicates. I ran the training today using the params given in the readme, and when I got back from work and tried a decode, the results were very good.

I'm not fully sure why you might be seeing an incorrectly indexed result, but with the above changes my decode results and ref file indexes all matched. By "matched" I mean the decode result line indices matched the ref line indices. The decode headlines had some strange results, but I believe that is just because we are training against a smaller dataset. I am including the updated vocab file below. Hope this helps some. vocab.txt
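A minimal sketch of that kind of counting script might look like the following, assuming the toy data has already been dumped to plain text; the file paths and token handling below are assumptions for illustration, not the exact script used:

```python
# Hypothetical sketch: count word frequencies in a plain-text dump of the toy
# data and append any missing words (with their counts) to the existing vocab.
# The input/output paths are assumptions for illustration.
from collections import Counter

counts = Counter()
with open('data/toy_text_dump.txt') as f:   # assumed plain-text dump of the toy data
    for line in f:
        counts.update(line.split())

existing = set()
with open('data/vocab') as f:               # existing vocab: one "word count" per line
    for line in f:
        if line.split():
            existing.add(line.split()[0])

with open('data/vocab', 'a') as f:
    for word, count in counts.most_common():
        if word not in existing:            # skip duplicates already in the vocab
            f.write('%s %d\n' % (word, count))
```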

tumusudheer commented 8 years ago

Hey @xtr33me ,

Thank you very much for the response and for providing the vocab file. I'll train and test the toy dataset with it.

One quick question about decoding: where do we see the final decoding results? I ran decoding with decoding steps = 1000, and roughly every 1-2 seconds I see a new ref<timestamp> and dec<timestamp> file being generated in the decode folder.

Do the final decoded results appear on the console, or do we need to check the last generated decode file? Also, can you please share the parameters you used for decoding the toy dataset?

I used the following params: beam_size = 2

```python
tf.app.flags.DEFINE_integer('max_decode_steps', 1000,
                            'Number of decoding steps.')
tf.app.flags.DEFINE_integer('decode_batches_per_ckpt', 8,
                            'Number of batches to decode before restoring next '
                            'checkpoint')

DECODE_LOOP_DELAY_SECS = 2
DECODE_IO_FLUSH_INTERVAL = 100
```

Thank you..

tumusudheer commented 8 years ago

Hey @xtr33me ,

I just did one more round of quick training with the toy dataset and ran the training for 10000 steps. It converged to a loss of approximately 0.09.

Then I tested the decoder with just one article (from the toy data, which was also part of my training data). The article is: the thirsty citizens of the czech republic have dethroned their german..

My decoder files look as follows:

ref<timestamp> file: output=czechs outdrink germans in beer-guzzling stakes
decode<timestamp> file: output=us firm bids #.# billion pounds for seeboard .

Surprisingly, this is not happening for you. This time I set beam_size = 8 and kept the other decoding params intact, except max_run_steps = 1000.

Are you testing with articles that were also part of your training data (I was doing that just as a quick test)? Please let me know, and also please provide the decoding parameters you are using for the toy data.

xtr33me commented 8 years ago

I used the commands below, which are from the readme, after overwriting the vocab with the one I provided. I too first just wanted to prove the model worked, so I pointed to the provided dataset and vocab and ran the training and decoding as stated. You are correct: look in the decode folder and open the decode file(s) for your results. The same line number in the reference file gives the actual article title.

Run the training.

```
bazel-bin/textsum/seq2seq_attention \
  --mode=train \
  --article_key=article \
  --abstract_key=abstract \
  --data_path=data/training-* \
  --vocab_path=data/vocab \
  --log_root=textsum/log_root \
  --train_dir=textsum/log_root/train
```

Run the decode.

```
bazel-bin/textsum/seq2seq_attention \
  --mode=decode \
  --article_key=article \
  --abstract_key=abstract \
  --data_path=data/test-* \
  --vocab_path=data/vocab \
  --log_root=textsum/log_root \
  --decode_dir=textsum/log_root/decode \
  --beam_size=8
```

Reference File Sample Indexes:

output=leaders begin arriving for commonwealth summit .
output=brazilian families to get reparations for military regime killings .
output=opposition activists paralyse calcutta .
output=scorecard in australia-pakistan cricket test .
output=sri lanka closes schools as war escalates .

Decode File Sample Indexes:

output=begin arriving for commonwealth commonwealth summit .
output=families to to get for military regime killings .
output=activists paralyse calcutta . .
output=scorecard in australia-pakistan cricket test .
output=lanka closes schools as war escalates .
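If it helps when checking your own runs, here is a minimal, hypothetical helper that pairs the most recent ref/decode files line by line so the index matching described above is easy to verify; the decode directory path and the file name patterns are assumptions taken from this thread, not from the model code:

```python
# Hypothetical helper: print reference and decoded headlines side by side.
# The 'ref*'/'decode*' name patterns and decode_dir path are assumed from
# this thread.
import glob
import os

decode_dir = 'textsum/log_root/decode'
ref_path = max(glob.glob(os.path.join(decode_dir, 'ref*')), key=os.path.getmtime)
dec_path = max(glob.glob(os.path.join(decode_dir, 'decode*')), key=os.path.getmtime)

with open(ref_path) as ref_f, open(dec_path) as dec_f:
    for i, (ref, dec) in enumerate(zip(ref_f, dec_f), start=1):
        print('line %d\n  REF: %s\n  DEC: %s' % (i, ref.strip(), dec.strip()))
```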

tumusudheer commented 8 years ago

Hi @xtr33me ,

It still seems not to work for me when I train with the toy dataset (I removed one article from it, which I then used for test/decode). Again, the decode output is completely different from the ref file output.

May I know which version of tensorflow you are using? I'm using 0.10 with the textsum model downloaded from the master branch (https://github.com/tensorflow/models/tree/master/textsum). Are you using the same tensorflow version and the same branch? Please let me know if you are using a different version/branch.

Thank you...

puppet101 commented 8 years ago

Hi, @xtr33me,

Could you please explain what the number after each word in vocab.txt means? After reading the code, it seems the number value is not actually used; the word index is just equal to the line number in the vocab file. Thank you~

xtr33me commented 8 years ago

Hey @puppet101, that is a count of how many times the word shows up in the overall training set. In the case of the original vocab file out there, it represents the count of how many times the word showed up in the Gigaword corpus data.

I wrote a small script that did just that for the toy training set they provided. I originally used its output as the vocab file without any of the original entries, but I guess there weren't enough words and it errored out stating so. So instead I added the generated entries to the end of the originally included vocab file (which fills in the missing-vocab-word issue) and then removed the duplicates.

As of now, I'm not fully sure how the logic uses that count. I assume there might be a statistical tie between the count and the probability of more common words being chosen over less common ones, but I cannot say for certain. My first step here was just to get something working so I could then start breaking everything down into smaller pieces to understand it better. Hoping to have more time this weekend to look further. Hope that helps.
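To illustrate the point @puppet101 raised, here is a hedged sketch of a vocab reader consistent with the format discussed in this thread: one "word count" pair per line, with a word's id simply being its position in the file. This is only an illustration of the observed behavior, not the model's actual implementation:

```python
# Hypothetical sketch of reading a "word count" vocab file where a word's id
# is simply the order in which it appears; the count column is ignored here.
def load_vocab(path):
    word_to_id = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            word = parts[0]
            if word not in word_to_id:  # skip any duplicate entries defensively
                word_to_id[word] = len(word_to_id)
    return word_to_id

vocab = load_vocab('data/vocab')
print('vocab size: %d' % len(vocab))
```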

xtr33me commented 8 years ago

Hey @tumusudheer

It's interesting in itself that you even got this working with 0.10. I opened a ticket a few weeks back about shape-related issues; it can be seen here: https://github.com/tensorflow/models/issues/417

A few others have seen the same thing, so oddly enough it might be the source of your problem, but the fact that you were able to run it without getting any error is further than I got. I am currently running Ubuntu 16.04. I was also running the latest versions of bazel, TF 0.10, Cuda 8 and CuDNN 5.1.5. After running into so many issues just getting the textsum model to run, I decided to try TF 0.9, Cuda 7.5 and CuDNN 4. Once I downgraded, everything seemed to work without issue. About a week later a few others were also seeing the issue, and by downgrading they were able to continue as well.

tatatodd commented 8 years ago

@tumusudheer is this still an issue for you? Have you tried with tensorflow version 0.11?

aselle commented 8 years ago

Closing automatically due to lack of recent activity. Please reopen when further information becomes available. Thank you.

xiaoyugan0418 commented 7 years ago

Hi @puppet101, I have the same issue as you. Even when I use a big dataset I created, most of the decode results are not correct; they contain words that are not even in the original article. Did you fix this problem? Thanks!

Kitter commented 6 years ago

Hi @xtr33me, do you use training data for your tests? I trained a model, and if I test with training data, the results are similar to yours. But there is one thing I don't understand: the decode output has one word fewer (the first word) than the reference output. Do you know why this happens? Thanks.

mainakchain commented 6 years ago

Hi @xtr33me @aselle, I am getting similar problems to @xiaoyugan0418. Most of the generated summaries are really out of context. I am training on my own prepared dataset. Above all, even if I feed just one article as the test input, I get repeated outputs of the same result. The parameters in the decode file are the defaults. Can anyone please suggest what I can do to improve the results?