nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

raw text -mode test_text -task ext --> min/max length not working #164

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello.

When doing extractive summarization of raw text using bertext_cnndm_transformer and trying to adjust the min/max length, the output is always the same.

eg. python train.py -mode test_text -task ext -test_from /home/peter/projects/summary/PreSumm/models/bertext_cnndm_transformer.pt -text_src /home/peter/projects/summary/PreSumm/raw_data/raw_data.txt -min_length 200 -max_length 1000

and

python train.py -mode test_text -task ext -test_from /home/peter/projects/summary/PreSumm/models/bertext_cnndm_transformer.pt -text_src /home/peter/projects/summary/PreSumm/raw_data/raw_data.txt -min_length 1000 -max_length 2000

produces the very same output.

Has anyone run into the same problem?

AyeshaSarwar commented 4 years ago

Hi, I am also trying to test on raw text, but I am unable to find the output file. Can you tell me where to find it? I can only see the .candidate and .gold files in the results folder, and the .candidate file contains the same text as the raw text file.

After running this command, it only prints "Validation xent: 0 at step -1".

ghost commented 4 years ago

The final output should be in the .candidate file. If you are running into this type of problem, see this comment: https://github.com/nlpyang/PreSumm/issues/130#issuecomment-600965008

Andrei997 commented 4 years ago

If this is still a problem: I found a hard-coded limit of 3 sentences in one of the files, and changing that value made the model produce longer extractive summaries.

You can change it to whatever summary length you want. It looks like args is accessible at that point, so the limit could probably be made configurable, but I haven't tried tinkering with that.

```python
if ((not cal_oracle) and (not self.args.recall_eval) and (len(_pred) == 3)):
    break
```
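To illustrate the effect of that hard-coded check, here is a minimal, self-contained sketch of an extractive selection loop with the cap exposed as a parameter. The function name `select_sentences` and the parameter `max_pred_sents` are hypothetical illustrations, not PreSumm's actual API; in the real code the loop lives inside the extractive trainer's test routine.

```python
def select_sentences(ranked_ids, src_sents, max_pred_sents=3):
    """Pick sentences in ranked order until the cap is reached.

    ranked_ids: sentence indices sorted by descending model score.
    src_sents: the source document's sentences.
    max_pred_sents: replaces the hard-coded `len(_pred) == 3` check.
    """
    pred = []
    for i in ranked_ids:
        if i >= len(src_sents):  # guard against out-of-range indices
            continue
        pred.append(src_sents[i])
        if len(pred) == max_pred_sents:  # configurable instead of fixed at 3
            break
    return pred


sents = ["First sentence.", "Second one.", "Third.", "Fourth.", "Fifth."]
ranking = [2, 0, 4, 1, 3]  # e.g. indices sorted by model score
print(select_sentences(ranking, sents, max_pred_sents=5))
```

Raising `max_pred_sents` (or wiring it to a command-line flag via `self.args`) lets the model emit longer extractive summaries; note that `-min_length`/`-max_length` alone do not affect this sentence-count cap.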