yaolu / Multi-XScience

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
MIT License

implementing n-gram repeat blocking #6

Closed: Tinarights closed this issue 2 years ago

Tinarights commented 2 years ago

How did you implement tri-gram or n-gram blocking in the pointer-generator project? Could you please help?

yaolu commented 2 years ago

You can take this as reference https://github.com/nlpyang/hiersumm/blob/476e6bf9c716326d6e4c27d5b6878d0816893659/src/abstractive/beam.py#L153
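For anyone reading later: the idea in the linked `beam.py` is, roughly, to penalize a hypothesis whenever extending it would reproduce an n-gram it already contains. Below is a minimal, framework-agnostic sketch of that check; the function and variable names are illustrative, not taken from hiersumm.

```python
def violates_ngram_block(tokens, candidate_token, n=3):
    """Return True if appending candidate_token to tokens would repeat
    an n-gram that already occurs in tokens."""
    if len(tokens) < n - 1:
        # Not enough context yet to form a full n-gram with the candidate.
        return False
    # The n-gram that the extension would create: last (n-1) tokens + candidate.
    new_ngram = tuple(tokens[-(n - 1):]) + (candidate_token,)
    # All n-grams already present in the hypothesis.
    seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in seen


# Example with the default trigram setting:
print(violates_ngram_block(["the", "model", "and", "the", "model"], "and"))  # True
print(violates_ngram_block(["the", "model", "and", "the", "model"], "was"))  # False
```

During beam search, any candidate token for which this check returns True can either be skipped outright or have its log-probability set to a very large negative value, so the hypothesis never survives pruning.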

Tinarights commented 2 years ago

Is it possible to share that piece of code? It would be highly appreciated.

yaolu commented 2 years ago

If you use https://github.com/abisee/pointer-generator for summarization, you can replace beam_search.py and decode.py with the following versions; they will block trigrams. code.zip
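This is not the attached code.zip, but a rough sketch of how trigram blocking slots into a single beam-search step: whenever extending a hypothesis would create a repeated trigram, its score is pushed to a very low value so it is never kept after pruning. The plain dicts with `tokens` and `score` below are simplified stand-ins for the pointer-generator's Hypothesis objects, not the project's actual API.

```python
BLOCK_SCORE = -1e20  # effectively removes a candidate from the beam


def has_repeated_trigram(tokens):
    """True if the token sequence contains the same trigram more than once."""
    trigrams = [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    return len(trigrams) != len(set(trigrams))


def extend_beam(hyps, topk_ids, topk_log_probs, beam_size):
    """One decode step: extend each hypothesis with its candidate tokens,
    mask extensions that repeat a trigram, and prune back to beam_size."""
    all_hyps = []
    for hyp, ids, log_probs in zip(hyps, topk_ids, topk_log_probs):
        for token, log_prob in zip(ids, log_probs):
            tokens = hyp["tokens"] + [token]
            score = hyp["score"] + log_prob
            if has_repeated_trigram(tokens):
                score = BLOCK_SCORE  # trigram blocking
            all_hyps.append({"tokens": tokens, "score": score})
    # Keep only the best beam_size hypotheses for the next step.
    all_hyps.sort(key=lambda h: h["score"], reverse=True)
    return all_hyps[:beam_size]
```

Masking with a large negative score (rather than dropping the hypothesis) keeps the beam the same width at every step, which is the common way this is done in beam-search decoders.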

Tinarights commented 2 years ago

Thanks a lot, really appreciate that. I still couldn't reproduce the exact results for PG. In your paper you mention min_dec_steps=110, but what is max_dec_steps? How many training steps did you run?

Thanks