
An Abstractive Summarization Implementation with Transformer and Pointer-Generator

When I wanted to produce summaries with a neural network, I tried many approaches to abstractive summarization, but the results were not good. When I heard about the 2018 Byte Cup, I looked into it and found the champion's solution appealing. However, after searching sites such as GitHub and GitLab, I could not find an official implementation, so I decided to implement it myself.

Requirements

Model Structure

Based On

My model is based on Attention Is All You Need and Get To The Point: Summarization with Pointer-Generator Networks.
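
For context, the core idea taken from the pointer-generator paper is to mix the decoder's vocabulary distribution with a copy distribution derived from the encoder-decoder attention, weighted by a learned generation probability p_gen. The TensorFlow 1.x sketch below illustrates that mixing step only; the function name, tensor shapes, and the choice of which attention weights to use are my assumptions for illustration, not this repository's actual code.

```python
import tensorflow as tf

def pointer_generator_dist(dec_output, attn_weights, enc_input_ids,
                           vocab_logits, vocab_size):
    """Mix generating and copying, as in "Get To The Point":
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_i a_i * 1[x_i = w]

    Hypothetical shapes:
      dec_output:    [batch, tgt_len, d_model]    Transformer decoder states
      attn_weights:  [batch, tgt_len, src_len]    encoder-decoder attention
      enc_input_ids: [batch, src_len]             source token ids
      vocab_logits:  [batch, tgt_len, vocab_size] decoder output projection
    """
    # p_gen: probability of generating from the vocabulary vs. copying.
    p_gen = tf.layers.dense(dec_output, 1, activation=tf.sigmoid)  # [B, T, 1]

    # Standard softmax distribution over the fixed vocabulary.
    vocab_dist = tf.nn.softmax(vocab_logits)                       # [B, T, V]

    # Copy distribution: scatter the attention weights onto the
    # vocabulary ids of the source tokens they attend to.
    src_one_hot = tf.one_hot(enc_input_ids, vocab_size)            # [B, S, V]
    copy_dist = tf.einsum('bts,bsv->btv', attn_weights, src_one_hot)

    # Final mixture of generating and copying.
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist
```

In a multi-head Transformer, attn_weights would typically come from one head (or an average over heads) of the final decoder block's encoder-decoder attention.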

Change

Training

| name | type | detail |
| --- | --- | --- |
| vocab_size | int | vocabulary size |
| train | str | training dataset directory |
| eval | str | evaluation dataset directory |
| test | str | data for calculating the ROUGE score |
| vocab | str | vocabulary file path |
| batch_size | int | training batch size |
| eval_batch_size | int | evaluation batch size |
| lr | float | learning rate |
| warmup_steps | int | warmup steps for the learning rate schedule |
| logdir | str | log directory |
| num_epochs | int | number of training epochs |
| evaldir | str | evaluation directory |
| d_model | int | hidden dimension of encoder/decoder |
| d_ff | int | hidden dimension of the feedforward layer |
| num_blocks | int | number of encoder/decoder blocks |
| num_heads | int | number of attention heads |
| maxlen1 | int | maximum length of a source sequence |
| maxlen2 | int | maximum length of a target sequence |
| dropout_rate | float | dropout rate |
| beam_size | int | beam size for decoding |
| gpu_nums | int | number of GPUs to use for training (default 1) |
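
As an illustration, a training run with these flags might look like the following. The entry-point name train.py and all paths and values shown here are assumptions for the example; check the repository's scripts for the authoritative entry point and defaults.

```
python train.py \
  --vocab data/vocab.txt \
  --vocab_size 50000 \
  --train data/train.txt \
  --eval data/eval.txt \
  --batch_size 32 \
  --lr 0.0003 \
  --warmup_steps 4000 \
  --num_epochs 20 \
  --logdir log/run1 \
  --gpu_nums 1
```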

Note

Don't change the Transformer hyper-parameters unless you have a good alternative: changing them can keep the loss from going down! If you do find a configuration that works better, I hope you will tell me.

Evaluation

Loss

If you like this project and find it useful, I hope you will star it.