when I wanted to get summary by neural network, I tried many ways to generate abstract summary, but the result was not good. when I heared 2018 byte cup, I found some information about it, and the champion's solution attracted me, but I found some websites, like github gitlab, I didn't find the official code, so I decided to implement it.
My model is based on Attention Is All You Need and Get To The Point: Summarization with Pointer-Generator Networks
python train.py
Check hparams.py
to see which parameters are possible. For example,
python train.py --logdir myLog --batch_size 32 --train myTrain --eval myEval
My code also improve multi gpu to train this model, if you have more than one gpu, just run like this
python train.py --logdir myLog --batch_size 32 --train myTrain --eval myEval --gpu_nums=myGPUNums
name | type | detail |
---|---|---|
vocab_size | int | vocab size |
train | str | train dataset dir |
eval | str | eval dataset dir |
test | str | data for calculate rouge score |
vocab | str | vocabulary file path |
batch_size | int | train batch size |
eval_batch_size | int | eval batch size |
lr | float | learning rate |
warmup_steps | int | warmup steps by learing rate |
logdir | str | log directory |
num_epochs | int | the number of train epoch |
evaldir | str | evaluation dir |
d_model | int | hidden dimension of encoder/decoder |
d_ff | int | hidden dimension of feedforward layer |
num_blocks | int | number of encoder/decoder blocks |
num_heads | int | number of attention heads |
maxlen1 | int | maximum length of a source sequence |
maxlen2 | int | maximum length of a target sequence |
dropout_rate | float | dropout rate |
beam_size | int | beam size for decode |
gpu_nums | int | gpu amount, which can allow how many gpu to train this model, default 1 |
Don't change the hyper-parameters of transformer util you have good solution, it will let the loss can't go down! if you have good solution, I hope you can tell me.
As you see, transformer-pointer generator model can let the loss go down very quickly!