Open astariul opened 5 years ago
I'm running your code on the CNN/Dailymail dataset.
However, training never ends, displaying:
Batch #X
with X growing larger and larger. I waited a long time, then killed the process.
But now, when I run the inference code, the produced summary is very bad. Example:
the two - year - year - year - old cate - old cat was found in the animal .
What did I do wrong? Has anyone in the same situation succeeded in fixing the code? (@Vibha111094)
I ran the inference code, but I don't know how to produce the summary.
Should I post the original story through Postman, so it will give back a summary?
Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please check.
Where exactly can I set that?
In config.py you would have:

lr = {
    'learning_rate_schedule': 'constant.linear_warmup.rsqrt_decay.rsqrt_depth',
    'lr_constant': 2 * (hidden_dim ** -0.5),
    'static_lr': 1e-3,
    'warmup_steps': 10000,
}

You could increase warmup_steps to around 15,000-20,000.
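For context, this style of schedule is usually computed roughly as below. This is a minimal sketch assuming the standard linear-warmup / inverse-sqrt-decay formula; the function name and exact form are illustrative, not necessarily this repo's implementation:

import math

def effective_lr(step, lr_constant=2 * (768 ** -0.5), warmup_steps=10000):
    # Linear ramp from 0 to the peak over the first `warmup_steps` steps,
    # then decay proportional to 1/sqrt(step) afterwards.
    warmup = min(1.0, step / warmup_steps)
    decay = 1.0 / math.sqrt(max(step, warmup_steps))
    return lr_constant * warmup * decay

Under this formula the learning rate peaks at step == warmup_steps and decays from there, which is why warmup_steps is tuned relative to the total number of training iterations (the 10 percent rule of thumb above).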
When I set low numbers (steps = 10, warmup steps = 10, max eval = 10), the iteration count still goes past 150 for epoch 0. Could you help clarify how those numbers are interlinked?
Set your warmup steps to 10 percent of the total number of iterations required. In my case 15,000 helped, but please check. Also, please make sure you are sending the delimiter [SEP] as an indicator to stop decoding, i.e. while creating the TF record:

labels_tgt = input_ids_tgt[1:]
input_ids_tgt = input_ids_tgt[:-1]
input_mask_src = [1] * len(input_ids_src)
input_mask_tgt = [1] * len(input_ids_tgt)
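For concreteness, a minimal sketch of how these shifted ids might be written into a TF record. The feature names here (input_ids_src, labels_tgt, etc.) are taken from the snippet above, but the exact feature spec is an assumption; adapt it to the repo's preprocessing code:

import tensorflow as tf

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def make_example(input_ids_src, input_ids_tgt):
    # Shift the target sequence: the decoder input drops the last token and
    # the labels drop the first, so the final label is [SEP] and the model
    # learns to emit it as the stop signal.
    labels_tgt = input_ids_tgt[1:]
    input_ids_tgt = input_ids_tgt[:-1]
    features = {
        'input_ids_src': _int64_feature(input_ids_src),
        'input_mask_src': _int64_feature([1] * len(input_ids_src)),
        'input_ids_tgt': _int64_feature(input_ids_tgt),
        'input_mask_tgt': _int64_feature([1] * len(input_ids_tgt)),
        'labels_tgt': _int64_feature(labels_tgt),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))

Without the [SEP] label at the end of the target, decoding has no learned stop condition, which matches the runaway "Batch #X" behavior described above.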
Hello, I adopted the default settings and obtained ROUGE-1/2/L: 39.29/17.30/27.10. The ROUGE-L result in particular is terrible. I trained on 1 GPU for 3 days, 170,000 steps in total with batch size = 32. Could you provide your results on the CNN/Dailymail dataset, or do you know what is wrong? Many thanks! @Vibha111094
I am following the default settings, but after the second epoch training is taking too long. Does anyone else face the same problem?